I've seen Samba servers where files are still locked even though the corresponding
smbd process no longer exists. At first I thought it was a reiserfs
problem, but the problem also occurred on ext3. I ran parallel tdbtorture tests on
different filesystems and noticed that on all tested filesystems (reiser3,
reiser4, ext2/3, xfs) tdbtorture sooner or later throws fatal error messages like:
rec_read bad magic 0x42424242 at offset=776
It looks like there are some problems in the tdb code.
OK, I'm trying to reproduce this. Any more details on how long it takes to
reproduce with tdbtorture?
I can't reproduce this on Fedora Core 1 (2.4.22 kernel) with ext3 + ACL patches.
What kernel are you reproducing the tdbtorture corruptions on?
Have you tried modifying it to use only pread/pwrite rather than mmap?
I was using the SuSE 9.2 kernel (2.6.8 based) on a 1.4GHz x86 system. I ran 4 to
6 tdbtorture instances in parallel on the same filesystem in different directories,
starting them again and again; it took about 10 minutes to see those errors. The
filesystems were freshly created and contained just the torture files. I did not
modify the torture test. If you want, I can make whatever modifications you like
and retry the test.
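For anyone else trying to reproduce this, the procedure described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function names are made up, the path to the tdbtorture binary is whatever your standalone build produced, and it simply scans the output for the "bad magic" text quoted earlier.

```python
import os
import subprocess


def run_round(cmd, workers, base_dir):
    """Start `workers` copies of cmd, each in its own subdirectory of base_dir.

    Returns every output line that contains "bad magic".
    """
    procs = []
    for i in range(workers):
        workdir = os.path.join(base_dir, f"run{i}")
        os.makedirs(workdir, exist_ok=True)
        procs.append(
            subprocess.Popen(
                cmd,
                cwd=workdir,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
            )
        )
    hits = []
    for proc in procs:
        out, _ = proc.communicate()  # wait for this instance and collect its output
        hits.extend(line for line in out.splitlines() if "bad magic" in line)
    return hits


def reproduce(cmd, workers=4, max_rounds=100, base_dir="."):
    """Restart the parallel run again and again until an error line shows up.

    Returns (round_number, error_lines), or (None, []) if nothing was seen.
    """
    for round_no in range(1, max_rounds + 1):
        hits = run_round(cmd, workers, base_dir)
        if hits:
            return round_no, hits
    return None, []
```

Usage would be something like `reproduce(["/absolute/path/to/tdbtorture"], workers=6, base_dir="/mnt/test")`, where both paths are placeholders. An absolute path to the binary is safest because each instance runs with a different working directory.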
I'd like you to remove the -DHAVE_MMAP=1 from the standalone compile
and then try and reproduce the error. This will tell me if it's in the
kernel or in the tdb libraries.
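To illustrate the distinction this test draws: the same bytes in a file can be reached either through explicit pread/pwrite calls or through a memory mapping, and on a healthy system both paths must agree. The sketch below is not tdb code. The function names and the marker value are invented for the example, and the offset is simply the one from the error message above.

```python
import mmap
import os
import struct
import tempfile

SKETCH_MAGIC = 0x12345678  # made-up marker for this example, not tdb's real value


def write_magic(fd, offset, magic):
    """Store a 4-byte marker at offset using pwrite (no mapping involved)."""
    os.pwrite(fd, struct.pack("<I", magic), offset)


def read_magic_pread(fd, offset):
    """Read the marker back through the pread path."""
    return struct.unpack("<I", os.pread(fd, 4, offset))[0]


def read_magic_mmap(fd, offset):
    """Read the marker back through a read-only memory mapping of the file."""
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapping:
        return struct.unpack_from("<I", mapping, offset)[0]


def demo(offset=776):
    """Write once, then read via both paths and return the two values."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        write_magic(fd, offset, SKETCH_MAGIC)
        return read_magic_pread(fd, offset), read_magic_mmap(fd, offset)
```

If an error shows up only when the mmap path is compiled in, the mapping (and therefore the kernel) becomes the prime suspect. If it shows up with plain pread/pwrite as well, the problem is more likely in the library or in how it is being used.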
I think this ended up being a bug in the torture test, didn't it?
Even without -DHAVE_MMAP=1 I got those errors, but I didn't investigate
much further. I will try once more later with a recent Samba version and a newer
kernel and keep you up to date here. I can't see why this should be a tdbtorture
bug; this might, however, explain why some people have problems with locked files
not being "unlocked" with recent Samba versions.
No, I know what this is. I tracked down the problem but then didn't update
the bug (sorry). It's something we wouldn't run into in smbd or the rest of
Samba, but a problem in the way tdbtorture uses the tdb library (it allows
a re-open race).
I'll update this when I have more time.
Didn't this get fixed now? In tdbtorture at least?
Yes, this got fixed by tridge's changes to tdb.c, integrated in 3.0.20a.