Bug 2157 - tdb corruption proven by tdbtorture
Summary: tdb corruption proven by tdbtorture
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.9
Hardware: All Linux
Importance: P3 major
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-12-14 05:55 UTC by Björn Jacke
Modified: 2005-09-27 15:06 UTC

See Also:


Description Björn Jacke 2004-12-14 05:55:32 UTC
I've seen Samba servers where files are still locked even though the corresponding
smbd process no longer exists. At first I thought it was a reiserfs
problem, but the problem also occurred on ext3. I ran parallel tdbtorture tests on
different filesystems and noticed that on all tested filesystems (reiser3,
reiser4, ext2/3, xfs) tdbtorture sooner or later throws fatal error messages like:

rec_read bad magic 0x42424242 at offset=776

It looks like there are some problems in the tdb code.
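For readers unfamiliar with this message: a tdb-style reader copies a fixed-size record header from a given file offset and checks a magic field before trusting the rest of the record. The sketch below is a hypothetical illustration, not tdb's actual code; the struct layout, the `REC_MAGIC` value and the `rec_read_checked()` name are assumptions made for this example. Note that 0x42 is ASCII 'B', so a magic of 0x42424242 means the four bytes read at that offset were "BBBB" rather than a valid header.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size record header, loosely modelled on a tdb-style
   record; this is NOT tdb's real on-disk layout. */
struct rec_header {
    uint32_t next;    /* offset of the next record in the chain */
    uint32_t rec_len; /* length of the record body */
    uint32_t magic;   /* must equal REC_MAGIC for a live record */
};

#define REC_MAGIC 0x26011999u /* illustrative value for this sketch */

/* Copy the header at `offset` out of `buf` (e.g. a mapped or pread()
   buffer) and validate it. Returns 0 on success, -1 if the header would
   run past the end of the buffer or its magic field is wrong. */
static int rec_read_checked(const unsigned char *buf, size_t buflen,
                            size_t offset, struct rec_header *out)
{
    if (offset > buflen || buflen - offset < sizeof(*out)) {
        fprintf(stderr, "rec_read offset %zu out of range\n", offset);
        return -1;
    }
    /* memcpy avoids unaligned access on the raw byte buffer */
    memcpy(out, buf + offset, sizeof(*out));
    if (out->magic != REC_MAGIC) {
        fprintf(stderr, "rec_read bad magic 0x%lx at offset=%zu\n",
                (unsigned long)out->magic, offset);
        return -1;
    }
    return 0;
}
```

If the bytes at the requested offset are all 'B' (0x42), this check fails with "bad magic 0x42424242" regardless of byte order, because every byte of the magic field is the same.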
Comment 1 Jeremy Allison 2004-12-14 11:02:08 UTC
Ok, I'm trying to reproduce this. Any more details on how long it takes to
reproduce with tdbtorture?

Jeremy.
Comment 2 Jeremy Allison 2004-12-14 11:16:02 UTC
I can't reproduce this on Fedora Core 1 (2.4.22 kernel) with ext3 + ACL patches.
What kernel are you seeing the tdbtorture corruption on?

Have you tried modifying it to only use pread/pwrite rather than mmap ?

Jeremy.
Comment 3 Björn Jacke 2004-12-14 11:33:09 UTC
I was using the SuSE 9.2 kernel (2.6.8-based) on a 1.4GHz x86 system. I ran 4 to
6 tdbtorture instances in parallel on the same filesystem, each in a different
directory, restarting them again and again; it took about 10 minutes to see those
errors. The filesystems were freshly created and contained only the torture files.
I did not modify the torture test; if you want, I can make whatever modifications
you suggest and retry the test.
Comment 4 Jeremy Allison 2004-12-14 15:09:33 UTC
I'd like you to remove the -DHAVE_MMAP=1 from the standalone compile
and then try and reproduce the error. This will tell me if it's in the
kernel or in the tdb libraries.
Jeremy.
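Comments 2 and 4 hinge on tdb having two I/O paths: with HAVE_MMAP defined, records are read through a shared memory mapping of the file; without it, each access goes through pread()/pwrite(). A minimal sketch of such a compile-time switch follows; `struct db_file` and `db_read()` are hypothetical names for this example, not tdb's API.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for an open database file; not tdb's own context. */
struct db_file {
    int fd;                   /* file descriptor, used by the pread() path */
    const unsigned char *map; /* mmap()ed view of the file, or NULL */
    size_t map_len;           /* length of the mapping in bytes */
};

/* Read `len` bytes at offset `off`. Returns 0 on success, -1 on a short
   read, an out-of-range offset, or an I/O error. */
static int db_read(const struct db_file *db, void *buf, size_t len, off_t off)
{
#ifdef HAVE_MMAP
    /* mmap path: copy straight out of the shared mapping; seeing other
       processes' writes relies on the kernel keeping the mapping coherent. */
    if (db->map != NULL) {
        if (off < 0 || (size_t)off > db->map_len ||
            db->map_len - (size_t)off < len)
            return -1;
        memcpy(buf, db->map + off, len);
        return 0;
    }
#endif
    /* pread() path: one explicit system call per access, no mapping. */
    ssize_t n = pread(db->fd, buf, len, off);
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}
```

Building once with and once without -DHAVE_MMAP=1 runs the same higher-level logic over two different kernel I/O paths, which is why this comparison can help separate a kernel problem from a library one.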
Comment 5 Gerald (Jerry) Carter (dead mail address) 2005-02-17 10:07:53 UTC
I think this ended up being a bug in the torture test, didn't it?
Comment 6 Björn Jacke 2005-02-18 03:06:07 UTC
Even without -DHAVE_MMAP=1 I got those errors, but I didn't investigate much
deeper. I will try once more later with a recent Samba version and a newer
kernel and keep you up to date here. I can't see why this should be a tdbtorture
bug; it might, however, explain why some people have problems with locked files
not being "unlocked" with recent Samba versions.
Comment 7 Jeremy Allison 2005-02-18 13:26:11 UTC
No, I know what this is. I tracked down the problem but then didn't update
the bug (sorry). It's not something we would run into in smbd or the rest of
Samba, but a problem in the way tdbtorture uses the tdb library (it allows
a re-open race).
I'll update this when I have more time.
Jeremy.
Comment 8 Gerald (Jerry) Carter (dead mail address) 2005-09-27 14:45:33 UTC
Didn't this get fixed? In tdbtorture, at least?
Comment 9 Jeremy Allison 2005-09-27 15:06:56 UTC
Yes, this got fixed by tridge's changes to tdb.c, integrated in 3.0.20a.
Jeremy.