The Samba-Bugzilla – Bug 6648
smbd/ctdb infinite loop on unlink of non-existent file (stat()=ENOENT), reproduce with smbtorture BASE-BENCH-TORTURE
Last modified: 2009-09-29 08:16:44 UTC
Running two-node Samba, I can get smbd into an infinite loop in which it keeps trying to delete a non-existent file (stat( "torture.lck" ) returns ENOENT). This occurs *after* the smbtorture program has finished: one or two smbd processes can remain stuck in the "deleting the file" loop described here.
Samba is 3.3.7 with CTDB (downloaded with "rsync -avz samba.org::ftp/unpacked/ctdb ."), using a hacked "vfs objects = fileid"/vfs_fileid.c so that it works on FreeBSD, which uses different calls to get information on mounted file systems. CTDB was "hacked" to take out the IP-takeover calls/functionality, which do not work on FreeBSD.
This might be more appropriately labelled as a CTDB bug, but the problem seems to be in smbd rather than in any ctdb code. It could be an interaction problem between the two.
Works fine against single-node Windows.
Works fine on single-node Centos/Linux (Linux meddy-centos-1 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux) with Samba (3.3.4).
Does not work when run against Samba/CTDB running on a single node CTDB cluster.
Does not work when run against a single node of a two-node CTDB cluster with Samba 3.3.7 on FreeBSD 7.0, amd64.
/opt/samba-4.0.0alpha7/source4/bin/smbtorture //10.0.10.10/smbtorture --failures 0 --user XXXXXX --password XXXXXXXX --num-ops 100 --debuglevel 5 BASE-BENCH-TORTURE
Only seems to happen at higher values of "--num-ops". Works at the default of "10", gets into trouble with "100".
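To narrow down where between 10 and 100 the trouble starts, the same command can be generated for several "--num-ops" values. The following is a minimal sketch that only builds the command lines and does not run them. It uses the binary path, target, and flags exactly as given in this report; the helper name torture_command and the intermediate values 25 and 50 are assumptions made for illustration.

```python
import shlex

SMBTORTURE = "/opt/samba-4.0.0alpha7/source4/bin/smbtorture"


def torture_command(num_ops, target="//10.0.10.10/smbtorture",
                    user="XXXXXX", password="XXXXXXXX"):
    """Build the argv list for one run, using only flags from the report."""
    return [
        SMBTORTURE, target,
        "--failures", "0",
        "--user", user,
        "--password", password,
        "--num-ops", str(num_ops),
        "--debuglevel", "5",
        "BASE-BENCH-TORTURE",
    ]


# 10 and 100 come from the report; 25 and 50 are arbitrary probe points.
for n in (10, 25, 50, 100):
    print(shlex.join(torture_command(n)))
```

Each printed line can be pasted into a shell, or the list returned by torture_command can be handed to subprocess.run directly.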
In the process of servicing the delete request, all of the stat() calls set errno=2 (ENOENT), and the situation percolates back to smbd/reply.c:2638, where the code finds that "open_was_deferred()" == True and so no error is returned to the caller.
I have text from a couple of GDB sessions, plus debuglevel 10 and 5 log files of the situation, to attach to this bug.
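The behavior described above can be illustrated with a toy model. This is only a sketch and not Samba's code: handle_unlink, service, the boolean open_was_deferred parameter, and the max_passes cap are all hypothetical stand-ins. It assumes that a request whose open was deferred gets re-queued instead of receiving an error reply, which is what the GDB sessions suggest but is not confirmed.

```python
import errno
import os
import tempfile


def handle_unlink(path, open_was_deferred):
    """Toy model of one pass over a delete request (hypothetical, not smbd code).

    Returns "done" on success, "retry" when the request is silently
    re-queued, or an errno value when an error is reported to the client.
    """
    try:
        os.stat(path)
    except FileNotFoundError:
        if open_was_deferred:
            # Assumed behavior: no error reply is sent, so the request
            # stays queued and will be processed again.
            return "retry"
        return errno.ENOENT
    os.unlink(path)
    return "done"


def service(path, open_was_deferred, max_passes=1000):
    """Re-run the request until it stops asking for a retry.

    max_passes exists only so this sketch terminates; the loop
    reported in this bug has no such bound.
    """
    for passes in range(1, max_passes + 1):
        result = handle_unlink(path, open_was_deferred)
        if result != "retry":
            return result, passes
    return "stuck", max_passes


# A path that is guaranteed not to exist, inside a fresh temporary directory.
missing = os.path.join(tempfile.mkdtemp(), "torture.lck")
print(service(missing, open_was_deferred=False))
print(service(missing, open_was_deferred=True))
```

In this model the first call reports ENOENT after a single pass, while the second never leaves the retry path until the cap is hit. If the file appears between passes, the next pass deletes it and the loop ends.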
Recovering from the situation:
The problem stems from trying to delete a file that is not there (it used to exist, but has been deleted). If I re-create that file, then the spinning smbd process recovers.
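The recovery step can be sketched as a small helper that re-creates the file only when stat() reports it missing. This is a workaround sketch, not a fix. The function name recreate_if_missing is hypothetical, and the temporary directory stands in for the real share path on the backend storage, which this report does not give.

```python
import os
import tempfile


def recreate_if_missing(path):
    """Create an empty file at path if stat() reports it missing.

    Returns True if the file was created, False if it already existed.
    """
    try:
        os.stat(path)
        return False
    except FileNotFoundError:
        # Append mode creates the file without truncating it if another
        # process happened to create it in the meantime.
        with open(path, "a"):
            pass
        return True


# Hypothetical stand-in for the exported share directory.
share_dir = tempfile.mkdtemp()
lock_file = os.path.join(share_dir, "torture.lck")
print(recreate_if_missing(lock_file))
print(recreate_if_missing(lock_file))
```

The first call creates the file and the second finds it already present, so the helper is safe to run repeatedly while checking whether the spinning smbd has recovered.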
Created attachment 4573 [details]
GDB session stepping through smbd that is exhibiting the "file isn't there, but I'm trying to delete it" behavior
Created attachment 4574 [details]
Another GDB session stepping through smbd that is exhibiting the "file isn't there, but I'm trying to delete it" behavior
Created attachment 4575 [details]
Samba log at debug level 5 for an smbd showing the buggy behavior.
Created attachment 4576 [details]
Samba log at debug level 10 for an smbd showing the buggy behavior.
Created attachment 4578 [details]
Output of "truss" for the loop that smbd is stuck in.
Any traction on this one?
Should it be submitted to CTDB?
Is there any more information that I could provide?
Well, it's just that we're all too busy, and this bug seems like at least a few hours of reproducing. Without some external pressure (like money for example :-)) I pick the more low-hanging fruit first :-))
Understandable. I was just checking. :-)
I'll see about building a patch from my CTDB changes (the ones that get it partially working on FreeBSD) to help speed things along. (It just occurred to me that those patches would be particularly useful for reproducing the situation! Although any underlying/latent logic error is probably still there.)
What clustered file system are you using on FreeBSD?
Using a clustered NFSv3 file server as backend storage (i.e., it does locking correctly between participating nodes and passes SPEC08).