I am running Debian 6 (Linux 2.6.32-5-orion5x) on a QNAP NAS 109 (armv5tel).

After a few hours of use smbd seems to reliably fail, ceasing to accept any new connections, though it continues to serve existing connections.

Restarting the daemon using

service samba restart

does nothing, but killing the process using kill -9 on the first entry found using ps -e | grep smb and then issuing service samba start reliably solves the problem.

When listing ps -e | grep smb there are sometimes multiple smbd <defunct> entries. I have only seen this after a failure.

The failure seems to occur even if no samba connections have been made (typically within 24 hours), but I seem to be able to make this happen more quickly by either:

* running rsync on the samba drives (but connected locally, not over samba), which typically causes the failure within minutes or an hour
* running dozens of concurrent "smbclient -L" requests, which reliably causes the failure within minutes.

I have reinstalled Debian twice with the same problems observed. I have run with an empty smb.conf and observed the same problems.
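For anyone trying to reproduce the second trigger, the concurrent "smbclient -L" load might be generated along these lines. This is only a sketch: the function name flood_listings, the SMBCLIENT override variable, and the request count in the usage line are assumptions for illustration, not something taken from the report.

```shell
# Command to run; overridable so the function can be exercised without a server.
SMBCLIENT="${SMBCLIENT:-smbclient}"

# flood_listings HOST COUNT CREDS
# Starts COUNT "smbclient -L" requests in parallel against HOST and prints
# how many of them exited with a non-zero status.
flood_listings() {
    host="$1"; count="$2"; creds="$3"
    failures=0
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        "$SMBCLIENT" -L "$host" "-U$creds" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Collect the exit status of every background request.
    for pid in $pids; do
        wait "$pid" || failures=$((failures + 1))
    done
    echo "$failures"
}
```

Usage would be something like flood_listings 127.0.0.1 50 'user%pass', repeated until the printed failure count stops being zero.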
(In reply to comment #0)

Sorry - I should add the log files show nothing unusual (even on level 10) and I have also checked the log files of every other service, and tried restarting every other service.
I should add that the way I am checking whether connections fail (in addition to normal experimentation) is the use of

smbclient -L 127.0.0.1 -Uuser%pass

which works normally, until it starts to fail, when I get:

protocol negotiation failed: NT_STATUS_IO_TIMEOUT

Once it stops working, it never starts working again until I reboot the server or issue a kill -9.
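The check described above could be automated to record when the first timeout appears. A rough sketch, assuming the error text is exactly the NT_STATUS_IO_TIMEOUT string quoted in this comment; the function names, the SMBCLIENT override variable, and the 60-second default interval are hypothetical.

```shell
# Command to run; overridable so the helpers can be tried without a server.
SMBCLIENT="${SMBCLIENT:-smbclient}"

# classify_output TEXT
# Prints "timeout" if TEXT contains the error string from the report, else "ok".
classify_output() {
    case "$1" in
        *NT_STATUS_IO_TIMEOUT*) echo timeout ;;
        *) echo ok ;;
    esac
}

# probe_once HOST CREDS
# Runs one share listing and classifies its combined stdout/stderr.
probe_once() {
    out=$("$SMBCLIENT" -L "$1" "-U$2" 2>&1)
    classify_output "$out"
}

# watch_until_timeout HOST CREDS [INTERVAL_SECONDS]
# Probes repeatedly and prints a timestamp once the first timeout is seen.
watch_until_timeout() {
    host="$1"; creds="$2"; interval="${3:-60}"
    while [ "$(probe_once "$host" "$creds")" = ok ]; do
        sleep "$interval"
    done
    date '+first timeout seen at %Y-%m-%d %H:%M:%S'
}
```

Running watch_until_timeout 127.0.0.1 'user%pass' in the background would give a timestamp to correlate with the smbd logs.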
When it's in that state, can you do a

strace -ttT -f -p <pid> -o /tmp/smbd.strace

with <pid> being the parent smbd and watch a reconnect? Please upload /tmp/smbd.strace (potentially after bzip2 -9).

Thanks,

Volker
Created attachment 7529
strace -ttT -f -p 610 -o /tmp/smbd.strace

Generated by running:

strace -ttT -f -p 610 -o /tmp/smbd.strace

The pid was determined using both ps -e and more /var/run/samba/smbd.pid

I then connected to samba from a remote device, which timed out. The log, though, appears to be nearly empty.
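The two steps in this comment (reading the parent pid from the pid file, then attaching strace) could be combined as below. This is a sketch only: the pid file path and strace options are the ones quoted in this thread, while the function names and the PIDFILE variable are made up for illustration.

```shell
# Pid file location as mentioned in this comment; overridable for other setups.
PIDFILE="${PIDFILE:-/var/run/samba/smbd.pid}"

# parent_smbd_pid [FILE]
# Prints the pid stored on the first line of the pid file, digits only.
parent_smbd_pid() {
    head -n 1 "${1:-$PIDFILE}" | tr -cd '0-9'
}

# trace_parent [FILE]
# Attaches strace to the parent smbd with the options requested in this thread.
trace_parent() {
    pid=$(parent_smbd_pid "$1")
    [ -n "$pid" ] || { echo "no pid found in pid file" >&2; return 1; }
    strace -ttT -f -p "$pid" -o /tmp/smbd.strace
}
```

trace_parent needs to run as root and keeps running until interrupted, so the reconnect attempt has to be made from another terminal or machine while it is attached.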
Created attachment 7530
strace -ttT -o /tmp/smbclient.strace /usr/bin/smbclient -L 127.0.0.1 -Uuser%password

As another option for strace, I ran it whilst trying to connect locally using

/usr/bin/smbclient -L 127.0.0.1 -Uuser%password

I edited the file to remove my password.
Hi.

Thank you for your quick reply. I ran the command as requested but it produced very little, so I tried an alternative that I've seen listed elsewhere. Please let me know if you'd like me to retry your original request.

Leo
It hangs in the futex call. This strongly points at a threading problem. Samba itself does not use threads (unless you are using the aio_pthread module), so the problem must come from some library.

Next step: Can you get us a gdb backtrace from the parent?

gdb /usr/sbin/smbd --pid=<parent-smbd>

At the prompt please do a "bt full" and get us the output. It might be necessary to also have debugging symbols installed.

Volker
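The backtrace requested above could also be captured non-interactively, which makes the output easier to attach. A minimal sketch, assuming the installed gdb supports the -batch and -ex options; the function name, the GDB override variable, and the default output path are hypothetical, and the extra "thread apply all bt full" is an addition here, not part of the original request.

```shell
# Debugger to run; overridable so the argument list can be inspected.
GDB="${GDB:-gdb}"

# capture_backtrace PID [OUTFILE]
# Attaches gdb to PID, runs "bt full" plus a per-thread backtrace, then exits.
capture_backtrace() {
    pid="$1"
    out="${2:-/tmp/smbd.backtrace}"
    "$GDB" /usr/sbin/smbd "--pid=$pid" -batch \
        -ex 'bt full' \
        -ex 'thread apply all bt full' >"$out" 2>&1
}
```

For example, capture_backtrace 610 would write the result to /tmp/smbd.backtrace. Without debugging symbols the frames will still show up as ?? entries.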
I get the following info:

warning: The current binary is a PIE (Position Independent Executable), which GDB does NOT currently support.
Most debugger features will fail if used in this session.
Reading symbols from /usr/sbin/smbd...(no debugging symbols found)...done.
Attaching to program: /usr/sbin/smbd, process 610
0x4039bd58 in ?? ()
(gdb) bt full
#0 0x4039bd58 in ?? ()
No symbol table info available.
#1 0x4039bd40 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I therefore suspect I need to install debugging symbols. Is there anything you could point me at so I know how to do that? Incidentally, samba was installed from the Debian repository.
No idea, sorry. But we definitely need that correct backtrace information.
In order to get gdb to process the PIE (Position Independent Executable) file I had to upgrade it. I did this by updating to Debian unstable (sid). On installing gdb I was forced to resolve a dependency with python, which in turn seemed to upgrade a whole bunch of stuff (not sure why).

Since that upgrade I have thrown hundreds of concurrent connections at it and run an rsync for 24 hours, and have still not managed to make it stop accepting new connections (assumed fixed).

For info, the complete list of things that were upgraded is shown below.

Leo

gdb 7.4.1-1 newer than version in archive
krb5-multidev 1.10+dfsg~beta1-2.1 newer than version in archive
libacl1 2.2.51-5 newer than version in archive
libattr1 1:2.4.46-5 newer than version in archive
libexpat1 2.1.0-1 newer than version in archive
libgssapi-krb5-2 1.10+dfsg~beta1-2.1 newer than version in archive
libgssrpc4 1.10+dfsg~beta1-2.1 newer than version in archive
libk5crypto3 1.10+dfsg~beta1-2.1 newer than version in archive
libkrb5-3 1.10+dfsg~beta1-2.1 newer than version in archive
libkrb5-dev 1.10+dfsg~beta1-2.1 newer than version in archive
libkrb5support0 1.10+dfsg~beta1-2.1 newer than version in archive
libtalloc2 2.0.7+git20120207-1 newer than version in archive
libtdb1 1.2.10-1 newer than version in archive
libwbclient0 2:3.6.5-1 newer than version in archive
openssh-client 1:5.9p1-5 newer than version in archive
openssh-server 1:5.9p1-5 newer than version in archive
samba 2:3.6.5-1 newer than version in archive
samba-common 2:3.6.5-1 newer than version in archive
samba-dbg 2:3.6.5-1 newer than version in archive
smbclient 2:3.6.5-1 newer than version in archive
tdb-tools 1.2.10-1 newer than version in archive
winbind 2:3.6.5-1 newer than version in archive
Closing as fixed so that it does not stay around in case you cannot reproduce it. Please re-open this bug if you get the backtrace.

Thanks,

Volker