The Samba-Bugzilla – Bug 3636
SMBD processes build out of control until service becomes unavailable to clients
Last modified: 2006-04-20 09:31:24 UTC
Using samba 3.0.20b as a domain member file server. Server is a Dell PowerEdge 700, 2.4 GHz P4 w/ HT, 1GB RAM. Kernel version is kernel-2.6.12-1.1381_FC3, running on Fedora Core 3. We have been upgrading since samba-3.0.14a to where we are now.
Around 50 users connect to this server daily, and it serves as both a file server and a postgres database server. We are sharing MS Access front ends that connect to the postgres db. It also shares out various Excel, Word, and other office files. Our Peach Tree accounting software is also stored on this server.
The problem we keep seeing is that the number of smbd processes grows and grows until the Samba service becomes unavailable and stops responding. All other services on the server still function, however; it is only Samba that is crashing. Restarting the Samba service does not work, and we end up rebooting the server to get Samba working again.
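For anyone trying to catch the build-up before the service goes away, here is a minimal sketch (not part of the original report) that counts smbd processes from `ps` output, so the count can be logged next to the Samba log timestamps. The function names, the 60-second interval, and the number of samples are illustrative assumptions, and it assumes a procps-style `ps` that accepts `-e -o comm=`.

```python
import subprocess
import time
from collections import Counter


def count_processes(ps_output, name="smbd"):
    """Count lines of `ps -e -o comm=` output whose command name equals `name`."""
    names = (line.strip() for line in ps_output.splitlines())
    counts = Counter(n for n in names if n)
    return counts[name]


def sample_process_count(name="smbd"):
    """Run ps once and return the current number of processes called `name`."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_processes(result.stdout, name)


def monitor(samples=10, interval=60, name="smbd"):
    """Print a timestamped process count `samples` times, `interval` seconds apart.

    Both defaults are hypothetical values, not recommendations.
    """
    for i in range(samples):
        stamp = time.strftime("%Y/%m/%d %H:%M:%S")
        print(f"[{stamp}] {name} processes: {sample_process_count(name)}", flush=True)
        if i < samples - 1:
            time.sleep(interval)
```

Calling `monitor()` from cron or a screen session and redirecting the output to a file would give a rough timeline of how the process count changes in the run-up to a hang.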
I wish I could say this happened at regular intervals, but that is not the case. I've seen it crash and then run for months on end, and I've also seen it crash and then crash again the very next day. Sometimes it lasts for 2 weeks, sometimes it makes it a month. This has been happening for the last year now. We have tried several parameter changes, such as kernel oplocks = off, machine password timeout = 0, and deadtime = 15, none of which helped. Below is an error I finally captured, but I don't know whether it is relevant to what is going on. Since I don't know when this is going to happen next, I will try to update with an strace the next time it crashes. Please advise.
Error from log level = 3 on 3/20/06 (but not the day the server crashed, that was on 03/22/06):
[2006/03/20 15:14:05, 0] lib/util.c:smb_panic2(1548)
PANIC: internal error
[2006/03/20 15:14:05, 0] lib/util.c:smb_panic2(1556)
BACKTRACE: 22 stack frames:
#0 smbd(smb_panic2+0x8a) [0xb7e4fe03]
#1 smbd(smb_panic+0x19) [0xb7e50037]
#2 smbd [0xb7e3bef1]
#4 smbd(cli_start_connection+0x37e) [0xb7d32427]
#5 smbd(cli_full_connection+0x6a) [0xb7d32573]
#6 smbd(enumerate_domain_trusts+0x145) [0xb7e9a45a]
#7 smbd(update_trustdom_cache+0xdd) [0xb7e99f3b]
#8 smbd(is_trusted_domain+0x65) [0xb7e94519]
#9 smbd(make_user_info_map+0x163) [0xb7e94761]
#10 smbd [0xb7e95367]
#11 smbd [0xb7d5870f]
#12 smbd(ntlmssp_update+0x143) [0xb7d57c41]
#13 smbd(auth_ntlmssp_update+0x44) [0xb7e95726]
#14 smbd [0xb7cefaba]
#15 smbd(reply_sesssetup_and_X+0x4f1) [0xb7cf1069]
#16 smbd [0xb7d1cfa3]
#17 smbd(process_smb+0x19b) [0xb7d1d3c8]
#18 smbd(smbd_process+0x13a) [0xb7d1e26d]
#19 smbd(main+0x91e) [0xb7ed8455]
#20 /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xb78b1e23]
#21 smbd [0xb7cb4e41]
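To make a backtrace like the one above easier to read, here is a minimal sketch (not from the original report) that parses the frame lines, lists the named functions from outermost caller to the panic, and flags any frame numbers absent from the paste. The log announces 22 stack frames, but the listing as pasted has no #3 line. The regular expression and function names are assumptions based only on the line format shown here.

```python
import re

# Matches lines such as:
#   #0 smbd(smb_panic2+0x8a) [0xb7e4fe03]
#   #2 smbd [0xb7e3bef1]
FRAME_RE = re.compile(
    r"^#(\d+)\s+([^\s(]+)(?:\((\w+)\+0x[0-9a-fA-F]+\))?\s+\[(0x[0-9a-fA-F]+)\]$"
)


def parse_frames(text):
    """Return a list of (index, module, symbol_or_None, address) tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            index, module, symbol, address = match.groups()
            frames.append((int(index), module, symbol, address))
    return frames


def missing_frames(frames):
    """Return frame numbers between the lowest and highest seen that are absent."""
    seen = {index for index, _, _, _ in frames}
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


def call_chain(frames):
    """Return named symbols ordered from outermost caller to innermost frame."""
    ordered = sorted(frames, key=lambda frame: frame[0], reverse=True)
    return [symbol for _, _, symbol, _ in ordered if symbol is not None]
```

Run against the full paste, `call_chain` shows the panic being reached from `reply_sesssetup_and_X` through `is_trusted_domain`, `update_trustdom_cache`, `enumerate_domain_trusts`, and `cli_full_connection`. Frames without a symbol name are kept by `parse_frames` but skipped in the chain.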
Created attachment 1831 [details]
Log Level 10 at time of crash on 03/31/2006
This is a snippet of the log from when the server crashed. The message in the log repeated consistently after the crash.
Created attachment 1832 [details]
List of open file descriptors
List of open file descriptors at time of crash on 03/31/2006
Created attachment 1833 [details]
Output of running processes
Output of running processes at time of crash on 03/31/2006. Seems like a normal number of processes this time, but it still crashed.
Created attachment 1834 [details]
Smbstatus output at time of crash on 03/31/2006
Smbstatus output at time of crash on 03/31/2006.
Created attachment 1835 [details]
Strace on main SMBD process and a newer SMBD process from 03/31/2006 crash
Added some additional debug attachments after newest crash on 03/31/2006.
Hope the debug info provides some help to getting this resolved.
Created attachment 1836 [details]
Our Smb.conf file
Nothing spectacular. The same config is running on 5 other servers here with no problems.
*** Bug 3638 has been marked as a duplicate of this bug. ***
Created attachment 1853 [details]
Samba Log - set at level 10 on 04/11/2006 Crash
Smbd crashed again on 04/11/2006.
Others have posted on the list about this same thing happening. We were asked to submit a bug, and it's been submitted. I would hope that it gets looked at soon. If I'm not giving enough information please let me know in detail what you would like to see.
I'm willing to work with whoever takes this on to get this fixed.
There's no crash in samba.log.crash.041106...
(In reply to comment #10)
> There's no crash in samba.log.crash.041106...
I didn't really see anything screaming crash either, but then again I don't understand half of what I'm looking at in the log. Something has to be happening, because Samba stops working. It just stops working. We cannot access shares after it crashes, and we cannot restart Samba using the normal restart command; that becomes unresponsive too. The only way I can get it up and going quickly is to restart the server. At the point Samba stops working, the rest of the server's functions are still running and working. Samba just freezes.
What else can I do or try to give more info or attempt to fix this?
Just crashed again. One day later. 04/12/06
(In reply to comment #11)
> (In reply to comment #10)
> > There's no crash in samba.log.crash.041106...
> > Volker
> I didn't really see anything screaming crash either, but then again half of
> what I'm looking at in the log I don't understand. Something has to be
> happening because samba stops working. It just stops working. We can not
> access shares after it crashes, we cannot restart samba using the normal
> restart command. That becomes unresponsive. The only way I can get it up and
> going quick is to just restart the server. At the point samba stops working
> the rest of the servers functions still are running and working. Samba just
> What else can I do or try to give more info or attempt to fix this?
We are moving applications off this server to another server to isolate the problem. It may be one of our applications hanging the server.
Just an idea: Can you start winbind on that box (no nss_winbind necessary)? Maybe that helps as a work-around. We still have to find/fix that bug though.
Winbind was not started on the server, but it is not started on any of our other servers either. The rest are not crashing like this one though. I'll start it to see if that makes a difference. We have also moved our PeachTree accounting software to a different server. It has been having problems lately and it seems to spawn the most connections. Thanks. We will try this for now.
Severity should be determined by the developers and not the reporter.
(In reply to comment #16)
> severity should be determined by the developers and not the reporter.
I'll leave that alone next time then. Or maybe it should be removed from the reporter's submission form.