Bug 3636 - SMBD processes build out of control until service becomes unavailable to clients
Status: NEW
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.20b
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: none
Assigned To: Samba Bugzilla Account
QA Contact: Samba QA Contact
Duplicates: 3638
Depends on:
Blocks:
Reported: 2006-03-28 12:08 UTC by Matt Lung
Modified: 2006-04-20 09:31 UTC

Attachments
Log Level 10 at time of crash on 03/31/2006 (26.65 KB, text/plain)
2006-04-03 04:59 UTC, Matt Lung
List of open file descriptors (15.04 KB, text/plain)
2006-04-03 05:01 UTC, Matt Lung
Output of running processes (2.13 KB, text/plain)
2006-04-03 05:04 UTC, Matt Lung
Smbstatus output at time of crash on 03/31/2006 (17.71 KB, text/plain)
2006-04-03 05:05 UTC, Matt Lung
Strace on main SMBD process and a newer SMBD process from 03/31/2006 crash (8.02 KB, text/plain)
2006-04-03 05:14 UTC, Matt Lung
Our Smb.conf file (1.27 KB, text/plain)
2006-04-03 05:27 UTC, Matt Lung
Samba Log - set at level 10 on 04/11/2006 Crash (965.93 KB, application/x-zip-compressed)
2006-04-12 08:06 UTC, Matt Lung

Description Matt Lung 2006-03-28 12:08:41 UTC
We are using Samba 3.0.20b as a domain member file server. The server is a Dell PowerEdge 700, 2.4 GHz P4 w/ HT, 1GB RAM. The kernel is kernel-2.6.12-1.1381_FC3, running on Fedora Core 3. We have upgraded step by step from samba-3.0.14a to the version we run now.

Around 50 users connect to this server daily, and it serves as both a file server and a Postgres database server. We share MS Access frontends that connect to the Postgres db. It also shares out various Excel, Word, and other office files. Our Peachtree accounting software is also stored on this server.

The problem we keep seeing is that the number of smbd processes grows and grows until the Samba service becomes unavailable and stops responding. All other services on the server still function, however. It is only Samba that is crashing. Restarting the Samba service does not work, and we end up rebooting the server to get Samba working again.

I wish I could say this happened at regular intervals, but that is not the case. I've seen it crash and then run for months on end, and I've also seen it crash and then crash again the very next day. Sometimes it lasts for 2 weeks, sometimes it makes it a month. This has been happening for the last year now. We have tried several parameter changes, such as Kernel Oplocks = off, machine password timeout = 0, and deadtime = 15. None of these helped. Below is an error I finally captured, but I don't know whether it is relevant to what is going on. Since I don't know when this is going to happen next, I will try to update with an strace the next time it crashes. Please advise.
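
Because the timing is unpredictable, one option is a small script, run periodically (for example from cron), that records how many smbd processes exist and what state each one is in. The following is a minimal sketch and is not taken from this report. It assumes a Linux /proc filesystem in which /proc/<pid>/stat starts with the PID, the command name in parentheses, and a one-letter state, and it assumes Python 3 is available on the server. The function names are hypothetical.

```python
import os
from collections import Counter


def parse_stat(stat_text):
    """Return (command name, state letter) from the text of /proc/<pid>/stat.

    The command name sits between the first '(' and the last ')', so names
    that themselves contain parentheses are handled. Returns None if the
    text does not have the expected shape.
    """
    start = stat_text.find("(")
    end = stat_text.rfind(")")
    if start == -1 or end <= start:
        return None
    fields = stat_text[end + 1:].split()
    if not fields:
        return None
    return stat_text[start + 1:end], fields[0]


def smbd_states(proc_root="/proc", name="smbd"):
    """Count processes called `name`, grouped by state letter."""
    tally = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                parsed = parse_stat(handle.read())
        except OSError:
            continue  # process exited, or the stat file is unreadable
        if parsed and parsed[0] == name:
            tally[parsed[1]] += 1
    return tally
```

Calling smbd_states() returns a count per state letter, and the sum of the counts is the total number of smbd processes. Logging that total over time would show when the build-up starts. A large number of processes in state D (uninterruptible sleep) would mean they are blocked in the kernel, which a normal service restart cannot clear; whether that is what happens on this server is not known.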

--------
Error from the log (log level = 3) on 03/20/06. This was not the day the server crashed; that was 03/22/06:

[2006/03/20 15:14:05, 0] lib/util.c:smb_panic2(1548)
 PANIC: internal error
[2006/03/20 15:14:05, 0] lib/util.c:smb_panic2(1556)
 BACKTRACE: 22 stack frames:
  #0 smbd(smb_panic2+0x8a) [0xb7e4fe03]
  #1 smbd(smb_panic+0x19) [0xb7e50037]
  #2 smbd [0xb7e3bef1]
  #3 [0xb7c76420]
  #4 smbd(cli_start_connection+0x37e) [0xb7d32427]
  #5 smbd(cli_full_connection+0x6a) [0xb7d32573]
  #6 smbd(enumerate_domain_trusts+0x145) [0xb7e9a45a]
  #7 smbd(update_trustdom_cache+0xdd) [0xb7e99f3b]
  #8 smbd(is_trusted_domain+0x65) [0xb7e94519]
  #9 smbd(make_user_info_map+0x163) [0xb7e94761]
  #10 smbd [0xb7e95367]
  #11 smbd [0xb7d5870f]
  #12 smbd(ntlmssp_update+0x143) [0xb7d57c41]
  #13 smbd(auth_ntlmssp_update+0x44) [0xb7e95726]
  #14 smbd [0xb7cefaba]
  #15 smbd(reply_sesssetup_and_X+0x4f1) [0xb7cf1069]
  #16 smbd [0xb7d1cfa3]
  #17 smbd(process_smb+0x19b) [0xb7d1d3c8]
  #18 smbd(smbd_process+0x13a) [0xb7d1e26d]
  #19 smbd(main+0x91e) [0xb7ed8455]
  #20 /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xb78b1e23]
  #21 smbd [0xb7cb4e41]
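
Read from the bottom up, the named frames in this backtrace run from session setup (reply_sesssetup_and_X) through NTLMSSP authentication and the trusted-domain lookup (is_trusted_domain, update_trustdom_cache, enumerate_domain_trusts) into cli_full_connection and cli_start_connection, where the panic appears to be raised. A minimal sketch for pulling that call chain out of a log is given here. It assumes frames are formatted exactly as in the lines above, the regular expression and function names are hypothetical, and it is not part of Samba.

```python
import re

# One frame looks like "#4 smbd(cli_start_connection+0x37e) [0xb7d32427]".
# The module and the "(symbol+offset)" part are both optional in the log.
FRAME = re.compile(
    r"^\s*#(\d+)\s+"
    r"(?:(\S+?)(?:\((\w+)\+0x[0-9a-fA-F]+\))?\s+)?"
    r"\[(0x[0-9a-fA-F]+)\]\s*$"
)


def parse_frames(lines):
    """Return (index, module, symbol, address) for every frame line.

    Lines that are not stack frames are skipped. Missing module or
    symbol fields are returned as None.
    """
    frames = []
    for line in lines:
        match = FRAME.match(line)
        if match:
            index, module, symbol, address = match.groups()
            frames.append((int(index), module, symbol, address))
    return frames


def call_chain(frames):
    """Return the named functions, outermost caller first."""
    return [symbol for _, _, symbol, _ in reversed(frames) if symbol]
```

Passing the log lines to parse_frames and the result to call_chain gives the sequence of named functions that led to the panic. Frames without a symbol, such as #2 and #3 above, are kept by parse_frames but left out of the chain.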
Comment 1 Matt Lung 2006-04-03 04:59:51 UTC
Created attachment 1831
Log Level 10 at time of crash on 03/31/2006

This is a snippet of the log from when the server crashed. The message in the log repeated consistently after the crash.
Comment 2 Matt Lung 2006-04-03 05:01:50 UTC
Created attachment 1832
List of open file descriptors

List of open file descriptors at time of crash on 03/31/2006
Comment 3 Matt Lung 2006-04-03 05:04:26 UTC
Created attachment 1833
Output of running processes

Output of running processes at time of crash on 03/31/2006. It seems like a normal number of processes this time, but it still crashed.
Comment 4 Matt Lung 2006-04-03 05:05:38 UTC
Created attachment 1834
Smbstatus output at time of crash on 03/31/2006

Smbstatus output at time of crash on 03/31/2006.
Comment 5 Matt Lung 2006-04-03 05:14:00 UTC
Created attachment 1835
Strace on main SMBD process and a newer SMBD process from 03/31/2006 crash
Comment 6 Matt Lung 2006-04-03 05:23:12 UTC
Added some additional debug attachments after newest crash on 03/31/2006.  

Hope the debug info helps in getting this resolved.

Comment 7 Matt Lung 2006-04-03 05:27:38 UTC
Created attachment 1836
Our Smb.conf file

Nothing spectacular.  The same config is running on 5 other servers here with no problems.
Comment 8 Gerald (Jerry) Carter 2006-04-08 11:30:19 UTC
*** Bug 3638 has been marked as a duplicate of this bug. ***
Comment 9 Matt Lung 2006-04-12 08:06:24 UTC
Created attachment 1853
Samba Log - set at level 10 on 04/11/2006 Crash

Smbd crashed again on 04/11/2006.  

Others have posted on the list about this same thing happening.  We were asked to submit a bug, and it's been submitted.  I would hope that it gets looked at soon.  If I'm not giving enough information please let me know in detail what you would like to see.  

I'm willing to work with whoever takes this on to get this fixed.

Thanks
Comment 10 Volker Lendecke 2006-04-12 08:56:36 UTC
There's no crash in samba.log.crash.041106...

Volker
Comment 11 Matt Lung 2006-04-12 09:39:39 UTC
(In reply to comment #10)
> There's no crash in samba.log.crash.041106...
> 
> Volker
> 

I didn't really see anything screaming crash either, but then again I don't understand half of what I'm looking at in the log. Something has to be happening because Samba stops working. It just stops working. We cannot access shares after it crashes, and we cannot restart Samba using the normal restart command; that becomes unresponsive too. The only way I can get it up and going quickly is to restart the server. At the point Samba stops working, the rest of the server's functions are still running and working. Samba just freezes.

What else can I do or try to give more info or attempt to fix this?  

Comment 12 Matt Lung 2006-04-12 13:58:45 UTC
Just crashed again.  One day later.  04/12/06

(In reply to comment #11)

Comment 13 Matt Lung 2006-04-17 07:26:54 UTC
We are moving applications off this server to another server to isolate the problem. It may be one of our applications that is hanging the server.
Comment 14 Volker Lendecke 2006-04-17 07:30:35 UTC
Just an idea: Can you start winbind on that box (no nss_winbind necessary)? Maybe that helps as a work-around. We still have to find/fix that bug though.

Volker
Comment 15 Matt Lung 2006-04-17 08:44:08 UTC
Winbind was not started on the server, but it is not started on any of our other servers either, and the rest are not crashing like this one. I'll start it to see if that makes a difference. We have also moved our Peachtree accounting software to a different server. It has been having problems lately, and it seems to spawn the most connections. Thanks. We will try this for now.
Comment 16 Gerald (Jerry) Carter 2006-04-20 08:03:40 UTC
severity should be determined by the developers and not the reporter.
Comment 17 Matt Lung 2006-04-20 09:31:24 UTC
(In reply to comment #16)
> severity should be determined by the developers and not the reporter.
> 

I'll leave that alone next time then. Or maybe it should be removed from the reporter's submission form.