Hello all, I believe this is a bug, but I'm not entirely sure; I figure it's up to the experts to decide. I've noticed that Windows XP clients connecting to my Samba server generate *tons* of smbd processes, with one or more of the following consequences: users lose network connectivity, Samba requests time out, and the Samba server grinds to a near halt. These smbd processes don't go away even when Samba is stopped with /etc/init.d/samba stop; I have to kill -9 them. They are all owned by root. This only happens with XP machines. The logs don't seem to yield anything useful. It may be worth noting that smbstatus sometimes shows users with multiple IPC$ connections. I reproduced this in a test environment on a completely separate network. The Samba server is a Debian Sarge box. I haven't heard back from Jeremy Allison regarding this issue, and since it might be a bug, filing it here seemed like the next best step. If you need any more information, I will do my best to provide it. Thanks!
This is for a production environment, and the customer is getting very impatient. I really like Samba, so I want to stick with it and avoid going the AD route :(
It should be noted that this differs from 3636 because Samba doesn't actually crash; it just gets overwhelmed. Also, this has been tested on two different kernels: 2.6.5 and 2.6.8.
(In reply to comment #1)
> This is for a production environment, and the customer is getting very
> impatient. I really like Samba, so I want to stick with it and avoid going the
> AD route :(
Oops, I didn't mean to say AD; I meant M$. Silly acronyms :)
Here's some ps output that might help (columns: USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND). I thought it interesting that the virtual size (VSZ, in KB) is roughly the same for all of the stale processes that build up. This is just a small snippet:
root 6110 0.0 0.7 10608 3768 ? S 10:47 0:00 /usr/sbin/smbd -D
root 6143 0.0 0.7 10608 3764 ? S 10:48 0:00 /usr/sbin/smbd -D
root 6145 0.0 0.7 10608 3932 ? S 10:48 0:00 /usr/sbin/smbd -D
root 6163 0.0 0.7 10608 3764 ? S 10:50 0:00 /usr/sbin/smbd -D
root 6180 0.0 0.7 10608 3764 ? S 10:51 0:00 /usr/sbin/smbd -D
root 6243 0.0 0.7 10608 3748 ? S 10:54 0:00 /usr/sbin/smbd -D
root 6269 0.0 0.7 10608 3748 ? S 10:55 0:00 /usr/sbin/smbd -D
root 6289 0.0 0.7 10608 3748 ? S 10:56 0:00 /usr/sbin/smbd -D
root 6304 0.0 0.7 10608 3752 ? S 10:57 0:00 /usr/sbin/smbd -D
root 6391 0.0 0.7 10608 3748 ? S 11:05 0:00 /usr/sbin/smbd -D
root 7094 0.0 0.7 10608 3764 ? S 11:06 0:00 /usr/sbin/smbd -D
root 7291 0.0 0.7 10608 3812 ? S 11:23 0:00 /usr/sbin/smbd -D
root 7297 0.0 0.7 10608 3756 ? S 11:24 0:00 /usr/sbin/smbd -D
root 7298 0.0 0.7 10616 3820 ? S 11:24 0:00 /usr/sbin/smbd -D
root 7320 0.0 0.7 10616 3756 ? S 11:26 0:00 /usr/sbin/smbd -D
root 7334 0.0 0.7 10616 3824 ? S 11:27 0:00 /usr/sbin/smbd -D
root 7340 0.0 0.7 10616 3816 ? S 11:28 0:00 /usr/sbin/smbd -D
One more note (sorry for the multiple posts; things just keep occurring to me): the CPU time on all of those is 0:00. Does this indicate that the connections were never really established, as if the clients tried to connect and failed but the process stayed around? I'm getting really confused here. Thanks again for taking a look at this.
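To check the "0:00 CPU time, owned by root" pattern across a longer listing than the snippet above, the ps output can be filtered mechanically. The following is a minimal sketch, assuming the BSD-style `ps aux` column layout shown in the listing; the helper names `parse_ps_aux` and `idle_root_smbd` are hypothetical and not part of Samba or procps.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PsRow:
    user: str
    pid: int
    vsz_kb: int      # VSZ column, virtual size in KB
    rss_kb: int      # RSS column, resident size in KB
    cpu_time: str    # TIME column, accumulated CPU time
    command: str


def parse_ps_aux(text: str) -> list[PsRow]:
    """Parse `ps aux` lines; skips the header and lines with fewer than 11 fields."""
    rows = []
    for line in text.splitlines():
        # Split into at most 11 fields so COMMAND keeps its arguments.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        rows.append(PsRow(user=fields[0], pid=int(fields[1]),
                          vsz_kb=int(fields[4]), rss_kb=int(fields[5]),
                          cpu_time=fields[9], command=fields[10]))
    return rows


def idle_root_smbd(rows: list[PsRow]) -> list[PsRow]:
    """Root-owned smbd processes that have accumulated no CPU time."""
    return [r for r in rows
            if r.user == "root"
            and r.command.split()[0].endswith("/smbd")
            and r.cpu_time == "0:00"]


if __name__ == "__main__":
    # Two lines copied from the listing above; in practice you would feed in
    # the live output of `ps aux`.
    sample = (
        "root 6110 0.0 0.7 10608 3768 ? S 10:47 0:00 /usr/sbin/smbd -D\n"
        "root 7298 0.0 0.7 10616 3820 ? S 11:24 0:00 /usr/sbin/smbd -D\n"
    )
    stale = idle_root_smbd(parse_ps_aux(sample))
    if stale:
        sizes = [r.vsz_kb for r in stale]
        print(f"{len(stale)} idle root smbd processes, "
              f"VSZ {min(sizes)}-{max(sizes)} KB")
```

Note that the parent smbd daemon is also root-owned and may match this filter, so compare the results against its PID before treating every hit as a stale child.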
FYI, I just tested this with an Apple PowerBook, and the smbd process goes away properly. So it appears my hunch was correct: this really is an interaction with Windows XP in which these processes aren't properly cleaned up by the Samba server.
Created attachment 1958
Output of "tail -f log.localhost" on the Samba server
This is what appears in the logs when I run "smbclient -L localhost -U user" on the server itself, after the Samba server has gotten into a state with lots of built-up processes. Here is the output to stdout:
abbott:/var/log/samba# smbclient -L localhost -U steele
Domain=[ASPA] OS=[Unix] Server=[Samba 3.0.22-Debian]
        Sharename       Type      Comment
        ---------       ----      -------
        public          Disk      Public Repository
        downloads       Disk      Helpful Downloads
        IPC$            IPC       IPC Service (Samba Server 3.0.22-Debian)
        ADMIN$          IPC       IPC Service (Samba Server 3.0.22-Debian)
        hplaserjet      Printer   hp8150dn
        steele          Disk      Home of steele, steele
session setup failed: Call timed out: server did not respond after 20000 milliseconds
NetBIOS over TCP disabled -- no workgroup available
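Do you have a deadtime value set in smb.conf? If not, try setting one
(deadtime is the number of minutes of inactivity after which a
connection with no open files is disconnected).
Also look at the output from netstat and see whether the connection
is still ESTABLISHED. On Linux, 'netstat -pant' will help you
match each socket connection with a process ID.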
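Cross-referencing the smbd PIDs from ps with the sockets reported by `netstat -pant` can be scripted. Below is a minimal sketch, assuming the usual Linux netstat columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name) and that it is run as root so PIDs of other users' processes are shown. The function names and the sample addresses and PIDs are hypothetical illustrations.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable


def smbd_socket_states(netstat_text: str) -> dict[int, set[str]]:
    """Map each smbd PID to the TCP states of its sockets in `netstat -pant` output."""
    states: dict[int, set[str]] = defaultdict(set)
    for line in netstat_text.splitlines():
        # Columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        fields = line.split(None, 6)
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # skip the banner, the header and non-TCP lines
        state, pid_prog = fields[5], fields[6]
        pid, _, prog = pid_prog.partition("/")
        if pid.isdigit() and prog.strip() == "smbd":
            states[int(pid)].add(state)
    return dict(states)


def without_established(smbd_pids: Iterable[int], netstat_text: str) -> list[int]:
    """smbd PIDs that hold no ESTABLISHED TCP connection (possibly stale)."""
    states = smbd_socket_states(netstat_text)
    return sorted(pid for pid in smbd_pids
                  if "ESTABLISHED" not in states.get(pid, set()))


if __name__ == "__main__":
    # Made-up sample; PID 2345 stands in for the parent daemon that only listens.
    sample = (
        "Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name\n"
        "tcp 0 0 0.0.0.0:445 0.0.0.0:* LISTEN 2345/smbd\n"
        "tcp 0 0 192.168.1.10:445 192.168.1.50:1050 ESTABLISHED 6110/smbd\n"
    )
    print(without_established([2345, 6110, 6143], sample))  # prints [2345, 6143]
```

The parent daemon only holds LISTEN sockets, so it always shows up in the result; any other smbd PID listed has no live client connection and is a candidate for the stale processes described above.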
User cannot reproduce. Closing.