I'm now running 3.0.23d as a PDC, with the ldapsam backend and openldap 2.2.23, on linux kernel 2.4.27-3-686-smp. I've got an smbd process for one particular windows host that steadily climbs from 12MB on startup up to 260MB within a week, and then fails. I've taken several straces once it has failed. The process looks like this once it's frozen:

USER       PID %CPU %MEM    VSZ    RSS TTY STAT START   TIME COMMAND
IUSR_JU   8083  8.9 12.4 264368 258412 ?   S    Nov27 522:20 /usr/sbin/smbd -D

Some background: the windows client is a win2k IIS5 server, with well over 60 sites that are hosted off a samba share on this linux server. smbstatus tells me it's got 60 to 65 connections open for that client (IUSR_JUNIPER). There is 2GB of RAM in this server, so I don't think it ran out of memory; the system reports about 1GB free (-/+ buffers/cache). The only way to recover is to send SIGKILL to the process and restart samba. The process doesn't respond to SIGTERM.

smb.conf:

[global]
workgroup = FOREST
server string = %h domain server
interfaces = <snip>
bind interfaces only = Yes
obey pam restrictions = Yes
passdb backend = ldapsam:ldaps://kingwood.ourdomain.com/
pam password change = Yes
passwd program = /usr/bin/passwd %u
passwd chat = *Enter\snew\sUNIX\spassword:* %n\n *Retype\snew\sUNIX\spassword:* %n\n *password\supdated\ssuccessfully* .
unix password sync = Yes
log level = 1
syslog = 0
log file = /var/log/samba/log.%m
max log size = 1000
max mux = 2048
time server = Yes
socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
add user script = /usr/sbin/smbldap-useradd -m '%u'
delete user script = /usr/sbin/smbldap-userdel '%u'
add group script = /usr/sbin/smbldap-groupadd -p '%g'
delete group script = /usr/sbin/smbldap-groupdel '%g'
add user to group script = /usr/sbin/smbldap-groupmod -m '%u' '%g'
delete user from group script = /usr/sbin/smbldap-groupmod -x '%u' '%g'
set primary group script = /usr/sbin/smbldap-usermod -g '%g' '%u'
add machine script = /usr/sbin/smbldap-useradd -w '%u'
logon script = %U.cmd
logon path =
logon drive = H:
logon home =
domain logons = Yes
preferred master = Yes
domain master = Yes
dns proxy = No
wins support = Yes
ldap admin dn = cn=samba,ou=DSA,dc=ourdomain,dc=com
ldap group suffix = ou=Group
ldap idmap suffix = ou=Idmap
ldap machine suffix = ou=Computers
ldap passwd sync = Yes
ldap suffix = dc=ourdomain,dc=com
ldap user suffix = ou=People
message command = /bin/sh -c '/usr/bin/linpopup "%f" "%m" %s; rm %s' &
panic action = /usr/share/samba/panic-action %d
idmap backend = ldap:ldaps://kingwood.ourdomain.com/
idmap uid = 10000-20000
idmap gid = 10000-20000
template shell = /bin/bash
force unknown acl user = Yes
use sendfile = Yes
dos filemode = Yes

[homes]
comment = Home Directories
read only = No
create mask = 0700
directory mask = 0700
browseable = No

[netlogon]
comment = Network Logon Service
path = /var/lib/samba/netlogon/scripts
guest ok = Yes
share modes = No
root preexec = /var/lib/samba/netlogon/scripts/logon.pl %U %I

[printers]
comment = All Printers
path = /tmp
create mask = 0700
printable = Yes
browseable = No

[print$]
comment = Printer Drivers
path = /var/lib/samba/printers
write list = root, @ntadmin

[web]
comment = Shared Web Hosting
path = /home/web
read only = No
create mask = 0770
directory mask = 0770
Here are some straces (too large to attach):
http://secure.inline.net/strace.8083.gz
http://secure.inline.net/strace.14105.gz
I'm afraid the straces are useless for diagnosing the problem. What you can do is increase the debug level of that one growing process with

smbcontrol <pid> debug 10

and send us the resulting log file (btw, 80kb is not really large!). Please set 'max log size = 0', and don't be afraid to send in many megabytes of log files. When you're done with logging, you can decrease the debug level with

smbcontrol <pid> debug 0

BTW, can you try setting 'max stat cache size = 1000' and see if it helps?

Volker
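The capture described above can be time-boxed so the level-10 window is never left on by accident. The following is only a minimal sketch: the helper names `debug_commands` and `capture_window` are made up for illustration, and the only external command it relies on is the `smbcontrol <pid> debug <level>` invocation quoted in this comment.

```python
import subprocess
import time


def debug_commands(pid, level):
    """Build the 'smbcontrol <pid> debug <level>' argument list."""
    return ["smbcontrol", str(pid), "debug", str(level)]


def capture_window(pid, seconds, run=subprocess.check_call, sleep=time.sleep):
    """Raise one smbd's debug level to 10, wait, then drop it back to 0.

    'run' and 'sleep' are injectable (hypothetical parameters) so the
    sequence can be checked without a running Samba server.
    """
    run(debug_commands(pid, 10))
    try:
        sleep(seconds)
    finally:
        # Always lower the level again, even if the wait is interrupted.
        run(debug_commands(pid, 0))
```

For example, `capture_window(8083, 30)` would log at level 10 for roughly 30 seconds for the smbd with PID 8083 and then return it to level 0, as suggested above.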
Thanks. I've added the two settings; unfortunately the smbd process is still growing gradually. I took two different log snapshots at level 10. These are approx. 100MB logs (uncompressed), during which the smbd process grew about 200-500KB in memory. I'm watching the process grow another 3MB now.

http://secure.inline.net/log.juniper.1.gz
http://secure.inline.net/log.juniper.2.gz

I'll do this again once it reaches the threshold where fileserving fails. It has grown to 58MB now, so the slow growth in memory continues. If you'd like more logs, let me know.
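To put a number on the growth instead of eyeballing it in top, the resident size can be sampled from /proc on Linux. This is a rough sketch, not part of the original report: the function names are hypothetical, and it assumes the usual `VmRSS:   <n> kB` line in `/proc/<pid>/status`.

```python
def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss_kb(f.read())


def growth_kb_per_hour(samples):
    """Average growth between the first and last (unix_time, rss_kb) sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) * 3600.0 / (t1 - t0)
```

Plugging in the figures from the original report (12MB at startup, 260MB after a week) gives `growth_kb_per_hour([(0, 12 * 1024), (7 * 86400, 260 * 1024)])`, which works out to about 1.5MB per hour on average. Samples taken with `read_rss_kb` before and after a short debug capture would show whether a given log window actually contains growth.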
Just to give you feedback: the logs are more helpful, thanks. I tried to reproduce a memleak for some of the more unusual calls that your client makes. In particular, it queries the file's security descriptor, and it tries to connect to the [web] share as guest, which fails. Neither showed an obvious memleak here. I don't have the time right now to check all the calls, so it might take a bit.

Sorry,

Volker
Alright, I understand it's a lot to debug with the 100MB debug files. What can we do to narrow down the problem so it's easier to locate the call? I need to bring some more stability to this server soon. I've tried watching the process in top and running debug 10 for just a few seconds while the process grows a little. That log is now at:

http://secure.inline.net/log.juniper.2.gz

I'm also not sure why the DFS calls appear in the log, so I disabled DFS on the samba server with 'host msdfs = no' (and there are no DFS roots defined). That had a bad effect: the client could no longer connect to the fileshares via a UNC path, which is very odd since it's not a DFS tree. Since that's something unexpected, can you check the DFS calls in the logs for memleaks?

There has to be something different about this setup compared to most, because samba is known for its stability. My guess is that it's either LDAP related, or the behaviour of the client, which is win2k/IIS5 and for which the limit on MaxMpxCT is lifted to 2048 (instead of the default of 50) via the setting 'max mux = 2048'.
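One way to narrow a large level-10 log down to candidate calls is to count which SMB request types occur in a capture window during which the process grew, and compare that with a window where it did not. The sketch below is only an illustration under a stated assumption: it assumes smbd writes one line per request of the form `switch message SMBxxx (pid N)` at this debug level. If the logs look different, the regular expression needs adjusting. The name `count_calls` is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape, e.g. "switch message SMBtrans2 (pid 8083) conn 0x..."
SWITCH_RE = re.compile(r"switch message (SMB\w+) \(pid (\d+)\)")


def count_calls(lines, pid=None):
    """Count SMB request types in log lines, optionally for one smbd PID."""
    counts = Counter()
    for line in lines:
        m = SWITCH_RE.search(line)
        if m and (pid is None or int(m.group(2)) == pid):
            counts[m.group(1)] += 1
    return counts
```

Calling `count_calls(open("log.juniper")).most_common(10)` on an uncompressed log would list the ten most frequent request types. A request type that shows up only in the growing windows would be a reasonable first thing to test for a leak.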
I suspect this bug is fixed in 3.2, as the affected parts (enhanced by the patch proposed by Andrew Bartlett) have changed a lot since then.

Christian Perrier
(In reply to Debian samba package maintainers (PUBLIC MAILING LIST) from comment #6)

On this basis, marking as fixed. In any case, Samba 3.0 is long out of support, and the code has been greatly reworked in the time since :-)