Bug 4282 - memory leak and fileserving fails within a single smbd process
Summary: memory leak and fileserving fails within a single smbd process
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.23d
Hardware: x86 Linux
Importance: P3 critical
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-12-04 22:23 UTC by James ffolliott
Modified: 2017-07-16 23:13 UTC

Description James ffolliott 2006-12-04 22:23:37 UTC
I'm now running 3.0.23d, as a PDC, with ldapsam backend, openldap 2.2.23, on linux kernel 2.4.27-3-686-smp.

I've got an smbd process for one particular Windows host that steadily climbs from 12MB on startup to 260MB within a week, and then fails.

I've taken several straces after it failed.

The process looks like this once it's frozen:
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
IUSR_JU   8083  8.9 12.4 264368 258412 ?     S    Nov27 522:20 /usr/sbin/smbd -D
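
For reference, the VSZ and RSS columns above are reported by ps in KiB. A minimal sketch of converting them to MiB follows; the function name `parse_ps_line` is hypothetical, and the column order is assumed to match the header line shown above.

```python
# Sketch: convert the VSZ and RSS columns of one `ps aux` line from KiB to MiB.
# Assumes the column order USER PID %CPU %MEM VSZ RSS ... shown in the header.

def parse_ps_line(line):
    """Return (pid, vsz_mib, rss_mib) for one `ps aux` output line."""
    fields = line.split()
    pid = int(fields[1])
    vsz_kib = int(fields[4])
    rss_kib = int(fields[5])
    return pid, vsz_kib / 1024, rss_kib / 1024


line = "IUSR_JU   8083  8.9 12.4 264368 258412 ?     S    Nov27 522:20 /usr/sbin/smbd -D"
pid, vsz_mib, rss_mib = parse_ps_line(line)
print(f"pid={pid} vsz={vsz_mib:.1f} MiB rss={rss_mib:.1f} MiB")
```

Nearly all of the virtual size is resident, which is consistent with memory that is actually in use rather than merely mapped.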

Some background: the Windows client is a win2k IIS5 server, with well over 60 sites that are hosted off a Samba share on this Linux server.  smbstatus tells me it's got 60 to 65 connections open for that client (IUSR_JUNIPER).

There's 2GB of RAM in this server, so I don't think it ran out of memory; the system reports about 1GB free (-/+ buffers/cache).

The only way to recover is to send a SIGKILL to the process and restart Samba.  The process doesn't respond to SIGTERM.

smb.conf:

[global]
        workgroup = FOREST
        server string = %h domain server
        interfaces = <snip>
        bind interfaces only = Yes
        obey pam restrictions = Yes
        passdb backend = ldapsam:ldaps://kingwood.ourdomain.com/
        pam password change = Yes
        passwd program = /usr/bin/passwd %u
        passwd chat = *Enter\snew\sUNIX\spassword:* %n\n *Retype\snew\sUNIX\spassword:* %n\n *password\supdated\ssuccessfully* .
        unix password sync = Yes
        log level = 1
        syslog = 0
        log file = /var/log/samba/log.%m
        max log size = 1000
        max mux = 2048
        time server = Yes
        socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
        add user script = /usr/sbin/smbldap-useradd -m '%u'
        delete user script = /usr/sbin/smbldap-userdel '%u'
        add group script = /usr/sbin/smbldap-groupadd -p '%g'
        delete group script = /usr/sbin/smbldap-groupdel '%g'
        add user to group script = /usr/sbin/smbldap-groupmod -m '%u' '%g'
        delete user from group script = /usr/sbin/smbldap-groupmod -x '%u' '%g'
        set primary group script = /usr/sbin/smbldap-usermod -g '%g' '%u'
        add machine script = /usr/sbin/smbldap-useradd -w '%u'
        logon script = %U.cmd
        logon path =
        logon drive = H:
        logon home =
        domain logons = Yes
        preferred master = Yes
        domain master = Yes
        dns proxy = No
        wins support = Yes
        ldap admin dn = cn=samba,ou=DSA,dc=ourdomain,dc=com
        ldap group suffix = ou=Group
        ldap idmap suffix = ou=Idmap
        ldap machine suffix = ou=Computers
        ldap passwd sync = Yes
        ldap suffix = dc=ourdomain,dc=com
        ldap user suffix = ou=People
        message command = /bin/sh -c '/usr/bin/linpopup "%f" "%m" %s; rm %s' &
        panic action = /usr/share/samba/panic-action %d
        idmap backend = ldap:ldaps://kingwood.ourdomain.com/
        idmap uid = 10000-20000
        idmap gid = 10000-20000
        template shell = /bin/bash
        force unknown acl user = Yes
        use sendfile = Yes
        dos filemode = Yes

[homes]
        comment = Home Directories
        read only = No
        create mask = 0700
        directory mask = 0700
        browseable = No

[netlogon]
        comment = Network Logon Service
        path = /var/lib/samba/netlogon/scripts
        guest ok = Yes
        share modes = No
        root preexec = /var/lib/samba/netlogon/scripts/logon.pl %U %I

[printers]
        comment = All Printers
        path = /tmp
        create mask = 0700
        printable = Yes
        browseable = No

[print$]
        comment = Printer Drivers
        path = /var/lib/samba/printers
        write list = root, @ntadmin

[web]
        comment = Shared Web Hosting
        path = /home/web
        read only = No
        create mask = 0770
        directory mask = 0770
Comment 1 James ffolliott 2006-12-04 22:31:06 UTC
Here are some straces (too large to attach):
http://secure.inline.net/strace.8083.gz
http://secure.inline.net/strace.14105.gz
Comment 2 Volker Lendecke 2006-12-05 00:29:23 UTC
I'm afraid the straces are useless to diagnose the problem. What you can do is increase the debug level of that one growing process with

smbcontrol <pid> debug 10

and send us the resulting log file. (btw, 80kb is not really large!). Please set 'max log size = 0', don't be afraid to send in many megabytes of log files.

If you're done with logging you can decrease the debug level with

smbcontrol <pid> debug 0
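
If the debug level needs to be raised and lowered repeatedly, the two smbcontrol invocations above could be wrapped in a small script. This is only a sketch: the helper names `debug_command` and `set_debug_level` are hypothetical, and it assumes `smbcontrol` is on the PATH and is run with sufficient privileges.

```python
import subprocess


def debug_command(pid, level):
    """Build the argv for `smbcontrol <pid> debug <level>`."""
    if level < 0:
        raise ValueError("debug level must not be negative")
    return ["smbcontrol", str(pid), "debug", str(level)]


def set_debug_level(pid, level):
    """Run smbcontrol; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(debug_command(pid, level), check=True)


if __name__ == "__main__":
    # 8083 is the PID from the ps output in the description.
    print(" ".join(debug_command(8083, 10)))  # start verbose logging
    print(" ".join(debug_command(8083, 0)))   # return to quiet logging
```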

BTW, can you try to set 'max stat cache size = 1000' and see if it helps?

Volker
Comment 3 James ffolliott 2006-12-05 21:44:53 UTC
Thanks.  I've added the two settings; unfortunately, the smbd process is still growing gradually.

I took two different log snapshots at level 10.  These are approx. 100MB logs (uncompressed), during which the smbd process grew about 200-500KB in memory.  I'm watching the process grow another 3MB now.

http://secure.inline.net/log.juniper.1.gz
http://secure.inline.net/log.juniper.2.gz

I'll do this again once it reaches the threshold where fileserving fails.

It's grown now to 58MB, so this slow growth in memory still continues.
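
As a rough estimate of how long is left before the next failure, the figures from this report can be put into a linear projection. This is a sketch under loud assumptions: growth is treated as linear, "within a week" is taken as 7 days, and the rate may differ now that the settings have changed. The function names are hypothetical.

```python
def growth_rate_mb_per_day(start_mb, end_mb, days):
    """Average growth rate, assuming the increase is linear over the period."""
    return (end_mb - start_mb) / days


def days_until(current_mb, limit_mb, rate_mb_per_day):
    """Days until the process reaches limit_mb at a constant growth rate."""
    if rate_mb_per_day <= 0:
        raise ValueError("rate must be positive for a projection")
    return max(0.0, (limit_mb - current_mb) / rate_mb_per_day)


# 12 MB at startup and 260 MB at failure come from the description;
# 7 days is an assumed reading of "within a week".
rate = growth_rate_mb_per_day(12, 260, 7)
remaining = days_until(58, 260, rate)
print(f"{rate:.1f} MB/day, about {remaining:.1f} days from 58 MB to 260 MB")
```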

If you'd like more logs, let me know.
Comment 4 Volker Lendecke 2006-12-06 03:15:37 UTC
Just to give you feedback: The logs are more helpful, thanks. I tried to reproduce a memleak for some of the more unusual calls that your client makes. In particular, it queries the file's security descriptor and it tries to connect to the [web] share as guest, which fails. Neither showed an obvious memleak here.

I don't have the time right now to check all the calls, so it might take a bit.

Sorry,

Volker
Comment 5 James ffolliott 2006-12-09 17:08:31 UTC
Alright, I understand it's a lot to debug with the 100MB debug files. What can we do to narrow down the problem so it's easier to locate the call?  I need to bring some more stability to this server soon.

I've tried watching the process in top, and ran a debug 10 for just a few seconds while the process grew a little.  That log is now at:
http://secure.inline.net/log.juniper.2.gz
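
Instead of watching top by hand, the resident size could be sampled from /proc so that each growth step gets a timestamp to compare against the level 10 log. A minimal sketch, assuming a Linux /proc filesystem with a VmRSS line in /proc/<pid>/status; the function names and the timestamp format are arbitrary choices, not anything Samba provides.

```python
import time


def parse_vmrss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # no VmRSS line present (e.g. kernel threads)


def read_rss_kib(pid):
    """Read the current resident set size of a process, or None."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())


def watch(pid, interval_s=1.0, samples=10):
    """Print a timestamp and the RSS change for each sample."""
    previous = read_rss_kib(pid)
    for _ in range(samples):
        time.sleep(interval_s)
        current = read_rss_kib(pid)
        if current is None or previous is None:
            break
        stamp = time.strftime("%Y/%m/%d %H:%M:%S")
        print(f"{stamp} rss={current} KiB delta={current - previous:+d} KiB")
        previous = current
```

Calling `watch(8083)` while the debug level is raised would show which seconds the growth happened in, which narrows the part of the log worth reading.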

I'm also not sure why the DFS calls appear in the log, so I disabled DFS on the Samba server with 'host msdfs = no' (and there are no DFS roots defined).  That had a bad effect because the client could no longer connect to the fileshares via a UNC path, which is very odd since it's not a DFS tree.

Since that's something unexpected, can you check the DFS calls in the logs for memleaks?

There has to be something different about this setup than most, because Samba's known for its stability.  My guess is that it's either LDAP related, or the behaviour of the client, which is win2k/IIS5 and for which the limit on MaxMpxCt is lifted to 2048 (instead of the default of 50) via the setting 'max mux = 2048'.
Comment 6 Debian samba package maintainers (PUBLIC MAILING LIST) 2009-01-03 12:00:03 UTC
I suspect this bug to be fixed in 3.2, as the parts affected (enhanced by the patch proposed by Andrew Bartlett) have changed a lot since then.

Christian Perrier
Comment 7 Andrew Bartlett 2017-07-16 23:13:54 UTC
(In reply to Debian samba package maintainers (PUBLIC MAILING LIST) from comment #6)
On this basis, marking as fixed.  In any case Samba 3.0 is long out of support, and the code greatly reworked in the time since :-)