Bug 9993 - Memory leaks since upgrading to 3.6.15
Summary: Memory leaks since upgrading to 3.6.15
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: File services
Version: 3.6.18
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-03 14:53 UTC by Alex K
Modified: 2014-03-25 09:20 UTC
CC List: 0 users

See Also:


Attachments
leaks.txt - Stats gathered by top about smbd process (20.91 KB, application/octet-stream)
2013-07-03 14:53 UTC, Alex K
no flags
smb-control-27697.gz (1.27 MB, application/x-gzip)
2013-07-03 15:03 UTC, Alex K
no flags
smb-control-9676.gz (393.42 KB, application/x-gzip)
2013-07-05 20:04 UTC, Alex K
no flags
Patch (3.44 KB, patch)
2013-07-06 08:53 UTC, Volker Lendecke
no flags
pool-usage-20362-1.txt (337.31 KB, application/octet-stream)
2013-07-08 15:11 UTC, Alex K
no flags
pool-usage-20362-2.txt (584.18 KB, application/octet-stream)
2013-07-08 15:11 UTC, Alex K
no flags
Three files gathering pool usage with 1 hour interval (14.78 KB, application/zip)
2013-09-17 15:32 UTC, Alex K
no flags
valgrind-smbd.log -- Smbd running under Valgrind (1.51 MB, text/x-log)
2014-02-24 18:25 UTC, Alex K
no flags
Test fix. (364 bytes, patch)
2014-02-24 23:09 UTC, Jeremy Allison
no flags
v4-1-test patch (1.05 KB, patch)
2014-02-25 13:01 UTC, Andreas Schneider
jra: review+
ddiss: review+
v4-0-test patch (1.05 KB, patch)
2014-02-25 13:02 UTC, Andreas Schneider
jra: review+
ddiss: review+

Description Alex K 2013-07-03 14:53:27 UTC
Created attachment 9022
leaks.txt - Stats gathered by top about smbd process

Hi team, 

I upgraded from 3.6.13 to 3.6.15 a few weeks ago and have noticed memory leaks on all of our Samba servers since then.

Running top -b 1 -n 1 every hour confirmed that smbd is the process that leaks. I'm attaching the top stats so you can see for yourself.
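Hourly top snapshots work, but top's RES column switches to rounded megabyte figures (10m, 11m, ...) once a process grows, as the hourly table in comment 6 shows. A minimal Python sketch, assuming a Linux host with /proc, that reads exact VmSize/VmRSS values in KiB from /proc/<pid>/status instead. The function names and the one-hour default interval are illustrative only; they are not part of Samba or any existing tool.

```python
import time


def read_mem_kib(pid):
    """Return {"VmSize": kib, "VmRSS": kib} for a process, read from /proc/<pid>/status."""
    values = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                # e.g. "VmRSS:\t    4620 kB" -> 4620
                values[key] = int(rest.split()[0])
    return values


def sample(pid, interval_s=3600, count=24):
    """Yield (unix_time, VmSize_kib, VmRSS_kib) every interval_s seconds, count times."""
    for i in range(count):
        mem = read_mem_kib(pid)
        yield time.time(), mem.get("VmSize"), mem.get("VmRSS")
        if i + 1 < count:
            time.sleep(interval_s)
```

For example, `for ts, vsz, rss in sample(20362): print(ts, vsz, rss)` would log one unrounded line per hour for the smbd PID used later in this report; passing "self" instead of a PID samples the current process.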

Samba 3.6.15 on Ubuntu Linux 10.04 x64, built with these parameters (exactly the same as the previous 3.6.13 build):

                --cache-file=./config.cache \
                --with-fhs \
                --enable-static \
                --with-privatedir=/etc/samba \
                --with-piddir=/var/run/samba \
                --with-rootsbindir=/sbin \
                --with-pammodulesdir=/lib/$(DEB_HOST_MULTIARCH)/security \
                --with-pam \
                --with-syslog \
                --with-utmp \
                --with-readline \
                --with-pam_smbpass \
                --with-winbind \
                --with-shared-modules=idmap_rid,idmap_ad,idmap_adex,idmap_hash,idmap_ldap,idmap_tdb2 \
                --with-automount \
                --with-ldap \
                --with-ads \
                --without-libtdb \
                --without-libnetapi \
                --with-modulesdir=/usr/lib/samba \
                --datadir=/usr/share/samba \
                --with-swatdir=/usr/share/samba/swat \
                --with-lockdir=/var/run/samba \
                --with-statedir=/var/lib/samba \
                --with-cachedir=/var/cache/samba \
                --with-codepagedir=/usr/share/samba \
                --with-nmbdsocketdir=/var/run/samba \
                --enable-external-libtalloc \
                --without-libtalloc \
                --without-cifsmount \
                --disable-avahi \
                --without-libtdb \
                --with-external-libtdb \
                --without-dnsupdate

I'd love to provide more debugging details, please let me know how I can help.
Comment 1 Volker Lendecke 2013-07-03 14:59:28 UTC
Can you please run "smbcontrol <pid> pool-usage" on one of the large smbds and upload the output?

Thanks a LOT

Volker
Comment 2 Alex K 2013-07-03 15:03:18 UTC
Created attachment 9023
smb-control-27697.gz

Output of "smbcontrol 27697 pool-usage", gzipped.
Comment 3 Alex K 2013-07-05 20:04:53 UTC
Created attachment 9027
smb-control-9676.gz

One more pool-usage report, from another smbd process that was started three days ago and has grown from 100 MB to 164 MB.
Comment 4 Volker Lendecke 2013-07-06 08:53:24 UTC
Created attachment 9029
Patch

Can you try the attached patchset?

Thanks,

Volker
Comment 5 Alex K 2013-07-07 00:53:28 UTC
Deployed, let's wait a couple of days.
Comment 6 Alex K 2013-07-08 15:10:10 UTC
Unfortunately, it's still leaking. Here are the stats from top, taken hourly, for smbd process 20362:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
20362 root      20   0 94796 4620 2084 S    0  0.1   0:01.51 smbd               
20362 root      20   0 94796 4948 2084 S    0  0.1   0:04.79 smbd               
20362 root      20   0 94796 5624 2084 S    0  0.1   0:08.10 smbd               
20362 root      20   0 94928 6312 2092 S    0  0.1   0:11.44 smbd               
20362 root      20   0 95588 7000 2092 S    0  0.1   0:14.79 smbd               
20362 root      20   0 96380 7680 2092 S    0  0.1   0:18.15 smbd               
20362 root      20   0 97040 8364 2092 S    0  0.1   0:21.51 smbd               
20362 root      20   0 97700 9040 2092 S    0  0.1   0:24.89 smbd               
20362 root      20   0 98360 9728 2092 S    0  0.2   0:28.28 smbd               
20362 root      20   0 99020  10m 2092 S    0  0.2   0:31.62 smbd               
20362 root      20   0 99680  10m 2092 S    0  0.2   0:35.03 smbd               
20362 root      20   0 98.0m  11m 2092 S    0  0.2   0:38.44 smbd               
20362 root      20   0 98.8m  12m 2092 S    0  0.2   0:41.86 smbd               
20362 root      20   0 99.4m  12m 2092 S    0  0.2   0:45.29 smbd               
20362 root      20   0  100m  13m 2092 S    0  0.2   0:48.73 smbd               
20362 root      20   0  100m  14m 2092 S    0  0.2   0:52.18 smbd               
20362 root      20   0  101m  14m 2092 S    0  0.2   0:55.63 smbd               
20362 root      20   0  101m  15m 2092 S    0  0.3   0:59.09 smbd               
20362 root      20   0  102m  16m 2092 S    0  0.3   1:02.57 smbd               
20362 root      20   0  103m  16m 2116 S    0  0.3   1:06.07 smbd               
20362 root      20   0  104m  17m 2116 S    0  0.3   1:09.58 smbd               
20362 root      20   0  104m  18m 2116 S    0  0.3   1:13.09 smbd               
20362 root      20   0  105m  18m 2116 S    0  0.3   1:16.61 smbd               
20362 root      20   0  106m  19m 2116 S    0  0.3   1:20.14 smbd               
20362 root      20   0  106m  20m 2116 S    0  0.3   1:23.67 smbd               
20362 root      20   0  107m  20m 2116 S    0  0.3   1:27.21 smbd               
20362 root      20   0  108m  21m 2116 S    0  0.4   1:30.75 smbd               
20362 root      20   0  108m  22m 2116 S    0  0.4   1:34.29 smbd               
20362 root      20   0  110m  23m 2568 S    0  0.4   1:37.87 smbd               
20362 root      20   0  110m  23m 2568 S    0  0.4   1:41.44 smbd               
20362 root      20   0  111m  24m 2568 S    0  0.4   1:44.97 smbd               
20362 root      20   0  112m  25m 2568 S    0  0.4   1:48.59 smbd               
20362 root      20   0  112m  25m 2544 S    0  0.4   1:52.20 smbd               
20362 root      20   0  113m  26m 2568 S    0  0.4   1:55.84 smbd               
20362 root      20   0  114m  27m 2544 S    0  0.5   1:59.50 smbd               
20362 root      20   0  114m  27m 2568 S    0  0.5   2:03.15 smbd               
20362 root      20   0  115m  28m 2544 S    0  0.5   2:06.82 smbd               
20362 root      20   0  116m  29m 2548 S    0  0.5   2:10.73 smbd
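To turn the hourly rows above into a growth rate, the RES column has to be normalized: top prints plain KiB for small values and a rounded figure with an "m" suffix for larger ones. A minimal Python sketch, assuming the default column order shown here (PID USER PR NI VIRT RES SHR ...) and KiB as the base unit; the helper names are hypothetical.

```python
def to_kib(field):
    """Convert a top memory column ("4620", "10m", "98.0m") to KiB."""
    units = {"m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field)  # no suffix: top already reports KiB


def res_kib(row):
    """Return the RES column (6th field) of a top batch-mode process row, in KiB."""
    return to_kib(row.split()[5])


def growth_kib_per_hour(first_row, last_row, hours):
    """Average RES growth between two top rows taken `hours` apart."""
    return (res_kib(last_row) - res_kib(first_row)) / hours


first = "20362 root      20   0 94796 4620 2084 S    0  0.1   0:01.51 smbd"
last = "20362 root      20   0  116m  29m 2548 S    0  0.5   2:10.73 smbd"
# 38 hourly samples span 37 hours; "29m" is rounded, so the result is approximate.
rate = growth_kib_per_hour(first, last, 37)
print(f"{rate:.0f} KiB/hour")  # prints "678 KiB/hour"
```

Because the last value is rounded to the nearest MiB, the rate is only good to roughly ±15 KiB/hour here; exact figures need an unrounded source such as /proc.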

I collected two pool-usage reports a few hours apart and am attaching them here:
Comment 7 Alex K 2013-07-08 15:11:06 UTC
Created attachment 9030
pool-usage-20362-1.txt
Comment 8 Alex K 2013-07-08 15:11:36 UTC
Created attachment 9031
pool-usage-20362-2.txt

Collected a few hours later than pool-usage-20362-1.txt
Comment 9 Volker Lendecke 2013-07-08 15:33:40 UTC
Has it improved at all, i.e. has it slowed down leaking?
Comment 10 Alex K 2013-07-08 17:33:26 UTC
(In reply to comment #9)
> Has it improved at all, i.e. has it slowed down leaking?

It seems so. In my initial report the process grew from 94m to 148m within 38 hours, while yesterday it grew only to 114m over the same period.
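The slowdown can be quantified from the two figures quoted in this comment. A small Python sketch, assuming both runs started at about 94 MiB and lasted 38 hours as stated; the numbers come from top's rounded "m" values, so the result is approximate.

```python
def mib_per_hour(start_mib, end_mib, hours):
    """Average growth rate in MiB per hour."""
    return (end_mib - start_mib) / hours


before = mib_per_hour(94, 148, 38)  # initial report, without the patch
after = mib_per_hour(94, 114, 38)   # with the patch from comment 4 applied
reduction = 1 - after / before
print(f"before {before:.2f} MiB/h, after {after:.2f} MiB/h, growth rate down {reduction:.0%}")
# prints "before 1.42 MiB/h, after 0.53 MiB/h, growth rate down 63%"
```

In other words, the patch roughly cut the leak rate by a little under two thirds, but did not eliminate it.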
Comment 11 Alex K 2013-09-17 15:32:13 UTC
Created attachment 9218
Three files gathering pool usage with 1 hour interval

I gathered three pool-usage reports at one-hour intervals to make it easier for you to diff them and spot the growing part.
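Diffing full reports by eye is tedious, so a script can rank talloc contexts by growth. A minimal Python sketch, assuming each context line follows talloc's full-report layout ("<indent><name> contains <N> bytes in <M> blocks (ref <R>) <ptr>"), with nesting expressed by indentation. Contexts are keyed by their parent/child path, and siblings with the same name are summed. The function names are hypothetical.

```python
import re

# Assumed layout of a context line in a talloc full report:
# "<indent><name> contains <bytes> bytes in <blocks> blocks (ref <n>) <ptr>"
LINE = re.compile(r"^( *)(.*?)\s+contains\s+(\d+) bytes in\s+(\d+) blocks")


def parse_report(text):
    """Map each talloc context path (parent/child/...) to its total bytes."""
    totals = {}
    stack = []  # (indent, name) of the contexts enclosing the current line
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # header, "reference to:" and blank lines
        indent = len(m.group(1))
        while stack and stack[-1][0] >= indent:
            stack.pop()  # leave contexts at the same or deeper nesting level
        stack.append((indent, m.group(2).strip()))
        path = "/".join(name for _, name in stack)
        totals[path] = totals.get(path, 0) + int(m.group(3))
    return totals


def growth(old_text, new_text, top=10):
    """Return the `top` context paths whose total bytes grew the most."""
    old, new = parse_report(old_text), parse_report(new_text)
    deltas = {p: new.get(p, 0) - old.get(p, 0) for p in set(old) | set(new)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

For example, calling growth() on the text of pool-usage-20362-1.txt and pool-usage-20362-2.txt would list the contexts that grew most between the two snapshots. Because a parent's total includes its children, a leak shows up along the whole path; the deepest growing entry is usually the interesting one.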
Comment 12 Alex K 2013-10-02 14:05:59 UTC
Are any more reports needed?
As far as I can tell from looking at a few of them, it always leaks in the same section of the pool-usage report.
Comment 13 David Disseldorp 2013-10-02 14:32:48 UTC
Thanks for your thorough report Alex. 

(In reply to comment #12)
> Are any more reports needed? 
> It always leaks in the same section of pool-usage report, as far as I can say
> after looking into a few of them.

A diff between the first and last pool-usage reports you kindly provided shows only a ~1 KB increase in usage over the 2 hours.
Your memory grows much faster than that, which suggests the leak is unfreed memory allocated outside of talloc, e.g. in tdb.

Please install the Samba debug symbols and run smbd under valgrind (with --trace-children=yes and --leak-check=full). Feel free to ask if you need any help.
Comment 14 Alex K 2013-10-03 20:48:52 UTC
You see, this problem is only visible on the busiest print servers. I cannot imagine running them in debug mode.

Is there a way to simulate the load on a test machine using any Samba tools?
Comment 15 Alex K 2014-02-24 18:25:40 UTC
Created attachment 9716
valgrind-smbd.log -- Smbd running under Valgrind

Here is a valgrind log that I collected from a test server, running smbd with this command:
# valgrind --tool=memcheck --leak-check=full --show-reachable=yes --track-fds=yes  --log-file=/tmp/valgrind-smbd.log --trace-children=yes /usr/sbin/smbd -F

After running for 20 minutes with 200 printers and one client, it reported this:
==29750== LEAK SUMMARY:
==29750==    definitely lost: 104,129 bytes in 5,921 blocks
==29750==    indirectly lost: 111 bytes in 5 blocks
==29750==      possibly lost: 661,342 bytes in 6,268 blocks
==29750==    still reachable: 335,456 bytes in 3,668 blocks
==29750==         suppressed: 0 bytes in 0 blocks

Samba 3.6.22, libtdb1 1.2.12, Ubuntu 12.04 x64.
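The LEAK SUMMARY block has a regular shape, so it can be pulled into numbers for comparing runs. A minimal Python sketch, assuming memcheck summary lines look exactly like those quoted above ("==PID==    <category>: <bytes> bytes in <blocks> blocks", with comma thousands separators). With --trace-children=yes every smbd process writes its own summary, so results are keyed by the PID from the "==PID==" prefix. The function name is hypothetical.

```python
import re

SUMMARY_LINE = re.compile(
    r"^==(\d+)==\s+(definitely lost|indirectly lost|possibly lost|"
    r"still reachable|suppressed):\s+([\d,]+) bytes in ([\d,]+) blocks",
    re.MULTILINE,
)


def parse_leak_summaries(log_text):
    """Return {pid: {category: (bytes, blocks)}} for every LEAK SUMMARY in the log."""
    summaries = {}
    for pid, category, nbytes, nblocks in SUMMARY_LINE.findall(log_text):
        summaries.setdefault(int(pid), {})[category] = (
            int(nbytes.replace(",", "")),   # "104,129" -> 104129
            int(nblocks.replace(",", "")),
        )
    return summaries
```

Running it over the whole valgrind-smbd.log attachment would give one entry per traced process, which makes it easy to pick out the children with the largest "definitely lost" totals.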
Comment 16 Jeremy Allison 2014-02-24 23:09:10 UTC
Created attachment 9719
Test fix.

OK, here is a test fix that should apply to v3-6-test.

It should fix the only problematic memory leak I see in your valgrind log.

Cheers,

Jeremy.
Comment 17 Andreas Schneider 2014-02-25 13:01:48 UTC
Created attachment 9720
v4-1-test patch
Comment 18 Andreas Schneider 2014-02-25 13:02:09 UTC
Created attachment 9721
v4-0-test patch
Comment 19 David Disseldorp 2014-02-25 14:08:07 UTC
Karolin, please merge for 4.0.next and 4.1.next.
Thanks!
Comment 20 Alex K 2014-02-25 16:04:03 UTC
(In reply to comment #16)
> Created attachment 9719
> Test fix.
> 
> OK, here is a test fix that should apply to v3-6-test.
> 
> It should fix the only problematic memory leak I see in your valgrind log.

Test patch works for me, thank you Jeremy!
This is the summary from valgrind:

==21455==
==21455== LEAK SUMMARY:
==21455==    definitely lost: 48 bytes in 1 blocks
==21455==    indirectly lost: 111 bytes in 5 blocks
==21455==      possibly lost: 0 bytes in 0 blocks
==21455==    still reachable: 1,608,050 bytes in 16,216 blocks
==21455==         suppressed: 0 bytes in 0 blocks
==21455==
==21455== For counts of detected and suppressed errors, rerun with: -v
==21455== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Let me know if anything in the summary concerns you, I can upload the full log.
Comment 21 Alex K 2014-02-25 16:06:06 UTC
> This is the summary from valgrind:

That's for a similar test load as I reported before: 30 minutes, 1 user, 200 printers.
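To put the two valgrind runs side by side, the totals can be roughly normalized by run length. A small Python sketch using only the figures quoted in this report (the unpatched 20-minute run from comment 15 and the patched 30-minute run in comment 20). It counts "definitely lost" plus "possibly lost" and ignores "still reachable", which memcheck uses for blocks that are still pointed to at exit. Dividing by minutes is a simplifying assumption: smbd's leaks need not grow linearly with time.

```python
# Figures copied from the two LEAK SUMMARY blocks in this report.
before = {"definitely lost": 104_129, "possibly lost": 661_342}  # 20-minute run, unpatched
after = {"definitely lost": 48, "possibly lost": 0}             # 30-minute run, with test fix


def leaked_bytes(summary):
    """Bytes memcheck reports as definitely or possibly lost."""
    return summary["definitely lost"] + summary["possibly lost"]


for label, summary, minutes in (("before", before, 20), ("after", after, 30)):
    total = leaked_bytes(summary)
    print(f"{label}: {total} bytes lost, about {total / minutes:.1f} bytes/minute")
```

The drop from 765,471 to 48 lost bytes is consistent with the test fix removing the dominant leak; the higher "still reachable" figure after the fix is not counted as a leak here.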
Comment 22 Alex K 2014-03-04 19:39:32 UTC
Sorry, I forgot to mention that valgrind results and last patch tests were performed with Volker's patch above being applied, too: https://bugzilla.samba.org/attachment.cgi?id=9029

Please include it in the release.
Comment 23 Jeremy Allison 2014-03-06 00:19:04 UTC
Karolin, Volker's patch:

https://bugzilla.samba.org/attachment.cgi?id=9029

is already included in the 4.0.x and 4.1.x source trees, so the only fixes needed to close this bug out are:

https://bugzilla.samba.org/attachment.cgi?id=9720

and

https://bugzilla.samba.org/attachment.cgi?id=9721

Cheers,

Jeremy.
Comment 24 Karolin Seeger 2014-03-10 15:24:01 UTC
(In reply to comment #23)
> Karolin, Volker's patch:
> 
> https://bugzilla.samba.org/attachment.cgi?id=9029
> 
> is already included in the 4.0.x and 4.1.x source trees, so the only fixes
> needed to close this bug out are:
> 
> https://bugzilla.samba.org/attachment.cgi?id=9720
> 
> and
> 
> https://bugzilla.samba.org/attachment.cgi?id=9721
> 
> Cheers,
> 
> Jeremy.

Pushed to autobuild-v4-1-test and autobuild-v4-0-test.
Comment 25 Karolin Seeger 2014-03-25 09:20:49 UTC
Pushed to v4-1-test and v4-0-test.
Closing out bug report.

Thanks!