Created attachment 9022 [details]
leaks.txt - Stats gathered by top about smbd process

Hi team,

I upgraded from 3.6.13 to 3.6.15 a few weeks ago and have noticed memory leaking since then on all the Samba servers. Running top -b 1 -n 1 every hour confirmed that smbd is leaking. I'm attaching my top stats so you can see for yourself.

Samba 3.6.15, built with these parameters (exactly the same as for the previous 3.6.13 build), on Ubuntu Linux 10.04 x64:

--cache-file=./config.cache \
--with-fhs \
--enable-static \
--with-privatedir=/etc/samba \
--with-piddir=/var/run/samba \
--with-rootsbindir=/sbin \
--with-pammodulesdir=/lib/$(DEB_HOST_MULTIARCH)/security \
--with-pam \
--with-syslog \
--with-utmp \
--with-readline \
--with-pam_smbpass \
--with-winbind \
--with-shared-modules=idmap_rid,idmap_ad,idmap_adex,idmap_hash,idmap_ldap,idmap_tdb2 \
--with-automount \
--with-ldap \
--with-ads \
--without-libtdb \
--without-libnetapi \
--with-modulesdir=/usr/lib/samba \
--datadir=/usr/share/samba \
--with-swatdir=/usr/share/samba/swat \
--with-lockdir=/var/run/samba \
--with-statedir=/var/lib/samba \
--with-cachedir=/var/cache/samba \
--with-codepagedir=/usr/share/samba \
--with-nmbdsocketdir=/var/run/samba \
--enable-external-libtalloc \
--without-libtalloc \
--without-cifsmount \
--disable-avahi \
--without-libtdb \
--with-external-libtdb \
--without-dnsupdate

I'd love to provide more debugging details; please let me know how I can help.
Can you please run "smbcontrol <pid> pool-usage" on one of the large smbds and upload the output? Thanks a LOT Volker
Created attachment 9023 [details] smb-control-27697.gz smb-control 27697 pool-usage, gzipped
Created attachment 9027 [details]
smb-control-9676.gz

One more pool-usage report, from another process that was started three days ago and has grown from 100 MB to 164 MB.
Created attachment 9029 [details] Patch Can you try the attached patchset? Thanks, Volker
Deployed, let's wait a couple of days.
Unfortunately, it's still leaking. Here are the stats from top, taken hourly, for smbd process 20362:

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
20362 root  20  0 94796 4620 2084 S    0  0.1 0:01.51 smbd
20362 root  20  0 94796 4948 2084 S    0  0.1 0:04.79 smbd
20362 root  20  0 94796 5624 2084 S    0  0.1 0:08.10 smbd
20362 root  20  0 94928 6312 2092 S    0  0.1 0:11.44 smbd
20362 root  20  0 95588 7000 2092 S    0  0.1 0:14.79 smbd
20362 root  20  0 96380 7680 2092 S    0  0.1 0:18.15 smbd
20362 root  20  0 97040 8364 2092 S    0  0.1 0:21.51 smbd
20362 root  20  0 97700 9040 2092 S    0  0.1 0:24.89 smbd
20362 root  20  0 98360 9728 2092 S    0  0.2 0:28.28 smbd
20362 root  20  0 99020  10m 2092 S    0  0.2 0:31.62 smbd
20362 root  20  0 99680  10m 2092 S    0  0.2 0:35.03 smbd
20362 root  20  0 98.0m  11m 2092 S    0  0.2 0:38.44 smbd
20362 root  20  0 98.8m  12m 2092 S    0  0.2 0:41.86 smbd
20362 root  20  0 99.4m  12m 2092 S    0  0.2 0:45.29 smbd
20362 root  20  0  100m  13m 2092 S    0  0.2 0:48.73 smbd
20362 root  20  0  100m  14m 2092 S    0  0.2 0:52.18 smbd
20362 root  20  0  101m  14m 2092 S    0  0.2 0:55.63 smbd
20362 root  20  0  101m  15m 2092 S    0  0.3 0:59.09 smbd
20362 root  20  0  102m  16m 2092 S    0  0.3 1:02.57 smbd
20362 root  20  0  103m  16m 2116 S    0  0.3 1:06.07 smbd
20362 root  20  0  104m  17m 2116 S    0  0.3 1:09.58 smbd
20362 root  20  0  104m  18m 2116 S    0  0.3 1:13.09 smbd
20362 root  20  0  105m  18m 2116 S    0  0.3 1:16.61 smbd
20362 root  20  0  106m  19m 2116 S    0  0.3 1:20.14 smbd
20362 root  20  0  106m  20m 2116 S    0  0.3 1:23.67 smbd
20362 root  20  0  107m  20m 2116 S    0  0.3 1:27.21 smbd
20362 root  20  0  108m  21m 2116 S    0  0.4 1:30.75 smbd
20362 root  20  0  108m  22m 2116 S    0  0.4 1:34.29 smbd
20362 root  20  0  110m  23m 2568 S    0  0.4 1:37.87 smbd
20362 root  20  0  110m  23m 2568 S    0  0.4 1:41.44 smbd
20362 root  20  0  111m  24m 2568 S    0  0.4 1:44.97 smbd
20362 root  20  0  112m  25m 2568 S    0  0.4 1:48.59 smbd
20362 root  20  0  112m  25m 2544 S    0  0.4 1:52.20 smbd
20362 root  20  0  113m  26m 2568 S    0  0.4 1:55.84 smbd
20362 root  20  0  114m  27m 2544 S    0  0.5 1:59.50 smbd
20362 root  20  0  114m  27m 2568 S    0  0.5 2:03.15 smbd
20362 root  20  0  115m  28m 2544 S    0  0.5 2:06.82 smbd
20362 root  20  0  116m  29m 2548 S    0  0.5 2:10.73 smbd

I collected two pool-usage stats files a few hours apart and am attaching them here:
Created attachment 9030 [details] pool-usage-20362-1.txt
Created attachment 9031 [details] pool-usage-20362-2.txt Collected a few hours later than pool-usage-20362-1.txt
Has it improved at all, i.e. has it slowed down leaking?
(In reply to comment #9)
> Has it improved at all, i.e. has it slowed down leaking?

It seems so. In my initial post the process grew from 94m to 148m within 38 hours, while yesterday it grew only up to 114m during the same period of time.
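For anyone wanting to put a number on the growth instead of eyeballing hourly top output, here is a minimal sketch. It assumes the standard batch-mode column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and that unsuffixed memory fields are KiB; the helper names are made up for illustration, and the sample rows are the first three hourly samples for process 20362 posted earlier in this report.

```python
from typing import List


def to_kib(field: str) -> float:
    """Convert a top memory field such as '4620', '10m' or '98.0m' to KiB."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field)  # assumption: no suffix means KiB


def res_samples(top_text: str, pid: int) -> List[float]:
    """Return the RES column (in KiB) for every row that belongs to pid."""
    samples = []
    for line in top_text.splitlines():
        cols = line.split()
        # Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(cols) >= 12 and cols[0] == str(pid):
            samples.append(to_kib(cols[5]))
    return samples


def growth_per_sample(samples: List[float]) -> float:
    """Average RES growth in KiB between consecutive samples."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


# First three hourly rows for smbd process 20362 from this report.
SAMPLE = """\
20362 root 20 0 94796 4620 2084 S 0 0.1 0:01.51 smbd
20362 root 20 0 94796 4948 2084 S 0 0.1 0:04.79 smbd
20362 root 20 0 94796 5624 2084 S 0 0.1 0:08.10 smbd
"""

if __name__ == "__main__":
    rss = res_samples(SAMPLE, 20362)
    print("average RES growth per sample (KiB):", growth_per_sample(rss))
```

With hourly samples the result is KiB per hour, which makes it easy to compare a patched and an unpatched smbd over the same period.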
Created attachment 9218 [details]
Three files gathering pool usage at 1-hour intervals

I gathered three pool-usage stats at one-hour intervals so that it is easier for you to diff them and see the growing part.
Are any more reports needed? It always leaks in the same section of the pool-usage report, as far as I can tell after looking into a few of them.
Thanks for your thorough report, Alex.

(In reply to comment #12)
> Are any more reports needed?
> It always leaks in the same section of the pool-usage report, as far as I can tell
> after looking into a few of them.

A diff between the first and last pool usage reports that you kindly provided shows only a very small (~1k) increase in usage over the 2 hours. Your leaks appear to occur much faster than that, which suggests that they are the result of unfreed memory outside of talloc, e.g. tdb, etc.

Please install the Samba debug symbols and run smbd under valgrind (with --trace-children=yes and --leak-check=full). Feel free to ask if you need any help.
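The comparison of two pool-usage reports can be sketched roughly as below. This is only an illustration: it assumes the report consists of talloc-style lines of the form "<name> contains N bytes in M blocks", and the context names and byte counts in OLD and NEW are invented, not taken from the attachments. Since a parent context's total normally includes its children, the per-name sums are useful for spotting what grows, not for adding up overall usage.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed line shape: "<name> contains <N> bytes in <M> blocks ..."
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+contains\s+(?P<bytes>\d+)\s+bytes\s+in\s+(?P<blocks>\d+)\s+blocks"
)


def bytes_by_name(report: str) -> Counter:
    """Sum the reported byte counts per context name."""
    totals: Counter = Counter()
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match:
            totals[match.group("name")] += int(match.group("bytes"))
    return totals


def growth(old: str, new: str) -> List[Tuple[str, int]]:
    """Return (name, byte delta) for every name whose total changed, largest growth first."""
    before, after = bytes_by_name(old), bytes_by_name(new)
    names = set(before) | set(after)
    deltas = [(name, after[name] - before[name]) for name in names]
    changed = [(name, delta) for name, delta in deltas if delta != 0]
    return sorted(changed, key=lambda item: (-item[1], item[0]))


# Hypothetical report fragments, for illustration only.
OLD = """\
    example_ctx      contains    100 bytes in   2 blocks (ref 0) 0x1
    stable_ctx       contains     50 bytes in   1 blocks (ref 0) 0x2
"""
NEW = """\
    example_ctx      contains   1124 bytes in  10 blocks (ref 0) 0x1
    stable_ctx       contains     50 bytes in   1 blocks (ref 0) 0x2
"""

if __name__ == "__main__":
    for name, delta in growth(OLD, NEW):
        print(f"{delta:+d} bytes  {name}")
```

If the total growth found this way is far smaller than the growth in RES reported by top, that points to memory allocated outside talloc, which is the reasoning behind the valgrind request above.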
You see, this problem is only visible on the busiest printservers. I cannot imagine running them in debug mode. Is there a way to simulate the load on a test machine using any Samba tools?
Created attachment 9716 [details]
valgrind-smbd.log -- Smbd running under Valgrind

Here is a valgrind log that I collected from a test server, running smbd with this command:

# valgrind --tool=memcheck --leak-check=full --show-reachable=yes --track-fds=yes --log-file=/tmp/valgrind-smbd.log --trace-children=yes /usr/sbin/smbd -F

After running for 20 minutes with 200 printers and one client, it reported this:

==29750== LEAK SUMMARY:
==29750==    definitely lost: 104,129 bytes in 5,921 blocks
==29750==    indirectly lost: 111 bytes in 5 blocks
==29750==      possibly lost: 661,342 bytes in 6,268 blocks
==29750==    still reachable: 335,456 bytes in 3,668 blocks
==29750==         suppressed: 0 bytes in 0 blocks

Samba 3.6.22, libtdb1 1.2.12, Ubuntu 12.04 x64.
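Because --trace-children=yes produces a summary for every smbd child, it can help to pull the LEAK SUMMARY figures out of the log programmatically. The following is a small sketch that relies only on the line format visible in the summary above; the function name is made up for illustration, and the sample text is the summary reported for process 29750.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "==29750== definitely lost: 104,129 bytes in 5,921 blocks"
SUMMARY_RE = re.compile(
    r"^==(?P<pid>\d+)==\s+"
    r"(?P<kind>definitely lost|indirectly lost|possibly lost|still reachable|suppressed):\s+"
    r"(?P<bytes>[\d,]+) bytes in (?P<blocks>[\d,]+) blocks"
)


def parse_leak_summaries(log_text: str) -> Dict[int, Dict[str, Tuple[int, int]]]:
    """Map each PID to {category: (bytes, blocks)} taken from its LEAK SUMMARY lines."""
    result: Dict[int, Dict[str, Tuple[int, int]]] = {}
    for line in log_text.splitlines():
        match = SUMMARY_RE.match(line)
        if not match:
            continue
        pid = int(match.group("pid"))
        lost_bytes = int(match.group("bytes").replace(",", ""))
        blocks = int(match.group("blocks").replace(",", ""))
        result.setdefault(pid, {})[match.group("kind")] = (lost_bytes, blocks)
    return result


# Summary as reported in this comment.
SAMPLE_LOG = """\
==29750== LEAK SUMMARY:
==29750==    definitely lost: 104,129 bytes in 5,921 blocks
==29750==    indirectly lost: 111 bytes in 5 blocks
==29750==      possibly lost: 661,342 bytes in 6,268 blocks
==29750==    still reachable: 335,456 bytes in 3,668 blocks
==29750==         suppressed: 0 bytes in 0 blocks
"""

if __name__ == "__main__":
    for pid, summary in parse_leak_summaries(SAMPLE_LOG).items():
        for kind, (lost_bytes, blocks) in summary.items():
            print(pid, kind, lost_bytes, blocks)
```

Running the same parser over a log taken after a patch gives a direct before/after comparison of the "definitely lost" and "possibly lost" figures.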
Created attachment 9719 [details]
Test fix.

OK, here is a test fix that should apply to v3-6-test.

It should fix the only problematic memory leak I see in your valgrind log.

Cheers,

Jeremy.
Created attachment 9720 [details] v4-1-test patch
Created attachment 9721 [details] v4-0-test patch
Karolin, please merge for 4.0.next and 4.1.next. Thanks!
(In reply to comment #16)
> Created attachment 9719 [details]
> Test fix.
>
> OK, here is a test fix that should apply to v3-6-test.
>
> It should fix the only problematic memory leak I see in your valgrind log.

The test patch works for me, thank you Jeremy! This is the summary from valgrind:

==21455==
==21455== LEAK SUMMARY:
==21455==    definitely lost: 48 bytes in 1 blocks
==21455==    indirectly lost: 111 bytes in 5 blocks
==21455==      possibly lost: 0 bytes in 0 blocks
==21455==    still reachable: 1,608,050 bytes in 16,216 blocks
==21455==         suppressed: 0 bytes in 0 blocks
==21455==
==21455== For counts of detected and suppressed errors, rerun with: -v
==21455== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Let me know if anything in the summary concerns you; I can upload the full log.
> This is the summary from valgrind: That's for a similar test load as I reported before: 30 minutes, 1 user, 200 printers.
Sorry, I forgot to mention that the valgrind runs and the tests of the last patch were performed with Volker's patch above applied, too:

https://bugzilla.samba.org/attachment.cgi?id=9029

Please include it in the release.
Karolin, Volker's patch:

https://bugzilla.samba.org/attachment.cgi?id=9029

is already included in the 4.0.x and 4.1.x source trees, so the only fixes needed to close this bug out are:

https://bugzilla.samba.org/attachment.cgi?id=9720

and

https://bugzilla.samba.org/attachment.cgi?id=9721

Cheers,

Jeremy.
(In reply to comment #23)
> Karolin, Volker's patch :
>
> https://bugzilla.samba.org/attachment.cgi?id=9029
>
> is already included in the 4.0.x and 4.1.x source trees, so the only fixes
> needed to close this bug out are:
>
> https://bugzilla.samba.org/attachment.cgi?id=9720
>
> and
>
> https://bugzilla.samba.org/attachment.cgi?id=9721
>
> Cheers,
>
> Jeremy.

Pushed to autobuild-v4-1-test and autobuild-v4-0-test.
Pushed to v4-1-test and v4-0-test. Closing out bug report. Thanks!