Hi! We are seeing memory leaks with Samba 3.2.5 (on a Debian system; the package version is 3.2.5-4lenny7; note that the Debian package has some backported patches, mainly security fixes). The output of "valgrind --trace-children=yes --leak-check=full /usr/sbin/smbd" is:

# valgrind --trace-children=yes --leak-check=full /usr/sbin/smbd
==2370== Memcheck, a memory error detector.
==2370== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==2370== Using LibVEX rev 1854, a library for dynamic binary translation.
==2370== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==2370== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==2370== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==2370== For more details, rerun with: -v
==2370==
==2370==
==2370== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 19 from 2)
==2370== malloc/free: in use at exit: 353,195 bytes in 395 blocks.
==2370== malloc/free: 1,005 allocs, 610 frees, 411,722 bytes allocated.
==2370== For counts of detected errors, rerun with: -v
==2370== searching for pointers to 395 not-freed blocks.
==2370== checked 683,040 bytes.
==2370==
==2370== 15 bytes in 1 blocks are definitely lost in loss record 2 of 30
==2370==    at 0x4C2360E: malloc (vg_replace_malloc.c:207)
==2370==    by 0x7917D71: strdup (in /lib/libc-2.7.so)
==2370==    by 0x5A6D7B: string_set (in /usr/sbin/smbd)
==2370==    by 0x478820: (within /usr/sbin/smbd)
==2370==    by 0x47BA1D: lp_load_ex (in /usr/sbin/smbd)
==2370==    by 0x47C328: lp_load_initial_only (in /usr/sbin/smbd)
==2370==    by 0x46FE57: main (in /usr/sbin/smbd)
==2370==
==2370==
==2370== 85 bytes in 1 blocks are possibly lost in loss record 6 of 30
==2370==    at 0x4C2360E: malloc (vg_replace_malloc.c:207)
==2370==    by 0x748F80A: talloc_strdup (in /usr/lib/libtalloc.so.1.2.0)
==2370==    by 0x5AEEDB: get_myname (in /usr/sbin/smbd)
==2370==    by 0x5AEF00: myhostname (in /usr/sbin/smbd)
==2370==    by 0x4787A9: (within /usr/sbin/smbd)
==2370==    by 0x47BA1D: lp_load_ex (in /usr/sbin/smbd)
==2370==    by 0x47C328: lp_load_initial_only (in /usr/sbin/smbd)
==2370==    by 0x46FE57: main (in /usr/sbin/smbd)
==2370==
==2370==
==2370== 216 bytes in 2 blocks are possibly lost in loss record 12 of 30
==2370==    at 0x4C2360E: malloc (vg_replace_malloc.c:207)
==2370==    by 0x74910F3: _talloc_zero (in /usr/lib/libtalloc.so.1.2.0)
==2370==    by 0x5BD8FA: event_context_init (in /usr/sbin/smbd)
==2370==    by 0x46F4AB: smbd_event_context (in /usr/sbin/smbd)
==2370==    by 0x46F4DF: smbd_messaging_context (in /usr/sbin/smbd)
==2370==    by 0x46FEB1: main (in /usr/sbin/smbd)
==2370==
==2370==
==2370== 920 (648 direct, 272 indirect) bytes in 3 blocks are definitely lost in loss record 18 of 30
==2370==    at 0x4C2360E: malloc (vg_replace_malloc.c:207)
==2370==    by 0x7491575: _talloc_array (in /usr/lib/libtalloc.so.1.2.0)
==2370==    by 0x5A6776: str_list_make (in /usr/sbin/smbd)
==2370==    by 0x478DD1: (within /usr/sbin/smbd)
==2370==    by 0x47BA1D: lp_load_ex (in /usr/sbin/smbd)
==2370==    by 0x47C328: lp_load_initial_only (in /usr/sbin/smbd)
==2370==    by 0x46FE57: main (in /usr/sbin/smbd)
==2370==
==2370== LEAK SUMMARY:
==2370==    definitely lost: 663 bytes in 4 blocks.
==2370==    indirectly lost: 272 bytes in 3 blocks.
==2370==    possibly lost: 301 bytes in 3 blocks.
==2370==    still reachable: 351,959 bytes in 385 blocks.
==2370==    suppressed: 0 bytes in 0 blocks.
==2370== Reachable blocks (those to which a pointer was found) are not shown.
==2370== To see them, rerun with: --leak-check=full --show-reachable=yes
SERVER:~#

==2371==
==2371== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 19 from 2)
==2371== malloc/free: in use at exit: 353,509 bytes in 400 blocks.
==2371== malloc/free: 1,031 allocs, 631 frees, 413,922 bytes allocated.
==2371== For counts of detected errors, rerun with: -v
==2371== searching for pointers to 400 not-freed blocks.
==2371== checked 683,392 bytes.
==2371==
==2371== 15 bytes in 1 blocks are definitely lost in loss record 2 of 30
==2371==    at 0x4C2360E: malloc (vg_replace_malloc.c:207)
==2371==    by 0x7917D71: strdup (in /lib/libc-2.7.so)
==2371==    by 0x5A6D7B: string_set (in /usr/sbin/smbd)
==2371==    by 0x478820: (within /usr/sbin/smbd)
==2371==    by 0x47BA1D: lp_load_ex (in /usr/sbin/smbd)
==2371==    by 0x47C328: lp_load_initial_only (in /usr/sbin/smbd)
==2371==    by 0x46FE57: main (in /usr/sbin/smbd)
==2371==
==2371==
==2371== 85 bytes in 1 blocks are possibly lost in loss record 6 of 30
==2371==    at 0x4C2360E: malloc (vg_replace_malloc.c:207)
==2371==    by 0x748F80A: talloc_strdup (in /usr/lib/libtalloc.so.1.2.0)
==2371==    by 0x5AEEDB: get_myname (in /usr/sbin/smbd)
==2371==    by 0x5AEF00: myhostname (in /usr/sbin/smbd)
==2371==    by 0x4787A9: (within /usr/sbin/smbd)
==2371==    by 0x47BA1D: lp_load_ex (in /usr/sbin/smbd)
==2371==    by 0x47C328: lp_load_initial_only (in /usr/sbin/smbd)
==2371==    by 0x46FE57: main (in /usr/sbin/smbd)
==2371==
==2371==
==2371== 216 bytes in 2 blocks are possibly lost in loss record 12 of 30
==2371==    at 0x4C2360E: malloc (vg_replace_malloc.c:207)
==2371==    by 0x74910F3: _talloc_zero (in /usr/lib/libtalloc.so.1.2.0)
==2371==    by 0x5BD8FA: event_context_init (in /usr/sbin/smbd)
==2371==    by 0x46F4AB: smbd_event_context (in /usr/sbin/smbd)
==2371==    by 0x46F4DF: smbd_messaging_context (in /usr/sbin/smbd)
==2371==    by 0x46FEB1: main (in /usr/sbin/smbd)
==2371==
==2371==
==2371== 920 (648 direct, 272 indirect) bytes in 3 blocks are definitely lost in loss record 18 of 30
==2371==    at 0x4C2360E: malloc (vg_replace_malloc.c:207)
==2371==    by 0x7491575: _talloc_array (in /usr/lib/libtalloc.so.1.2.0)
==2371==    by 0x5A6776: str_list_make (in /usr/sbin/smbd)
==2371==    by 0x478DD1: (within /usr/sbin/smbd)
==2371==    by 0x47BA1D: lp_load_ex (in /usr/sbin/smbd)
==2371==    by 0x47C328: lp_load_initial_only (in /usr/sbin/smbd)
==2371==    by 0x46FE57: main (in /usr/sbin/smbd)
==2371==
==2371== LEAK SUMMARY:
==2371==    definitely lost: 663 bytes in 4 blocks.
==2371==    indirectly lost: 272 bytes in 3 blocks.
==2371==    possibly lost: 301 bytes in 3 blocks.
==2371==    still reachable: 352,273 bytes in 390 blocks.
==2371==    suppressed: 0 bytes in 0 blocks.
==2371== Reachable blocks (those to which a pointer was found) are not shown.
==2371== To see them, rerun with: --leak-check=full --show-reachable=yes

I am having the same problem here and I can test, if necessary, with newer versions of Samba (as long as they are in the same 3.2 series; I cannot upgrade to a newer series, as my server has 70+ users on it). I can also test patches or other modifications. Are these real leaks that are still not fixed in the 3.2 series? Can I do something else to help, please? See http://bugs.debian.org/538819 for the full logs. Thank you!
These are not real leaks. They are global initializations that are meant to stay around for the lifetime of the process and are only freed on exit. So, not a bug.

Jeremy.
(In reply to comment #1)
> These are not real leaks. They are global initializations that are meant to
> stay around for the lifetime of the process and only freed on exit. So, not a
> bug.

Same problem here. It seems related to smbd processes not exiting (Debian, package 3.2.5-4lenny7), even after a long time of inactivity from the client, in the case of mapped printers; setting "deadtime" in smb.conf seems to solve the problem. I think this is the normal behavior, but I never noticed it before the update to 3.2.5.
On a freshly started Samba I can see that the smbd processes use about 3-5 MB each (looking only at the RSS (resident set size) value; 70 users = 350 MB of memory). After some time (one or two weeks) I see that they are using 30+ MB each (totaling more than 2 GB of RAM, for the same number of users). Doesn't that look like a leak?
(In reply to comment #3)
> On a freshly started Samba I can see that the smbd processes use about 3-5 MB
> each (looking only at the RSS (resident set size) value; 70 users = 350 MB of
> memory). After some time (one or two weeks) I see that they are using 30+ MB
> each (totaling more than 2 GB of RAM, for the same number of users). Doesn't
> that look like a leak?

It does. You might want to upload the output of

smbcontrol <pid> pool-usage

where <pid> is replaced by the process id of one of the large smbds.

Volker
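As an aside, per-process RSS growth like the one described can be sampled with a small wrapper around ps and diffed over time. A minimal sketch; the rss_report helper is hypothetical and not part of Samba:

```shell
#!/bin/sh
# Hypothetical helper: print "PID RSS" (RSS in KiB) for every process
# whose command name matches $1. Run it periodically (e.g. from cron)
# and compare the outputs to see which PIDs are growing.
rss_report() {
    ps -eo pid=,rss=,comm= | awk -v name="$1" '$3 == name { print $1, $2 }'
}

rss_report smbd
```

On an idle test box the smbd line may simply print nothing; the point is to capture the same listing every few hours and watch the per-PID RSS column.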
Created attachment 5165 [details]
"smbcontrol <pid> pool-usage" output

"smbcontrol <pid> pool-usage" output of a 13+ MB smbd process. Interestingly, the memory usage of the process increases with every run of smbcontrol (both its resident and its shared values).
Created attachment 5166 [details]
Output from another process

This is the output of another process, with 14+ MB and growing (all of them are growing). If I had to bet on something, it would be all the entries like these:

char * contains 155 bytes in 3 blocks (ref 0)
char contains 8 bytes in 1 blocks (ref 0)
char contains 11 bytes in 1 blocks (ref 0)
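Entries of that shape can be aggregated to see which chunk types dominate a pool-usage dump. A sketch; the pool_summary helper is hypothetical and assumes lines of the form "<type> contains <N> bytes in <M> blocks":

```shell
#!/bin/sh
# Hypothetical helper: sum the bytes reported per chunk type in a
# "smbcontrol <pid> pool-usage" dump and list the biggest consumers first.
pool_summary() {
    awk '{
        if (match($0, / contains [0-9]+ bytes in [0-9]+ blocks/)) {
            type = substr($0, 1, RSTART - 1)        # text before " contains"
            sub(/^[ \t]+/, "", type)
            split(substr($0, RSTART + 1), f, " ")   # f[1]="contains", f[2]=bytes
            bytes[type] += f[2]
        }
    }
    END { for (t in bytes) print bytes[t], t }' "$@" | sort -rn
}
```

Fed the three lines quoted above, it would report 155 bytes for "char *" and 19 bytes for "char".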
This one's going to be hard to find. How fast are these processes growing? I'm not really sure how much effort I want to put into 3.2 which is in "security bugs only" mode at this moment. Can you upgrade to 3.4.4 and see if you have the same problem there? Volker
Volker, 3.2 is the version we ship in Debian stable. That explains why this bug was reported. Of course, Debian users can use the backported versions we also provide, but these backports do not have the official status that packages in the stable branch of the distribution have. Of course, we understand that you don't want to invest too much time in investigating this bug in a version that's, from your POV, supported only for security fixes. Hopefully, somebody else will be able to investigate this by using the valuable information that was provided in this bug report. Christian Perrier
As good as the information is, it is not enough, and I don't really know how to catch it. I have spent an hour or so scanning through our talloc calls that use the NULL context, but I could not find anything. What I would do if I had this: I would try to find out what the content of these "char *" things is. This requires attaching to the smbd with gdb and walking through the talloc chain. This is a pretty big task to ask from a user :-)

Volker
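One way to do such a walk without writing code is gdb's ability to call functions inside the attached process: talloc can report every chunk hanging off the NULL context. A sketch only; the PID is a placeholder, and it assumes the running smbd links a talloc that provides talloc_report_full (the NULL-context report is only complete if null tracking is enabled):

```shell
#!/bin/sh
# Sketch: ask a live smbd to dump its talloc tree via gdb batch mode.
# 1234 is a placeholder PID. talloc_report_full(0, stderr) prints the
# chunks parented to the NULL context, with their sizes and names.
pid=1234
cmd="gdb -batch -p $pid \
  -ex 'call talloc_report_full(0, stderr)' \
  -ex detach"
echo "$cmd"
```

The script only prints the command; run it by hand against a real smbd PID, since attaching a debugger briefly stops the process.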
Created attachment 5201 [details]
Patch for 3.2 with better pool-usage output

The attached patch makes "pool-usage" print out the first 50 bytes of a "char" chunk. Can you apply that patch and upload another pool-usage output?

Christian, can you provide a patched .deb for the bug reporter?

Thanks, Volker
(In reply to comment #10)
> Christian, can you provide a patched .deb for the bug reporter?

I can create the patched Debian package :-) Will test tomorrow. As soon as the processes start to grow again I will attach a new output.
Test packages available at:

http://people.debian.org/~bubulle/packages/samba-debug-lenny/

To anyone testing this package:

- the version number of the packages is *lower* than the current one in lenny. You must therefore force dpkg to "downgrade" when installing the .deb files. That will allow you to re-upgrade to the normal packages later on by just running "apt-get upgrade".
Created attachment 5203 [details]
smbcontrol output with debug patch

I didn't have to wait long for this one. Attached are two files: 28.txt (a process that is reaching 28+ MB of RAM) and 12.txt (another one that is using 12 MB).

In my smb.conf I have:

vfs object = extd_audit recycle

And I am seeing them in the output. Could it be a leak in them? @dti, @smg, @smf, @smel, etc., are all system groups.
Wait a second -- I thought those "char *" -> "char" things grow indefinitely. If the process listed by 28.txt has 28MB, then the memory leak is somewhere else. Are you 100% certain that you have the patches for bug 7020 applied in your package as well? If yes, then we need another valgrind run also tracing children of smbd, sorry. Volker
(In reply to comment #14) > Are you 100% certain that you have the patches for bug 7020 applied in your > package as well? Hmmmmm... didn't know about them, sorry. Applied them as http://people.debian.org/~naoliv/misc/samba-7020.patch Will test soon.
(In reply to comment #15)
> Applied them as http://people.debian.org/~naoliv/misc/samba-7020.patch

Another "hmmmmmmmm" here:

rpc_server/srv_pipe_hnd.c: In function 'read_from_internal_pipe':
rpc_server/srv_pipe_hnd.c:1089: error: 'output_data' has no member named 'frag'
rpc_server/srv_pipe_hnd.c:1094: error: 'output_data' has no member named 'frag'

What would be the correct code for this (if possible and if applicable), please?

diff -ur samba-3.2.5/source/rpc_server/srv_pipe_hnd.c samba-3.2.5/source/rpc_server/srv_pipe_hnd.c
--- samba-3.2.5/source/rpc_server/srv_pipe_hnd.c	2008-11-18 15:17:17.000000000 +0000
+++ samba-3.2.5/source/rpc_server/srv_pipe_hnd.c	2010-01-18 13:12:13.000000000 +0000
@@ -1086,6 +1086,13 @@
  out:
 	(*is_data_outstanding) = p->out_data.current_pdu_len > n;
 
+	if (p->out_data.current_pdu_sent == prs_offset(&p->out_data.frag)) {
+		/* We've returned everything in the out_data.frag
+		 * so we're done with this pdu. Free it and reset
+		 * current_pdu_sent. */
+		p->out_data.current_pdu_sent = 0;
+		prs_mem_free(&p->out_data.frag);
+	}
 	return data_returned;
 }
Before going down there, can you do another valgrind run to make sure this is the one? :-) Volker
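For a run like that, separating each forked child's report can make the logs easier to compare; valgrind (3.3 and later) expands %p to the PID in --log-file. A sketch of the invocation; the log path is arbitrary, and -F keeps smbd in the foreground so valgrind stays attached:

```shell
#!/bin/sh
# Sketch: the same valgrind run, but with one log file per process so
# each child's leak summary can be inspected on its own.
# %p expands to the PID (valgrind >= 3.3); -F runs smbd in the foreground.
cmd="valgrind --trace-children=yes --leak-check=full \
  --log-file=/var/tmp/smbd-vg.%p /usr/sbin/smbd -F"
echo "$cmd"
```

The script only prints the command line; run it by hand on the server, then collect the /var/tmp/smbd-vg.* files.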
Created attachment 5204 [details]
Full output of valgrind

This is a more-or-less 30-minute run of valgrind (run as "valgrind --trace-children=yes --leak-check=full /usr/sbin/smbd"). Unfortunately I could not run it for longer, as users started to complain about "things not working properly".
==21437== 369,725 (6,736 direct, 362,989 indirect) bytes in 14 blocks are definitely lost in loss record 42 of 45
==21437==    at 0x4C203E4: calloc (vg_replace_malloc.c:397)
==21437==    by 0x5077FA4: ber_memcalloc_x (in /usr/lib/liblber-2.4.so.2.1.0)
==21437==    by 0x4E38FB8: (within /usr/lib/libldap_r-2.4.so.2.1.0)
==21437==    by 0x4E3A1CE: ldap_result (in /usr/lib/libldap_r-2.4.so.2.1.0)
==21437==    by 0x4E3BA48: ldap_search_ext_s (in /usr/lib/libldap_r-2.4.so.2.1.0)
==21437==    by 0x75939B: (within /usr/sbin/smbd)
==21437==    by 0x759979: smbldap_search (in /usr/sbin/smbd)
==21437==    by 0x56CECA: (within /usr/sbin/smbd)
==21437==    by 0x573751: (within /usr/sbin/smbd)
==21437==    by 0x5661C2: pdb_get_trusteddom_pw (in /usr/sbin/smbd)
==21437==    by 0x5E9F9B: is_trusted_domain (in /usr/sbin/smbd)
==21437==    by 0x5EDA8A: make_user_info_map (in /usr/sbin/smbd)

This one looks promising. Looking.

There are quite a few like this one:

==21520== 134,459 (2,264 direct, 132,195 indirect) bytes in 5 blocks are definitely lost in loss record 47 of 51
==21520==    at 0x4C203E4: calloc (vg_replace_malloc.c:397)
==21520==    by 0x5077FA4: ber_memcalloc_x (in /usr/lib/liblber-2.4.so.2.1.0)
==21520==    by 0x4E37C9E: ldap_create (in /usr/lib/libldap_r-2.4.so.2.1.0)
==21520==    by 0x4E381E9: ldap_initialize (in /usr/lib/libldap_r-2.4.so.2.1.0)
==21520==    by 0x98FE17B: ???
==21520==    by 0x9902D23: ???
==21520==    by 0x7C46BBB: (within /lib/libc-2.7.so)
==21520==    by 0x7C46E6D: getgrouplist (in /lib/libc-2.7.so)
==21520==    by 0x5C01F1: (within /usr/sbin/smbd)
==21520==    by 0x5C0268: getgroups_unix_user (in /usr/sbin/smbd)
==21520==    by 0x565472: (within /usr/sbin/smbd)
==21520==    by 0x566825: pdb_enum_group_memberships (in /usr/sbin/smbd)

It looks like we can't do much about these.

Volker
Created attachment 5205 [details]
Patch

Can you try the attached patch?

Thanks, Volker
Created attachment 5207 [details]
New valgrind run

Did a new valgrind run, but with only me testing it. I will do a new run with people using it tomorrow.
==26296== 134,253 (2,208 direct, 132,045 indirect) bytes in 4 blocks are definitely lost in loss record 44 of 47
==26296==    at 0x4C203E4: calloc (vg_replace_malloc.c:397)
==26296==    by 0x5077FA4: ber_memcalloc_x (in /usr/lib/liblber-2.4.so.2.1.0)
==26296==    by 0x4E37C9E: ldap_create (in /usr/lib/libldap_r-2.4.so.2.1.0)
==26296==    by 0x4E381E9: ldap_initialize (in /usr/lib/libldap_r-2.4.so.2.1.0)
==26296==    by 0x98FE17B: ???
==26296==    by 0x9902D23: ???
==26296==    by 0x7C46BBB: (within /lib/libc-2.7.so)
==26296==    by 0x7C46E6D: getgrouplist (in /lib/libc-2.7.so)
==26296==    by 0x5C0311: (within /usr/sbin/smbd)
==26296==    by 0x5C0388: getgroups_unix_user (in /usr/sbin/smbd)
==26296==    by 0x565472: (within /usr/sbin/smbd)
==26296==    by 0x566825: pdb_enum_group_memberships (in /usr/sbin/smbd)

This piece looks a bit scary. But it seems to come from deep inside libc, or rather nss_ldap. Christian, does that ring a bell for you?

Volker
Created attachment 5209 [details]
Another smbcontrol output

Hi! The server is running with the latest patch. The smbd processes are all below 12 MB of RAM, except this one (which is using 28 MB). Are all these lines normal?

char * contains 145 bytes in 2 blocks (ref 0)
char contains 9 bytes in 1 blocks (ref 0): template
(It's another smbcontrol output, and not another 'valgrind' output; sorry)
How many of them do you have, and is that number growing? Volker
Ignore my "how many of them"... I thought you had sent a valgrind output. valgrind.txt.gz is a pool-usage output. Yes, that's a lot, but as long as that number is not growing, I would not be worried. Overall, talloc only maintains 114723 bytes plus overhead, which is FAR below 28MB. I think someone needs to go in and port the 7020 patches to the debian version. If nobody steps in, I'll look after that over the weekend; during the week I am REALLY busy these days, sorry.

BTW, do you have printers that are in active use?

Volker
(In reply to comment #26)
> Ignore my "how many of them"... I though you had sent a valgrind output.
> valgrind.txt.gz is a pool-usage output.

Yes, I was a little asleep while writing it. Really sorry.

> Overall, talloc only maintains 114723 plus overhead which is FAR below 28MB. I
> think someone needs to go in and port the 7020 patches to the debian version.
> If nobody steps in, I'll look after that over the weekend, during the week I am
> REALLY busy these days, sorry.

Sure. I perfectly understand this and there is no need to hurry.

> BTW, do you have printers that are in active use?

There are no printers in smb.conf (there is only "load printers = no"). People are only using network printers or some printers that are shared by other desktops.

Thank you!
Created attachment 5211 [details]
Backported patch from #7020

I backported the patches from bug 7020 to 3.2.5. Here's the first one.

Christian Perrier
Created attachment 5212 [details]
2nd backported patch from #7020

I backported the patches from bug 7020 to 3.2.5. Here's the second one.

Christian Perrier
I backported the two patches used in #7020 for samba 3.2.5. I am in the process of building new Debian packages to test with. These packages will also include the debugging snippet proposed by Volker earlier.

Stay tuned.

Christian Perrier
Doesn't work:

Compiling rpc_server/srv_pipe_hnd.c
rpc_server/srv_pipe_hnd.c: In function 'read_from_internal_pipe':
rpc_server/srv_pipe_hnd.c:1090: error: 'output_data' has no member named 'frag'
rpc_server/srv_pipe_hnd.c:1095: error: 'output_data' has no member named 'frag'
The following command failed:
gcc -I. -I/tmp/buildd/samba-3.2.5/source -O -D_SAMBA_BUILD_=3 -I/tmp/buildd/samba-3.2.5/source/iniparser/src -Iinclude -I./include -I. -I. -I./lib/replace -I./lib/talloc -I./lib/tdb/include -I./libaddns -I./librpc -DHAVE_CONFIG_H -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -Iinclude -I./include -I. -I. -I./lib/replace -I./lib/talloc -I./lib/tdb/include -I./libaddns -I./librpc -I./popt -DLDAP_DEPRECATED -I/include -I/tmp/buildd/samba-3.2.5/source/lib -D_SAMBA_BUILD_=3 -fPIC -c rpc_server/srv_pipe_hnd.c -o rpc_server/srv_pipe_hnd.o
make[1]: *** [rpc_server/srv_pipe_hnd.o] Error 1

In short, just blindly applying the patches from #7020 is not OK. We need properly backported patches. :-)
That's the piece I would look at over the weekend :-) Volker
Haven't been able to reproduce the 7020 memleak with 3.2. Jeremy, how did you find the 7020 ones? Volker