I'm running a Samba 3.0.25c domain comprised of a PDC, BDC, W2K3 member servers and XP clients (~200) in a one way NT trust relationship with a W2K3 domain. Staff log on to the Windows domain interactively while Students log on to the Samba domain. On both DCs, one of the winbind processes becomes very large in terms of memory usage which eventually leads to physical and virtual memory exhaustion. Example: # ps -elfy | grep winbindd S root 4189 1 0 75 0 4368 2789 - Aug23 ? 00:02:33 /usr/sbin/winbindd -D S root 4191 4189 0 75 0 3608 2614 429496 Aug23 ? 00:00:00 /usr/sbin/winbindd -D S root 4194 4189 0 75 0 427288 108442 429496 Aug23 ? 00:04:00 /usr/sbin/winbindd -D S root 4195 4189 0 76 0 2644 2355 429496 Aug23 ? 00:00:00 /usr/sbin/winbindd -D S root 4222 4189 0 75 0 3356 2439 429496 Aug23 ? 00:00:01 /usr/sbin/winbindd -D S root 4380 4189 0 75 0 1916 1928 429496 Aug23 ? 00:00:00 /usr/sbin/winbindd -D i.e. one of the processes is currently using over 400 MB of memory. The memory usage is approximately constant overnight (possibly because Staff are not logging on ?) but then grows during the day. I have winbind enum users = no winbind enum groups = no set in smb.conf I should also note that this doesn't seem to be related to the number of smbd processes. I have a deadtime set, and currently there are only 7 smbd processes. I can't spot anything related to this in the logs, and the domain is otherwise running fine. Regards, Patrick
Can you see which type of winbind this is? Most easily figured out by doing a ls -l /proc/4194/fd There you can see which logfile it has open. I'm curious if this is maybe the idmap child, then it should have some logfile like 'log.winbind-imap' or so open. Volker
Hi Volker, Thank you for your reply. The process 4194 is indeed the idmap child: ist-dc1:/var/log/samba# ls -l /proc/4194/fd total 25 lrwx------ 1 root root 64 2007-08-24 08:51 0 -> /dev/null lrwx------ 1 root root 64 2007-08-24 08:51 1 -> /dev/null lrwx------ 1 root root 64 2007-08-24 08:51 10 -> /usr/local/samba/var/locks/netsamlogon_cache.tdb lrwx------ 1 root root 64 2007-08-24 08:51 11 -> socket:[31157] lrwx------ 1 root root 64 2007-08-24 08:51 12 -> socket:[31159] lr-x------ 1 root root 64 2007-08-24 08:51 13 -> pipe:[31164] lrwx------ 1 root root 64 2007-08-24 08:51 14 -> socket:[31162] l-wx------ 1 root root 64 2007-08-24 08:51 15 -> pipe:[31164] lrwx------ 1 root root 64 2007-08-24 08:51 16 -> /usr/local/samba/var/locks/winbindd_cache.tdb lrwx------ 1 root root 64 2007-08-24 08:51 17 -> socket:[31194] l-wx------ 1 root root 64 2007-08-24 08:51 18 -> /usr/local/samba/var/log.winbindd-idmap lrwx------ 1 root root 64 2007-08-24 08:51 19 -> socket:[31193] l-wx------ 1 root root 64 2007-08-24 08:51 2 -> /usr/local/samba/var/log.winbindd-idmap l-wx------ 1 root root 64 2007-08-24 08:51 20 -> pipe:[31196] lrwx------ 1 root root 64 2007-08-24 08:51 21 -> socket:[31199] lrwx------ 1 root root 64 2007-08-24 08:51 22 -> socket:[31201] lrwx------ 1 root root 64 2007-08-24 08:51 23 -> socket:[31203] lrwx------ 1 root root 64 2007-08-24 08:51 24 -> /usr/local/samba/var/locks/login_cache.tdb lr-x------ 1 root root 64 2007-08-24 08:51 3 -> pipe:[31196] lrwx------ 1 root root 64 2007-08-24 08:51 4 -> /usr/local/samba/private/secrets.tdb lr-x------ 1 root root 64 2007-08-24 08:51 5 -> /dev/urandom lrwx------ 1 root root 64 2007-08-24 08:51 6 -> /usr/local/samba/var/locks/gencache.tdb lrwx------ 1 root root 64 2007-08-24 08:51 7 -> /usr/local/samba/var/locks/idmap_cache.tdb l-wx------ 1 root root 64 2007-08-24 08:51 8 -> /usr/local/samba/var/locks/winbindd.pid lrwx------ 1 root root 64 2007-08-24 08:51 9 -> /usr/local/samba/var/locks/messages.tdb Best Regards, Patrick
Ok, thanks. It's the idmap child. Do you have any special idmap options set? Volker
Hi Volker, The idmap configuration is standard, i.e., ist-dc1:/var/log/samba# grep idmap /etc/samba/smb.conf ldap idmap suffix = ou=idmap idmap backend = ldap:ldap://127.0.0.1 idmap uid = 10000-29000 idmap gid = 10000-29000 Regards, Patrick
Created attachment 2880 [details] proposed patch This was checked in with r24649. Thanks! Volker
I'm closing this, because the attached patch fixes a severe memory leak for me. I'm certainly not 100% sure it fixes all, so please re-open if you still get high memory usage. Volker
Hi Volker, Thank you for looking into this bug so promptly. I have applied your patch to our 3.0.25c domain controllers which has significantly decreased the memory leak of the winbind idmap child process. However, I am still seeing a steady memory leak of approximately 20 MB per day (per domain controller) in our production environment. After 4 working days: # ps -elfy | grep winbind S root 10613 1 0 75 0 2660 2222 - Aug25 ? 00:02:32 /usr/sbin/winbindd -D S root 10615 10613 0 78 0 3552 2845 429496 Aug25 ? 00:00:00 /usr/sbin/winbindd -D S root 10618 10613 0 75 0 85424 23222 429496 Aug25 ? 00:04:12 /usr/sbin/winbindd -D S root 10619 10613 0 82 0 936 1999 429496 Aug25 ? 00:00:00 /usr/sbin/winbindd -D S root 10622 10613 0 75 0 3356 2668 429496 Aug25 ? 00:00:06 /usr/sbin/winbindd -D S root 10933 10613 0 75 0 1880 2147 429496 Aug25 ? 00:00:00 /usr/sbin/winbindd -D Please let me know if you require any further information. Best Regards, Patrick
Can you do a "smbcontrol 10618 pool-usage" and upload the output? Thanks, Volker
Hi Volker, When I run that command, I get: ist-dc2:~# smbcontrol 10618 pool-usage No replies received Regards, Patrick
Hmmm. Then that # is not the winbind child anymore... Or it wants to send so much that the smbcontrol timeout is not enough. Do you see a spike in CPU usage of that winbind process you're sending that smbcontrol to? If so, it might help to see that output after it ran for some hours, not with 80 megs. For a sample output you might send that control to other winbind processes. Volker
Hi Volker, Thanks for your reply. The process 10618 is the idmap child: # ls -l /proc/10618/fd | grep idmap l-wx------ 1 root root 64 2007-09-01 06:46 18 -> /usr/local/samba/var/log.winbindd-idmap l-wx------ 1 root root 64 2007-09-01 06:46 2 -> /usr/local/samba/var/log.winbindd-idmap lrwx------ 1 root root 64 2007-09-01 06:46 7 -> /usr/local/samba/var/locks/idmap_cache.tdb When I run "smbcontrol 10618 pool-usage" I do see a spike in CPU usage which continues after the message "No replies received" is displayed (i.e. the smbcontrol timeout occurs before the RPC call completes). Interestingly, the smbcontrol command seems aggravate the leak: # ps -elfy | grep winbind | grep 10618 S root 10618 10613 0 76 0 113592 308608 429496 Aug25 ? 00:05:39 /usr/sbin/winbindd -D I will restart Samba services and post the output of the smbcontrol command on the new idmap child when the total memory usage is smaller (would around 10 - 20 MB be good?). Thank you again for your help. Regards, Patrick
Anything that shows some unusal pattern where some things are repeated a lot would be fine. It might even be after an hour or so, depending on the activity. Volker
Hi again, Just a thought - Could the timeout that smbcontrol is using be changed to say 300 seconds ? I could recompile and run the new smbcontrol binary (i.e "make" but no "make install") against process 10618 (i.e. no restart) - which would allow us to get some data (although I have no idea where the timeout that smbcontrol is using is in the Samba source!) Otherwise, I can go with the original plan (i.e. restart Samba and poll when the idmap child winbind process is smaller). Regards, Patrick
Sure, but this would be overkill. Looking at the source code of smbcontrol I found that it has a "-t <seconds>" parameter. Guess what -t stands for :-)) Volker
Created attachment 2907 [details] smbcontrol output smbcontrol output
Bravo! Thanks Volker. The output of the smbcontrol command is 1.1 million lines long! I have therefore decided to create an attachment, rather than paste it below :-) Cheers, Patrick
Created attachment 2908 [details] Next memleak patch Attached find a patch that should fix this new memleak. I'm leaving this bug open this time, maybe there's more :-) Volker
Hi Volker, Brilliant! Thank you for this. I've applied the patch to our DCs and I will keep an eye on the memory usage of winbind over the next couple of days. I'll hopefully then be back to close this bug ;-) Note that group 1000, the repeated line in the output of smbcontrol, was defined on our system, but in /etc/group rather than within LDAP. Regards, Patrick
Dear Volker, Your patches have resolved this bug. Thank you very much for your prompt & helpful assistance! Regards, Patrick
Thanks for the feedback. As Jerry says in his announcements: Our Code, Our Bugs, Our Responsibility. Volker