Hello, I have a big problem with winbind: winbindd eats almost all memory within 1-2 days and triggers the OOM-Killer.

HW: Intel SDS2 board with 1 GB RAM, 2x 1,266 MHz processors, SCSI disks.
SW: RHEL-AS-4, Squid-2.5.STABLE6-3.4E.5, Samba-3.0.10-1.4E, Samba-client-3.0.10-1.4E, Samba-common-3.0.10-1.4E, krb5-libs-1.3.4-12, krb5-workstation-1.3.4-12, ntp-4.2.0.a.20040617-4, httpd-2.0.52-9.ent

After a reload (restart) the winbindd process takes about 10 MB, but by the end of the day it has grown to 300 MB, and on the second day the OOM-Killer kills winbindd, squid and other processes.

[elnone@elnone ~]$ cat /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 ticket_lifetime = 24000
 default_realm = MSTUCA.RU
 dns_lookup_realm = false
 dns_lookup_kdc = false
 default_tgs_enctypes = des-cbc-md5
 default_tkt_enctypes = des-cbc-md5
# permitted_enctypes = des-cbc-md5 des-cbc-crc
 clockskew = 900

[realms]
 MSTUCA.RU = {
  kdc = 172.20.40.3
  kdc = 172.21.40.3
  default_domain = mstuca.ru
 }

[domain_realm]
 .mstuca.ru = MSTUCA.RU
 mstuca.ru = MSTUCA.RU

#[kdc]
# profile = /var/kerberos/krb5kdc/kdc.conf

[appdefaults]
 pam = {
  debug = false
  ticket_lifetime = 36000
  renew_lifetime = 36000
  forwardable = true
  retain_after_close = false
  krb4_convert = false
 }

[elnone@elnone ~]$ cat /etc/samba/smb.conf
# Samba config file created using SWAT
# from 127.0.0.1 (127.0.0.1)
# Date: 2005/05/24 00:32:35

# Global parameters
[global]
 workgroup = MSTUCA
 realm = MSTUCA.RU
 netbios name = UNI019
 server string = D-309 -= [ Proxy Server ] =-
 interfaces = eth1, lo
 bind interfaces only = Yes
 security = ADS
# min password length = 8
 obey pam restrictions = Yes
 password server = 172.20.40.3 172.21.40.3
 passwd program = /usr/bin/passwd %u
 username map = /etc/samba/smbusers
 restrict anonymous = 2
 client NTLMv2 auth = Yes
 log level = 1
 log file = /var/log/samba/%m.log
 max log size = 10240
 max smbd processes = 512
 socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
 load printers = No
 preferred master = No
 local master = No
 domain master = No
 dns proxy = No
 wins server = 172.20.40.3, 172.21.40.3
 ldap ssl = no
 winbind use default domain = Yes
 hosts allow = 127., 172.20.40., 172.21.40., 172.21.43., 172.21.44., 172.21.45.
 hosts deny = ALL
 case sensitive = No

getent passwd
getent groups
wbinfo -u
wbinfo -g
wbinfo -p
wbinfo -t
wbinfo -a username%password

These commands all run without errors and report success. Windows users authenticate fine against the MS Windows 2000 AD. The W2K DC syncs its clock with ntpd on the RHEL4 machine where Samba runs.

After the winbind service starts:

root     12461  0.0  0.3  10512  3912 ?      Ss  19:10  0:00 winbindd
root     12462  0.0  0.3  10296  3360 ?      S   19:10  0:00 winbindd
elnone   12552  0.0  0.0   4068   692 pts/2  S+  19:33  0:00 grep winbind

After 10 hours it has grown to ~120 MB. Right now cron runs "service winbind restart" to avoid the OOM-Killer.

The only suspect lines in winbindd.log are:

clikrb5.c:ads_cleanup_expired_creds(339)
  ads_cleanup_expired_creds: krb5_cc_remove_cred failed, err Ccache function not supported: not implemented
...

and

[2005/05/25 19:10:52, 1] nsswitch/winbindd.c:main(864)
  winbindd version 3.0.10-1.4E started.
  Copyright The Samba Team 2000-2004
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Pre-Windows 2000 Compatible Access !?
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Guests !?
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Server Operators !?
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Replicator !?
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Account Operators !?
...

I searched the Samba site and googled, but could not find a solution. Please, HELP!!!
You might also want to try the RHEL4 Samba 3 packages available from ftp.sernet.de/pub/samba/rhel/rhel4/ . These packages have improved ADS support due to the use of Heimdal Kerberos and are kept up to date with the latest Samba versions.
I have already tried version samba-3.0.14a-2. The situation is the same.

ps aux for winbind at 2 o'clock (at 1 cron restarted the winbind service):

USER  PID   %CPU  %MEM  VSZ     RSS     TTY  STAT  START  TIME  COMMAND
root  7825  0.0   0.3   9352    3312    ?    Ss    01:20  0:00  winbindd
root  7826  0.0   0.1   9048    2020    ?    S     01:20  0:00  winbindd

at 17 o'clock:

root  8319  0.2   0.3   11268   4044    ?    Ss    08:09  1:05  winbindd
root  8320  0.0   34.5  365584  357904  ?    S     08:09  0:22  winbindd

What could cause it to eat the memory? A problem with expired tickets?
(In reply to comment #1)
> you might also want to try the RHEL4 Samba 3 packages you get from
> ftp.sernet.de/pub/samba/rhel/rhel4/ . These packages have improved ADS support
> due to the use of heimdal kerberos and are kept uptodate with the latest Samba
> versions.

I will try :-)
If you can afford a (really) slow winbind for a while, then a rather safe way to find this problem is to run it under valgrind (www.valgrind.org).

valgrind --tool=memcheck --leak-check=yes -v --num-callers=20 winbindd -i >vg.log

This leaves winbind in the foreground. If you let that run for an hour or so and issue a 'smbcontrol winbindd shutdown' from another window, you should get a report in vg.log that should help us track down this problem.

Volker
Created attachment 1243: log file from valgrind
Created attachment 1244: screen shot during valgrind running
Created attachment 1245: script to check used memory by winbind
OK. After several hours of running

valgrind --tool=memcheck --leak-check=yes -v --num-callers=20 winbindd -i >vg.log

the screen shot and log are attached. I hope they can help. For now I have written a little script that checks winbind's memory hourly, and if it goes above 100 MB, the service is restarted.

Thank you
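The attached script itself is not reproduced in this thread. As a rough illustration of the kind of hourly check described above, here is a minimal Python sketch. It assumes a procps-style `ps` that accepts `-C winbindd -o pid=,rss=` (RSS reported in KB) and the `service winbind restart` command mentioned earlier; the function names and the `LIMIT_KB` constant are hypothetical and not taken from the attachment.

```python
import subprocess

# Assumption: restart when any winbindd process exceeds 100 MB resident memory.
LIMIT_KB = 100 * 1024


def parse_ps_rss(ps_output):
    """Turn 'PID RSS' lines (as printed by ps -o pid=,rss=) into (pid, rss_kb) tuples."""
    procs = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank or unexpected lines instead of failing.
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            procs.append((int(fields[0]), int(fields[1])))
    return procs


def over_limit(procs, limit_kb=LIMIT_KB):
    """Return the processes whose RSS is strictly above the limit."""
    return [(pid, rss) for pid, rss in procs if rss > limit_kb]


def check_and_restart(run=subprocess.run):
    """Query ps for winbindd and restart the service if any process is too large.

    The 'run' parameter exists only so the function can be exercised without
    touching the real system.
    """
    result = run(["ps", "-C", "winbindd", "-o", "pid=,rss="],
                 capture_output=True, text=True)
    offenders = over_limit(parse_ps_rss(result.stdout))
    if offenders:
        run(["service", "winbind", "restart"], check=False)
    return offenders
```

Calling `check_and_restart()` from a script placed in cron.hourly would approximate the behaviour described. This only works around the growth; it does not address the leak itself.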
Thanks for the logs! This looks like another kerberos memory leak. So I'd like Jeremy to take a look at this. This winbind has not been excessively large yet, right? Volker
Hello,

Yes, you are right, I didn't wait for the memory usage to get large. Should I leave valgrind running for a day? Cron restarts the winbind service every night at 1 a.m., but when I come to work I see in the afternoon that the second winbindd process is above 360 MB. A cron.hourly job emails me the 'ps aux' output every hour.
top - 07:06:49 up 6 days, 8:58, 2 users, load average: 0.19, 0.07, 0.01
Tasks: 90 total, 1 running, 89 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2% us, 0.2% sy, 0.0% ni, 99.5% id, 0.2% wa, 0.0% hi, 0.0% si
Mem: 1034484k total, 802732k used, 231752k free, 115680k buffers
Swap: 2096472k total, 200k used, 2096272k free, 341880k cached

  PID USER    PR  NI  VIRT   RES   SHR  S %CPU %MEM    TIME+  COMMAND
 8470 squid   15   0  155m   149m  1744 S  0.0 14.8  0:27.98  squid
22507 root    16   0  73580  65m   2808 S  0.0  6.5  0:03.83  winbindd
 2273 named   18   0  43108  8284  2268 S  0.0  0.8  0:00.00  named
 2387 ntp     16   0  5276   5276  3424 S  0.0  0.5  0:00.55  ntpd
 2588 root    16   0  7584   5124  1608 S  0.0  0.5  3:48.95  hald
21821 apache  15   0  8948   4224  2716 S  0.0  0.4  0:00.01  httpd
21816 apache  15   0  8940   4156  2692 S  0.0  0.4  0:00.02  httpd
21823 apache  15   0  8940   4152  2696 S  0.0  0.4  0:00.00  httpd
21817 apache  15   0  8940   4140  2692 S  0.0  0.4  0:00.00  httpd
21815 apache  15   0  8932   4136  2696 S  0.0  0.4  0:00.00  httpd
21820 apache  16   0  8940   4120  2696 S  0.0  0.4  0:00.00  httpd
21818 apache  15   0  8940   4116  2696 S  0.0  0.4  0:00.00  httpd
21822 apache  15   0  8940   4116  2696 S  0.0  0.4  0:00.00  httpd
 2448 root    16   0  8640   3660  2492 S  0.0  0.4  0:00.37  httpd
22506 root    16   0  9584   3624  2868 S  0.0  0.4  0:01.59  winbindd
 2420 root    16   0  8692   3336  2284 S  0.0  0.3  0:00.25  sendmail
13233 root    15   0  10332  3044  2516 S  0.0  0.3  0:00.31  smbd

That is after 11 hours of running...
No, I don't think running longer will give additional information. Jeremy might have other questions though. Jeremy? Volker
Hello,

The only thing I get in winbindd.log is "krb5_cc_remove_cred failed, err Ccache function not supported: not implemented", logged when the memory used by a child(?) winbindd process begins to grow. During the last several days I tried krb5-1.3.6 from download.fedora.redhat.com with the default Samba from RHAS4, and then installed samba-3.0.14a, downloaded from the same source... it didn't solve my problem. Maybe I have wrong parameters in the config files? Should I change "security" to "DOMAIN" mode instead of "ADS"? The previous OS, ASPLinux Server II, ran for five months without needing any attention, as did Red Hat 9 for more than a year before ASP...
I have changed the security mode to RPC, and the system has now worked fine for more than a week; the memory used by winbind does not go above 10 MB...
please reopen if the bug still exists in 3.0.20a