Bug 2739 - Samba 3.0.10-1.4E and winbind memory issues, OOM_Killer
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: winbind
Version: 3.0.10
Hardware: x86 Linux
Importance: P3 critical
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
Reported: 2005-05-25 08:53 UTC by Vasiliy Kotikov
Modified: 2005-09-29 05:34 UTC

Attachments
log file from valgrind (552 bytes, application/x-bzip2)
2005-05-28 08:49 UTC, Vasiliy Kotikov
screen shot during valgrind running (13.60 KB, text/plain)
2005-05-28 08:50 UTC, Vasiliy Kotikov
script to check used memory by winbind (526 bytes, application/octet-stream)
2005-05-28 08:52 UTC, Vasiliy Kotikov

Description Vasiliy Kotikov 2005-05-25 08:53:04 UTC
Hello,

I have a big problem with winbind: winbindd eats almost all memory within 1-2
days and triggers the OOM killer.
HW: Intel SDS2 board with 1 GB RAM, 2x 1266 MHz processors, SCSI disks.
SW: RHEL-AS-4, Squid-2.5.STABLE6-3.4E.5, Samba-3.0.10-1.4E,
Samba-client-3.0.10-1.4E, Samba-common-3.0.10-1.4E,
krb5-libs-1.3.4-12,krb5-workstation-1.3.4-12, ntp-4.2.0.a.20040617-4,
httpd-2.0.52-9.ent

After a reload (restart) the winbindd process takes about 10 MB, but by the end
of the day it grows to 300 MB. On the second day the OOM killer kills winbindd,
squid, and other processes.

[elnone@elnone ~]$ cat /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 ticket_lifetime = 24000
 default_realm = MSTUCA.RU
 dns_lookup_realm = false
 dns_lookup_kdc = false
 default_tgs_enctypes = des-cbc-md5
 default_tkt_enctypes = des-cbc-md5
# permitted_enctypes = des-cbc-md5 des-cbc-crc
 clockskew = 900

[realms]
 MSTUCA.RU = {
  kdc = 172.20.40.3
  kdc = 172.21.40.3
  default_domain = mstuca.ru
 }

[domain_realm]
 .mstuca.ru = MSTUCA.RU
 mstuca.ru = MSTUCA.RU

#[kdc]
# profile = /var/kerberos/krb5kdc/kdc.conf

[appdefaults]
 pam = {
   debug = false
   ticket_lifetime = 36000
   renew_lifetime = 36000
   forwardable = true
   retain_after_close = false
   krb4_convert = false
 }

[elnone@elnone ~]$ cat /etc/samba/smb.conf
# Samba config file created using SWAT
# from 127.0.0.1 (127.0.0.1)
# Date: 2005/05/24 00:32:35

# Global parameters
[global]
        workgroup = MSTUCA
        realm = MSTUCA.RU
        netbios name = UNI019
        server string = D-309 -= [ Proxy Server ] =-
        interfaces = eth1, lo
        bind interfaces only = Yes
        security = ADS
#       min password length = 8
        obey pam restrictions = Yes
        password server = 172.20.40.3 172.21.40.3
        passwd program = /usr/bin/passwd %u
        username map = /etc/samba/smbusers
        restrict anonymous = 2
        client NTLMv2 auth = Yes
        log level = 1
        log file = /var/log/samba/%m.log
        max log size = 10240
        max smbd processes = 512
        socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
        load printers = No
        preferred master = No
        local master = No
        domain master = No
        dns proxy = No
        wins server = 172.20.40.3, 172.21.40.3
        ldap ssl = no
        winbind use default domain = Yes
        hosts allow = 127., 172.20.40., 172.21.40., 172.21.43., 172.21.44., 172.21.45.
        hosts deny = ALL
        case sensitive = No

getent passwd
getent group
wbinfo -u
wbinfo -g
wbinfo -p
wbinfo -t
wbinfo -a username%password

These commands run without errors and all report success.
MS users authenticate fine against MS Windows 2000 AD.
The W2K DC syncs its clock with the ntpd on the RHEL4 host where Samba runs.

After starting the winbind service:
root     12461  0.0  0.3 10512 3912 ?        Ss   19:10   0:00 winbindd
root     12462  0.0  0.3 10296 3360 ?        S    19:10   0:00 winbindd
elnone   12552  0.0  0.0  4068  692 pts/2    S+   19:33   0:00 grep winbind

After 10 hours it has grown to ~120 MB. For now, cron runs 'service winbind
restart' to avoid the OOM killer.

In winbindd.log the only suspicious lines are:

clikrb5.c:ads_cleanup_expired_creds(339)
  ads_cleanup_expired_creds: krb5_cc_remove_cred failed, err Ccache function not supported: not implemented

...

and

[2005/05/25 19:10:52, 1] nsswitch/winbindd.c:main(864)
  winbindd version 3.0.10-1.4E started.
  Copyright The Samba Team 2000-2004
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Pre-Windows 2000 Compatible Access !?
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Guests !?
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Server Operators !?
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Replicator !?
[2005/05/25 19:11:58, 1] nsswitch/winbindd_ads.c:enum_dom_groups(282)
  No rid for Account Operators !?
...


I searched the Samba site and Google but could not find a solution.

Please help!
Comment 1 Björn Jacke 2005-05-25 09:27:32 UTC
You might also want to try the RHEL4 Samba 3 packages available from
ftp.sernet.de/pub/samba/rhel/rhel4/ . These packages have improved ADS support
due to the use of Heimdal Kerberos and are kept up to date with the latest Samba
versions.
Comment 2 Vasiliy Kotikov 2005-05-26 00:06:59 UTC
I have already tried version samba-3.0.14a-2.
The situation is the same.

ps aux for winbind at 2 o'clock (cron restarted the winbind service at 1):
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root      7825  0.0  0.3  9352 3312 ?        Ss   01:20   0:00 winbindd
root      7826  0.0  0.1  9048 2020 ?        S    01:20   0:00 winbindd
At 17 o'clock:
root      8319  0.2  0.3 11268 4044 ?        Ss   08:09   1:05 winbindd
root      8320  0.0 34.5 365584 357904 ?     S    08:09   0:22 winbindd

What could cause it to eat the memory? A problem with expired tickets?
Comment 3 Vasiliy Kotikov 2005-05-26 02:14:19 UTC
(In reply to comment #1)
> you might also want to try the RHEL4 Samba 3 packages you get from
> ftp.sernet.de/pub/samba/rhel/rhel4/ . These packages have improved ADS support
> due to the use of heimdal kerberos and are kept uptodate with the latest Samba
> versions.

I will try :-) 

Comment 4 Volker Lendecke 2005-05-27 03:14:43 UTC
If you can afford a (really) slow winbind for a while, then a rather safe way to
find this problem is to run it under valgrind (www.valgrind.org).

valgrind --tool=memcheck --leak-check=yes -v --num-callers=20 winbindd -i >vg.log

This leaves winbind in the foreground. If you let that run for an hour or so and
issue a 'smbcontrol winbindd shutdown' from another window, you should get a
report in vg.log that should help us track down this problem.

Volker
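
A note on capturing the report: valgrind writes its own output to stderr by
default, so a plain '>vg.log' may record only what winbindd prints to stdout.
The following is a minimal Python sketch, not part of the original advice, that
runs the same command line and sends both streams to vg.log. The function name
run_under_valgrind is made up for illustration, and valgrind and winbindd are
assumed to be on PATH.

```python
import subprocess

# Command line copied from the comment above; valgrind and winbindd are
# assumed to be installed and on PATH.
VALGRIND_CMD = [
    "valgrind", "--tool=memcheck", "--leak-check=yes", "-v",
    "--num-callers=20", "winbindd", "-i",
]


def run_under_valgrind(log_path="vg.log", cmd=VALGRIND_CMD):
    """Run cmd in the foreground, writing stdout and stderr to log_path.

    Blocks until the process exits, for example after
    'smbcontrol winbindd shutdown' is issued from another window.
    Returns the exit status of the process.
    """
    with open(log_path, "wb") as log:
        # stderr=STDOUT merges valgrind's report into the same file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

Calling run_under_valgrind() as root and shutting winbindd down after an hour
or so should leave the leak report in vg.log.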
Comment 5 Vasiliy Kotikov 2005-05-28 08:49:55 UTC
Created attachment 1243
log file from valgrind
Comment 6 Vasiliy Kotikov 2005-05-28 08:50:26 UTC
Created attachment 1244
screen shot during valgrind running
Comment 7 Vasiliy Kotikov 2005-05-28 08:52:22 UTC
Created attachment 1245
script to check used memory by winbind
Comment 8 Vasiliy Kotikov 2005-05-28 08:52:38 UTC
OK, after several hours of running
valgrind --tool=memcheck --leak-check=yes -v --num-callers=20 winbindd -i >vg.log
the screen shot and log are attached.
I hope they help.
For now I have written a small script that checks winbind's memory hourly and,
if it goes above 100 MB, restarts the service.

Thank You
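
The attached script itself is not reproduced in this report. As a rough
illustration of the check described above, here is a minimal Python sketch. It
assumes the 'ps aux' column layout shown elsewhere in this report and the
'service winbind restart' command mentioned in the description; the function
names and the LIMIT_KB constant are hypothetical.

```python
import subprocess

LIMIT_KB = 100 * 1024  # the 100 MB threshold mentioned above, in KB


def winbindd_rss_kb(ps_output):
    """Return {pid: rss_kb} for the winbindd rows of 'ps aux' output."""
    rss = {}
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[1].isdigit():
            program = fields[10].split()[0].rsplit("/", 1)[-1]
            if program == "winbindd":
                rss[int(fields[1])] = int(fields[5])
    return rss


def needs_restart(ps_output, limit_kb=LIMIT_KB):
    """True if any winbindd process has a resident set above limit_kb."""
    return any(kb > limit_kb for kb in winbindd_rss_kb(ps_output).values())


def check_and_restart():
    """Entry point for an hourly cron job: restart winbind if it grew too big."""
    ps_output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    if needs_restart(ps_output):
        subprocess.run(["service", "winbind", "restart"], check=True)
```

Matching on the program name in the COMMAND column, rather than on the whole
line, keeps rows such as 'grep winbind' out of the result. This only works
around the symptom; it does not address the leak.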
Comment 9 Volker Lendecke 2005-05-28 13:48:07 UTC
Thanks for the logs! This looks like another kerberos memory leak. So I'd like
Jeremy to take a look at this.

This winbind has not been excessively large yet, right?

Volker
Comment 10 Vasiliy Kotikov 2005-05-28 20:04:39 UTC
Hello,

Yes, you are right, I didn't wait for the big memory growth.
Should I leave valgrind running for a day? Cron restarts the winbind service
every night at 1 a.m., but when I come to work I see by the afternoon that the
second winbindd process is above 360 MB. cron.hourly sends the 'ps aux' output
by email every hour.
Comment 11 Vasiliy Kotikov 2005-05-28 20:08:33 UTC
top - 07:06:49 up 6 days,  8:58,  2 users,  load average: 0.19, 0.07, 0.01
Tasks:  90 total,   1 running,  89 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2% us,  0.2% sy,  0.0% ni, 99.5% id,  0.2% wa,  0.0% hi,  0.0% si
Mem:   1034484k total,   802732k used,   231752k free,   115680k buffers
Swap:  2096472k total,      200k used,  2096272k free,   341880k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8470 squid     15   0  155m 149m 1744 S  0.0 14.8   0:27.98 squid
22507 root      16   0 73580  65m 2808 S  0.0  6.5   0:03.83 winbindd
 2273 named     18   0 43108 8284 2268 S  0.0  0.8   0:00.00 named
 2387 ntp       16   0  5276 5276 3424 S  0.0  0.5   0:00.55 ntpd
 2588 root      16   0  7584 5124 1608 S  0.0  0.5   3:48.95 hald
21821 apache    15   0  8948 4224 2716 S  0.0  0.4   0:00.01 httpd
21816 apache    15   0  8940 4156 2692 S  0.0  0.4   0:00.02 httpd
21823 apache    15   0  8940 4152 2696 S  0.0  0.4   0:00.00 httpd
21817 apache    15   0  8940 4140 2692 S  0.0  0.4   0:00.00 httpd
21815 apache    15   0  8932 4136 2696 S  0.0  0.4   0:00.00 httpd
21820 apache    16   0  8940 4120 2696 S  0.0  0.4   0:00.00 httpd
21818 apache    15   0  8940 4116 2696 S  0.0  0.4   0:00.00 httpd
21822 apache    15   0  8940 4116 2696 S  0.0  0.4   0:00.00 httpd
 2448 root      16   0  8640 3660 2492 S  0.0  0.4   0:00.37 httpd
22506 root      16   0  9584 3624 2868 S  0.0  0.4   0:01.59 winbindd
 2420 root      16   0  8692 3336 2284 S  0.0  0.3   0:00.25 sendmail
13233 root      15   0 10332 3044 2516 S  0.0  0.3   0:00.31 smbd

After 11 hours of running...
Comment 12 Volker Lendecke 2005-05-29 01:05:18 UTC
No, I don't think running longer will give additional information. Jeremy might
have other questions though.

Jeremy?

Volker
Comment 13 Vasiliy Kotikov 2005-05-30 12:57:10 UTC
Hello,
The only thing I see in winbindd.log is "krb5_cc_remove_cred failed, err Ccache
function not supported: not implemented", logged when the memory used by a
child(?) winbindd process begins to grow.
During the last several days I tried krb5-1.3.6 from download.fedora.redhat.com
with the default Samba from RHAS4, and then installed samba-3.0.14a, downloaded
from the same source. It didn't solve my problem. Maybe I have wrong parameters
in the config files?
Should I maybe change "security" to "DOMAIN" mode instead of "ADS"?
The previous OS, ASPLinux Server II, worked for five months without needing
attention, as did Red Hat 9 for more than one year before ASPLinux.
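
One way to check whether that message lines up with the memory growth is to
count how often it is logged per hour and compare the counts with the hourly
'ps aux' mails. The Python sketch below is only an illustration: it assumes
the log header format seen in the winbindd.log excerpt in the description
("[2005/05/25 19:10:52, 1] ..." followed by indented message lines), and the
function name failures_per_hour is hypothetical.

```python
import re
from collections import Counter

# Samba log header as it appears in the excerpt in the description, e.g.
# [2005/05/25 19:10:52, 1] nsswitch/winbindd.c:main(864)
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2}) (\d{2}):\d{2}:\d{2}, +\d+\]")
NEEDLE = "krb5_cc_remove_cred failed"


def failures_per_hour(log_text, needle=NEEDLE):
    """Count message lines containing needle, keyed by 'YYYY/MM/DD HH'."""
    counts = Counter()
    current_hour = None
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            # Message lines follow their header, so remember its hour.
            current_hour = f"{match.group(1)} {match.group(2)}"
        elif needle in line and current_hour is not None:
            counts[current_hour] += 1
    return counts
```

Reading the whole of winbindd.log into a string and passing it to
failures_per_hour() gives a per-hour tally that can be set next to the RSS
figures for the same hours.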
Comment 14 Vasiliy Kotikov 2005-06-10 05:40:59 UTC
I have changed the security mode to RPC and the system has now worked fine for
more than 1 week; the memory used by winbind does not go above 10 MB.
Comment 15 Gerald (Jerry) Carter (dead mail address) 2005-09-29 05:34:14 UTC
Please reopen if the bug still exists in 3.0.20a.