Problem: When trying to logging in via SSH there is a short delay, winbindd uses 100% and then winbindd crashes: Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: [2012/04/17 17:00:19.420934, 0] ../lib/util/debug.c:413(talloc_log_fn) Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: Bad talloc magic value - unknown value Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: [2012/04/17 17:00:19.420995, 0] lib/util.c:1117(smb_panic) Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: PANIC (pid 10788): Bad talloc magic value - unknown value Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: [2012/04/17 17:00:19.422937, 0] lib/util.c:1221(log_stack_trace) Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: BACKTRACE: 20 stack frames: Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #0 /usr/sbin/winbindd(log_stack_trace+0x1a) [0x7f8da6793dda] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #1 /usr/sbin/winbindd(smb_panic+0x25) [0x7f8da6793eb5] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #2 /usr/lib/x86_64-linux-gnu/libtalloc.so.2(talloc_strdup+0x299) [0x7f8da4382de9] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #3 /usr/sbin/winbindd(+0x4ff30d) [0x7f8da6ad730d] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #4 /usr/sbin/winbindd(dcerpc_lsa_lookup_sids3+0x2e) [0x7f8da6ad79fe] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #5 /usr/sbin/winbindd(winbindd_lookup_sids+0x116) [0x7f8da66d4306] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #6 /usr/sbin/winbindd(+0xfffa2) [0x7f8da66d7fa2] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #7 /usr/sbin/winbindd(+0xeabe2) [0x7f8da66c2be2] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #8 /usr/sbin/winbindd(_wbint_LookupGroupMembers+0x5e) [0x7f8da66e197e] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #9 /usr/sbin/winbindd(+0x1139b4) [0x7f8da66eb9b4] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #10 /usr/sbin/winbindd(winbindd_dual_ndrcmd+0xbc) [0x7f8da66e0f6c] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #11 /usr/sbin/winbindd(+0x107cb4) [0x7f8da66dfcb4] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #12 /usr/sbin/winbindd(+0x108765) [0x7f8da66e0765] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #13 /usr/sbin/winbindd(tevent_common_loop_immediate+0xe2) [0x7f8da67a57e2] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #14 /usr/sbin/winbindd(run_events_poll+0x48) [0x7f8da67a38d8] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #15 /usr/sbin/winbindd(+0x1cbcf6) [0x7f8da67a3cf6] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #16 /usr/sbin/winbindd(_tevent_loop_once+0x90) [0x7f8da67a4900] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #17 /usr/sbin/winbindd(main+0x78b) [0x7f8da66b6a3b] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #18 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f8da39dfead] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: #19 /usr/sbin/winbindd(+0xdee91) [0x7f8da66b6e91] Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: [2012/04/17 17:00:19.423285, 0] lib/fault.c:372(dump_core) Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: dumping core in /var/log/samba/cores/winbindd After the delay and the crash the login works. This happens with some uernames, not all! Setup: Debian "testing" Kernel': 3.3.2 Joined (ads) to domain with ~35000 users, everything OK: Join is OK Workgroup: AD LDAP server: 129.70.4.79 LDAP server name: DC1.ad.uni-bielefeld.de Realm: AD.UNI-BIELEFELD.DE Bind Path: dc=AD,dc=UNI-BIELEFELD,dc=DE LDAP port: 389 Server time: Di, 17 Apr 2012 17:08:18 CEST KDC server: 129.70.4.79 Server time offset: -2 Ping to winbindd succeeded checking the NETLOGON dc connection succeeded checking the trust secret for domain AD via RPC calls succeeded getent passwd: works wbinfo -u: works wbinfo -g: works nss: works PAM: works ========================================= * smb.conf: [global] netbios name = uche-pc1-ml server string = uche-pc1-ml interfaces = lo eth0 socket options = SO_KEEPALIVE IPTOS_LOWDELAY TCP_NODELAY hosts allow = 129.70.70.0/24 129.70.172.0/24 127.0.0.1/8 realm = ad.uni-bielefeld.de workgroup = ad security = ads template shell = /bin/bash template homedir = /home/%D/%U winbind use default domain = yes winbind refresh tickets = yes winbind offline logon = true winbind enum users = No winbind enum groups = No allow trusted domains = No idmap config *: backend = rid idmap config *: range = 100000-1000000 wins server = 129.70.4.79 129.70.4.80 129.70.4.106 time server = yes unix charset = utf8 load printers = no syslog only = yes [homes] Valid Users = @AD\uche-pc1 read only = no browseable = Yes =========================================
This only happens if sshd is configured to check group membership. Without, everything seems fine. In sshd.conf: AllowGroups root users uche-pc1 Output of "winbindd -F -S -d3" while trying to login domain user "pnau" via ssh: [16461]: request interface version [16461]: request location of privileged pipe getpwnam pnau getgroups pnau sam_lookup_useraliases Closed policy sam_lookup_useraliases Closed policy getgrgid 100513 getgrgid 174268 getgrgid 124409 get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *" Successfully contacted LDAP server 129.70.4.106 get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *" get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *" get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *" Successfully contacted LDAP server 129.70.4.106 get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *" get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *" Successfully contacted LDAP server 129.70.4.106 Connected to LDAP server DC3.ad.uni-bielefeld.de ads_sasl_spnego_bind: got OID=1.3.6.1.4.1.311.2.2.30 ads_sasl_spnego_bind: got OID=1.2.840.48018.1.2.2 ads_sasl_spnego_bind: got OID=1.2.840.113554.1.2.2 ads_sasl_spnego_bind: got OID=1.2.840.113554.1.2.2.3 ads_sasl_spnego_bind: got OID=1.3.6.1.4.1.311.2.2.10 ads_sasl_spnego_bind: got server principal name = not_defined_in_RFC4178@please_ignore ads_krb5_mk_req: krb5_cc_get_principal failed (No credentials cache found) ads_cleanup_expired_creds: Ticket in ccache[MEMORY:winbind_ccache] expiration Mi, 18 Apr 2012 03:42:38 CEST ads_krb5_mk_req: server marked as OK to delegate to, building forwardable TGT connection_ok: Connection to DC3.ad.uni-bielefeld.de for domain AD is not connected Connecting to 129.70.4.106 at port 445 Doing spnego session setup (blob length=136) got OID=1.3.6.1.4.1.311.2.2.30 got OID=1.2.840.48018.1.2.2 got OID=1.2.840.113554.1.2.2 got OID=1.2.840.113554.1.2.2.3 got OID=1.3.6.1.4.1.311.2.2.10 got principal=not_defined_in_RFC4178@please_ignore cli_session_setup_spnego: using target hostname not SPNEGO principal cli_session_setup_spnego: guessed server principal=cifs/DC3.ad.uni-bielefeld.de@AD.UNI-BIELEFELD.DE Doing kerberos session setup ads_cleanup_expired_creds: Ticket in ccache[MEMORY:cliconnect] expiration Mi, 18 Apr 2012 03:42:40 CEST ads_krb5_mk_req: server marked as OK to delegate to, building forwardable TGT Connecting to 129.70.4.106 at port 135 Connecting to 129.70.4.106 at port 49158 Bad talloc magic value - unknown value PANIC (pid 16463): Bad talloc magic value - unknown value BACKTRACE: 20 stack frames: #0 winbindd(log_stack_trace+0x1a) [0x7f04b486edda] #1 winbindd(smb_panic+0x25) [0x7f04b486eeb5] #2 /usr/lib/x86_64-linux-gnu/libtalloc.so.2(talloc_strdup+0x299) [0x7f04b245dde9] #3 winbindd(+0x4ff30d) [0x7f04b4bb230d] #4 winbindd(dcerpc_lsa_lookup_sids3+0x2e) [0x7f04b4bb29fe] #5 winbindd(winbindd_lookup_sids+0x116) [0x7f04b47af306] #6 winbindd(+0xfffa2) [0x7f04b47b2fa2] #7 winbindd(+0xeabe2) [0x7f04b479dbe2] #8 winbindd(_wbint_LookupGroupMembers+0x5e) [0x7f04b47bc97e] #9 winbindd(+0x1139b4) [0x7f04b47c69b4] #10 winbindd(winbindd_dual_ndrcmd+0xbc) [0x7f04b47bbf6c] #11 winbindd(+0x107cb4) [0x7f04b47bacb4] #12 winbindd(+0x108765) [0x7f04b47bb765] #13 winbindd(tevent_common_loop_immediate+0xe2) [0x7f04b48807e2] #14 winbindd(run_events_poll+0x48) [0x7f04b487e8d8] #15 winbindd(+0x1cbcf6) [0x7f04b487ecf6] #16 winbindd(_tevent_loop_once+0x90) [0x7f04b487f900] #17 winbindd(main+0x78b) [0x7f04b4791a3b] #18 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f04b1abaead] #19 winbindd(+0xdee91) [0x7f04b4791e91] dumping core in /var/log/samba/cores/winbindd getgrgid 167813 getgrgid 165298 getgrgid 127654 getgrgid 146660 getgrgid 177191 getgrgid 175911 getgrgid 174876 getgrgid 146659 getgrgid 175915 getgrgid 172505 getgrgid 172506 getgrgid 174265 getgrgid 174269 getgrgid 181582 getgrgid 125140
Looks to me like Bug 8807
Yes, looks the same to me. Michael, can you test with the patch attached here: https://bugzilla.samba.org/show_bug.cgi?id=8807 and see if it fixes your problem ? Thanks. Jeremy.
Yes, it works, the patch for bug 8807 fixes the problem. No problems nor delays nor high CPU-usage anymore - even with 30000 users. :) Tested with Debian testing (Samba 3.6.4) THX, Michael
*** This bug has been marked as a duplicate of bug 8807 ***