Bug 8871 - winbindd crashes on ssh-login (Bad talloc magic value - unknown value)
Summary: winbindd crashes on ssh-login (Bad talloc magic value - unknown value)
Status: RESOLVED DUPLICATE of bug 8807
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: Winbind (show other bugs)
Version: 3.6.4
Hardware: x64 Linux
: P5 critical
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-17 15:19 UTC by Michael Letzgus-Koppmann
Modified: 2012-04-23 20:05 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Letzgus-Koppmann 2012-04-17 15:19:28 UTC
Problem:

When trying to logging in via SSH there is a short delay, winbindd uses 100% and then winbindd crashes:

Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: [2012/04/17 17:00:19.420934,  0] ../lib/util/debug.c:413(talloc_log_fn)
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:   Bad talloc magic value - unknown value
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: [2012/04/17 17:00:19.420995,  0] lib/util.c:1117(smb_panic)
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:   PANIC (pid 10788): Bad talloc magic value - unknown value
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: [2012/04/17 17:00:19.422937,  0] lib/util.c:1221(log_stack_trace)
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:   BACKTRACE: 20 stack frames:
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #0 /usr/sbin/winbindd(log_stack_trace+0x1a) [0x7f8da6793dda]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #1 /usr/sbin/winbindd(smb_panic+0x25) [0x7f8da6793eb5]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #2 /usr/lib/x86_64-linux-gnu/libtalloc.so.2(talloc_strdup+0x299) [0x7f8da4382de9]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #3 /usr/sbin/winbindd(+0x4ff30d) [0x7f8da6ad730d]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #4 /usr/sbin/winbindd(dcerpc_lsa_lookup_sids3+0x2e) [0x7f8da6ad79fe]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #5 /usr/sbin/winbindd(winbindd_lookup_sids+0x116) [0x7f8da66d4306]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #6 /usr/sbin/winbindd(+0xfffa2) [0x7f8da66d7fa2]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #7 /usr/sbin/winbindd(+0xeabe2) [0x7f8da66c2be2]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #8 /usr/sbin/winbindd(_wbint_LookupGroupMembers+0x5e) [0x7f8da66e197e]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #9 /usr/sbin/winbindd(+0x1139b4) [0x7f8da66eb9b4]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #10 /usr/sbin/winbindd(winbindd_dual_ndrcmd+0xbc) [0x7f8da66e0f6c]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #11 /usr/sbin/winbindd(+0x107cb4) [0x7f8da66dfcb4]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #12 /usr/sbin/winbindd(+0x108765) [0x7f8da66e0765]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #13 /usr/sbin/winbindd(tevent_common_loop_immediate+0xe2) [0x7f8da67a57e2]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #14 /usr/sbin/winbindd(run_events_poll+0x48) [0x7f8da67a38d8]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #15 /usr/sbin/winbindd(+0x1cbcf6) [0x7f8da67a3cf6]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #16 /usr/sbin/winbindd(_tevent_loop_once+0x90) [0x7f8da67a4900]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #17 /usr/sbin/winbindd(main+0x78b) [0x7f8da66b6a3b]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #18 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f8da39dfead]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:    #19 /usr/sbin/winbindd(+0xdee91) [0x7f8da66b6e91]
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]: [2012/04/17 17:00:19.423285,  0] lib/fault.c:372(dump_core)
Apr 17 17:00:19 uche-pc1-ml winbindd[10788]:   dumping core in /var/log/samba/cores/winbindd


After the delay and the crash the login works.

This happens with some uernames, not all!



Setup:

Debian "testing"
Kernel': 3.3.2


Joined (ads) to domain with ~35000 users, everything OK:

Join is OK
Workgroup: AD
LDAP server: 129.70.4.79
LDAP server name: DC1.ad.uni-bielefeld.de
Realm: AD.UNI-BIELEFELD.DE
Bind Path: dc=AD,dc=UNI-BIELEFELD,dc=DE
LDAP port: 389
Server time: Di, 17 Apr 2012 17:08:18 CEST
KDC server: 129.70.4.79
Server time offset: -2

Ping to winbindd succeeded
checking the NETLOGON dc connection succeeded
checking the trust secret for domain AD via RPC calls succeeded

getent passwd: works
wbinfo -u: works
wbinfo -g: works

nss: works
PAM: works

=========================================
* smb.conf:

[global]

    netbios name = uche-pc1-ml
    server string = uche-pc1-ml

    interfaces = lo eth0
    socket options = SO_KEEPALIVE IPTOS_LOWDELAY TCP_NODELAY
    hosts allow = 129.70.70.0/24 129.70.172.0/24 127.0.0.1/8

    realm = ad.uni-bielefeld.de
    workgroup = ad
    security = ads
    template shell = /bin/bash
    template homedir = /home/%D/%U

    winbind use default domain = yes
    winbind refresh tickets = yes

    winbind offline logon = true 

    winbind enum users = No
    winbind enum groups = No

    allow trusted domains = No

    idmap config *: backend = rid
    idmap config *: range = 100000-1000000


    wins server = 129.70.4.79 129.70.4.80 129.70.4.106

    time server = yes
    unix charset = utf8


    load printers = no


    syslog only = yes



[homes]
    Valid Users = @AD\uche-pc1
    read only = no
    browseable = Yes
=========================================
Comment 1 Michael Letzgus-Koppmann 2012-04-17 15:45:55 UTC
This only happens if sshd is configured to check group membership. Without,
everything seems fine.

In sshd.conf:

AllowGroups root users uche-pc1


Output of "winbindd -F -S -d3" while trying to login domain user "pnau" via
ssh:


[16461]: request interface version
[16461]: request location of privileged pipe
getpwnam pnau
getgroups pnau
sam_lookup_useraliases
Closed policy
sam_lookup_useraliases
Closed policy
getgrgid 100513
getgrgid 174268
getgrgid 124409
get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *"
Successfully contacted LDAP server 129.70.4.106
get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *"
get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *"
get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *"
Successfully contacted LDAP server 129.70.4.106
get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *"
get_dc_list: preferred server list: "DC3.ad.uni-bielefeld.de, *"
Successfully contacted LDAP server 129.70.4.106
Connected to LDAP server DC3.ad.uni-bielefeld.de
ads_sasl_spnego_bind: got OID=1.3.6.1.4.1.311.2.2.30
ads_sasl_spnego_bind: got OID=1.2.840.48018.1.2.2
ads_sasl_spnego_bind: got OID=1.2.840.113554.1.2.2
ads_sasl_spnego_bind: got OID=1.2.840.113554.1.2.2.3
ads_sasl_spnego_bind: got OID=1.3.6.1.4.1.311.2.2.10
ads_sasl_spnego_bind: got server principal name =
not_defined_in_RFC4178@please_ignore
ads_krb5_mk_req: krb5_cc_get_principal failed (No credentials cache found)
ads_cleanup_expired_creds: Ticket in ccache[MEMORY:winbind_ccache] expiration
Mi, 18 Apr 2012 03:42:38 CEST
ads_krb5_mk_req: server marked as OK to delegate to, building forwardable TGT
connection_ok: Connection to DC3.ad.uni-bielefeld.de for domain AD is not
connected
Connecting to 129.70.4.106 at port 445
Doing spnego session setup (blob length=136)
got OID=1.3.6.1.4.1.311.2.2.30
got OID=1.2.840.48018.1.2.2
got OID=1.2.840.113554.1.2.2
got OID=1.2.840.113554.1.2.2.3
got OID=1.3.6.1.4.1.311.2.2.10
got principal=not_defined_in_RFC4178@please_ignore
cli_session_setup_spnego: using target hostname not SPNEGO principal
cli_session_setup_spnego: guessed server
principal=cifs/DC3.ad.uni-bielefeld.de@AD.UNI-BIELEFELD.DE
Doing kerberos session setup
ads_cleanup_expired_creds: Ticket in ccache[MEMORY:cliconnect] expiration Mi,
18 Apr 2012 03:42:40 CEST
ads_krb5_mk_req: server marked as OK to delegate to, building forwardable TGT
Connecting to 129.70.4.106 at port 135
Connecting to 129.70.4.106 at port 49158
Bad talloc magic value - unknown value
PANIC (pid 16463): Bad talloc magic value - unknown value
BACKTRACE: 20 stack frames:
 #0 winbindd(log_stack_trace+0x1a) [0x7f04b486edda]
 #1 winbindd(smb_panic+0x25) [0x7f04b486eeb5]
 #2 /usr/lib/x86_64-linux-gnu/libtalloc.so.2(talloc_strdup+0x299)
[0x7f04b245dde9]
 #3 winbindd(+0x4ff30d) [0x7f04b4bb230d]
 #4 winbindd(dcerpc_lsa_lookup_sids3+0x2e) [0x7f04b4bb29fe]
 #5 winbindd(winbindd_lookup_sids+0x116) [0x7f04b47af306]
 #6 winbindd(+0xfffa2) [0x7f04b47b2fa2]
 #7 winbindd(+0xeabe2) [0x7f04b479dbe2]
 #8 winbindd(_wbint_LookupGroupMembers+0x5e) [0x7f04b47bc97e]
 #9 winbindd(+0x1139b4) [0x7f04b47c69b4]
 #10 winbindd(winbindd_dual_ndrcmd+0xbc) [0x7f04b47bbf6c]
 #11 winbindd(+0x107cb4) [0x7f04b47bacb4]
 #12 winbindd(+0x108765) [0x7f04b47bb765]
 #13 winbindd(tevent_common_loop_immediate+0xe2) [0x7f04b48807e2]
 #14 winbindd(run_events_poll+0x48) [0x7f04b487e8d8]
 #15 winbindd(+0x1cbcf6) [0x7f04b487ecf6]
 #16 winbindd(_tevent_loop_once+0x90) [0x7f04b487f900]
 #17 winbindd(main+0x78b) [0x7f04b4791a3b]
 #18 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f04b1abaead]
 #19 winbindd(+0xdee91) [0x7f04b4791e91]
dumping core in /var/log/samba/cores/winbindd
getgrgid 167813
getgrgid 165298
getgrgid 127654
getgrgid 146660
getgrgid 177191
getgrgid 175911
getgrgid 174876
getgrgid 146659
getgrgid 175915
getgrgid 172505
getgrgid 172506
getgrgid 174265
getgrgid 174269
getgrgid 181582
getgrgid 125140
Comment 2 Christian Ambach 2012-04-17 16:10:57 UTC
Looks to me like Bug 8807
Comment 3 Jeremy Allison 2012-04-18 04:10:01 UTC
Yes, looks the same to me. Michael, can you test with the patch attached here:

https://bugzilla.samba.org/show_bug.cgi?id=8807

and see if it fixes your problem ?

Thanks.

Jeremy.
Comment 4 Michael Letzgus-Koppmann 2012-04-23 17:25:41 UTC
Yes, it works, the patch for bug 8807  fixes the problem.

No problems nor delays nor high CPU-usage anymore - even with 30000 users. :)

Tested with Debian testing (Samba 3.6.4)

THX,
 Michael
Comment 5 Volker Lendecke 2012-04-23 20:05:09 UTC

*** This bug has been marked as a duplicate of bug 8807 ***