From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf

===============================================================
PANIC (pid 2643): internal error
BACKTRACE: 22 stack frames:
 #0 /usr/sbin/winbindd(log_stack_trace+0x1a) [0x7ffff7a4d3ca]
 #1 /usr/sbin/winbindd(smb_panic+0x1f) [0x7ffff7a4d48f]
 #2 /usr/sbin/winbindd(+0x193dcd) [0x7ffff7a3cdcd]
 #3 /lib64/libc.so.6(+0x359a832a20) [0x7ffff5342a20]
 #4 /usr/sbin/winbindd(rpccli_set_timeout+0x4) [0x7ffff7b45874]
 #5 /usr/sbin/winbindd(winbindd_lookup_names+0xca) [0x7ffff7998c1a]
 #6 /usr/sbin/winbindd(+0xefdbe) [0x7ffff7998dbe]
 #7 /usr/sbin/winbindd(+0xf1698) [0x7ffff799a698]
 #8 /usr/sbin/winbindd(+0xf18a7) [0x7ffff799a8a7]
 #9 /usr/sbin/winbindd(+0xde20c) [0x7ffff798720c]
 #10 /usr/sbin/winbindd(_wbint_LookupName+0x47) [0x7ffff79a4bd7]
 #11 /usr/sbin/winbindd(+0x105491) [0x7ffff79ae491]
 #12 /usr/sbin/winbindd(winbindd_dual_ndrcmd+0xdc) [0x7ffff79a47bc]
 #13 /usr/sbin/winbindd(+0xfa85b) [0x7ffff79a385b]
 #14 /usr/sbin/winbindd(+0xfafeb) [0x7ffff79a3feb]
 #15 /usr/sbin/winbindd(tevent_common_loop_immediate+0xd7) [0x7ffff7a5e5b7]
 #16 /usr/sbin/winbindd(run_events+0x4b) [0x7ffff7a5cd8b]
 #17 /usr/sbin/winbindd(+0x1b40e7) [0x7ffff7a5d0e7]
 #18 /usr/sbin/winbindd(_tevent_loop_once+0x90) [0x7ffff7a5d580]
 #19 /usr/sbin/winbindd(main+0x92b) [0x7ffff797a5eb]
 #20 /lib64/libc.so.6(__libc_start_main+0xfd) [0x7ffff532ec5d]
 #21 /usr/sbin/winbindd(+0xcf079) [0x7ffff7978079]
dumping core in /var/log/samba/cores/winbindd

Not entirely sure how I triggered this; I installed the latest Fedora test package (with bug #7589 and bug #7590 fixed) and was playing with re-joining the domain with different host names and watching whether the DNS got updated.

I was running in gdb to try to catch it, but it still escaped. I'll have to play with the fork-follows-child setting, presumably. Will debug more later if it's not immediately obvious to someone who knows the code...
To catch these things with gdb, set

panic action = /bin/sleep 999999

You can then remotely attach to the parent of the sleep process and do the "bt full".

Volker
process_request: Handling async request 18811:GETPWNAM
getpwnam root
Entry has wrong sequence number: 29970033
Need to read 40 extra bytes
child daemon request 63
child_process_request: request fn NDRCMD
winbindd_dual_ndrcmd: Running command WBINT_LOOKUPNAME (CORP)
wbint_LookupName: struct wbint_LookupName
    in: struct wbint_LookupName
        domain : *
            domain : 'GER'
        name : *
            name : 'ROOT'
        flags : 0x00000008 (8)
refresh_sequence_number: CORP time ok
refresh_sequence_number: CORP seq number is now 30032694
centry_expired: Key NS/GER/ROOT for domain CORP expired
wcache_fetch: entry NS/GER/ROOT expired for domain CORP
name_to_sid: [Cached] - doing backend query for name for domain CORP
rpc: name_to_sid name=GER\ROOT
name_to_sid [rpc] GER\ROOT for domain GER
cm_connect_lsa_tcp
wcache_tdc_fetch_domain: Searching for domain CORP
wcache_tdc_fetch_domain: Found domain CORP
wcache_tdc_fetch_domain: Searching for domain CORP
wcache_tdc_fetch_domain: Found domain CORP
===============================================================
INTERNAL ERROR: Signal 11 in pid 2643 (3.5.4-65.fc13)
Can't reproduce at will. But stupid me, it gave me a core file to look at so maybe I don't need to.

(gdb) bt
#0  0x00007ffff53429a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff5344185 in abort () at abort.c:92
#2  0x00007ffff7a3c9d0 in dump_core () at lib/fault.c:337
#3  0x00007ffff7a4d4cd in smb_panic (why=<value optimized out>) at lib/util.c:1481
#4  0x00007ffff7a3cdcd in fault_report (sig=11) at lib/fault.c:52
#5  sig_fault (sig=11) at lib/fault.c:75
#6  <signal handler called>
#7  rpccli_set_timeout (rpc_cli=0x0, timeout=35000) at rpc_client/cli_pipe.c:3041
#8  0x00007ffff7998c1a in winbindd_lookup_names (mem_ctx=0x7ffff82328d0,
    domain=<value optimized out>, num_names=1, names=0x7fffffffbfe8, domains=0x0,
    sids=0x7fffffffbff8, types=0x7fffffffbff0) at winbindd/winbindd_rpc.c:1285
#9  0x00007ffff7998dbe in msrpc_name_to_sid (domain=0x7ffff82a7740,
    mem_ctx=0x7ffff82328d0, domain_name=0x7ffff82c8f70 "GER",
    name=<value optimized out>, flags=<value optimized out>, sid=0x7ffff82d8030,
    type=0x7ffff82c6eb0) at winbindd/winbindd_rpc.c:314
#10 0x00007ffff799a698 in name_to_sid (domain=0x7ffff82a7740,
    mem_ctx=0x7ffff82328d0, domain_name=0x7ffff82c8f70 "GER",
    name=0x7ffff829ecb0 "ROOT", flags=8, sid=0x7ffff82d8030, type=0x7ffff82c6eb0)
    at winbindd/winbindd_reconnect.c:139
#11 0x00007ffff799a8a7 in name_to_sid (domain=<value optimized out>,
    mem_ctx=<value optimized out>, domain_name=<value optimized out>,
    name=<value optimized out>, flags=<value optimized out>,
    sid=<value optimized out>, type=0x7ffff82c6eb0) at winbindd/winbindd_ads.c:413
#12 0x00007ffff798720c in name_to_sid (domain=0x7ffff82a7740,
    mem_ctx=0x7ffff82328d0, domain_name=0x7ffff82c8f70 "GER",
    name=0x7ffff829ecb0 "ROOT", flags=8, sid=0x7ffff82d8030, type=0x7ffff82c6eb0)
    at winbindd/winbindd_cache.c:1798
#13 0x00007ffff79a4bd7 in _wbint_LookupName (p=0x7fffffffc1a0, r=0x7ffff82d82f0)
    at winbindd/winbindd_dual_srv.c:66
#14 0x00007ffff79ae491 in api_wbint_LookupName (p=0x7fffffffc1a0)
    at librpc/gen_ndr/srv_wbint.c:232
#15 0x00007ffff79a47bc in winbindd_dual_ndrcmd (domain=<value optimized out>,
    state=0x7fffffffc480) at winbindd/winbindd_dual_ndr.c:271
#16 0x00007ffff79a385b in child_process_request (child=0x7ffff82a7cc8)
    at winbindd/winbindd_dual.c:480
#17 fork_domain_child (child=0x7ffff82a7cc8) at winbindd/winbindd_dual.c:1522
#18 0x00007ffff79a3feb in wb_child_request_trigger (req=0x7ffff8232160,
    private_data=<value optimized out>) at winbindd/winbindd_dual.c:137
#19 0x00007ffff7a5e5b7 in tevent_common_loop_immediate (ev=0x7ffff8295850)
    at ../lib/tevent/tevent_immediate.c:135
#20 0x00007ffff7a5cd8b in run_events (ev=0x7ffff8295850, selrtn=0,
    read_fds=0x0, write_fds=0x0) at lib/events.c:99
#21 0x00007ffff7a5d0e7 in s3_event_loop_once (ev=0x7ffff8295850,
    location=<value optimized out>) at lib/events.c:192
#22 0x00007ffff7a5d580 in _tevent_loop_once (ev=0x7ffff8295850,
    location=0x7ffff7dcff97 "winbindd/winbindd.c:1275") at ../lib/tevent/tevent.c:497
#23 0x00007ffff797a5eb in main (argc=<value optimized out>,
    argv=<value optimized out>, envp=<value optimized out>)
    at winbindd/winbindd.c:1275
(gdb) up 7
#7  rpccli_set_timeout (rpc_cli=0x0, timeout=35000) at rpc_client/cli_pipe.c:3041
3041        if (rpc_cli->transport == NULL) {
(gdb) p rpc_cli
$1 = (struct rpc_pipe_client *) 0x0
(gdb) up
#8  0x00007ffff7998c1a in winbindd_lookup_names (mem_ctx=0x7ffff82328d0,
    domain=<value optimized out>, num_names=1, names=0x7fffffffbfe8, domains=0x0,
    sids=0x7fffffffbff8, types=0x7fffffffbff0) at winbindd/winbindd_rpc.c:1285
1285        orig_timeout = rpccli_set_timeout(cli, 35000);

Can either cm_connect_lsa() or cm_connect_lsa_tcp() be returning an OK status without setting cli?
cm_connect_lsa_tcp() can potentially return NT_STATUS_OK without filling in *cli if cm_get_schannel_creds() fails. I'll attach a patch for this for you to test. Jeremy.
Jeremy, please look at the two commits in http://git.samba.org/?p=gd/samba/.git;a=shortlog;h=master-schannel before digging deeper.
Not the right fix (although I appreciate the error message you used :-). My patch changes cm_get_schannel_creds() to return NTSTATUS, which is what we should be doing here. Sorry, you'll have to rebase :-). Jeremy.
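For anyone reading along later, here is a minimal sketch of the failure shape discussed above and of the status-returning approach. It is not the Samba source: status_t, struct pipe_client, struct creds, get_creds_bool(), get_creds_status(), connect_buggy(), connect_fixed() and set_timeout_checked() are all hypothetical stand-ins, and the real control flow in cm_connect_lsa_tcp() may well differ. It only illustrates how a helper that reports failure without a status code can let the caller return OK with the output pointer still unset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NTSTATUS codes and the RPC pipe/credential types. */
typedef enum { STATUS_OK = 0, STATUS_NO_CREDS } status_t;
struct pipe_client { int timeout_ms; };
struct creds { int unused; };

static struct pipe_client the_pipe = { 10000 };
static struct creds the_creds = { 0 };

/* Buggy shape: the helper can only say "worked" or "did not work". */
static bool get_creds_bool(bool available, struct creds **out)
{
    if (!available) {
        return false;
    }
    *out = &the_creds;
    return true;
}

/* The caller bails out on failure, but status still holds STATUS_OK
 * and *cli was never filled in. */
status_t connect_buggy(bool creds_available, struct pipe_client **cli)
{
    status_t status = STATUS_OK;
    struct creds *c = NULL;

    assert(cli != NULL);
    if (!get_creds_bool(creds_available, &c)) {
        goto done; /* bug: no error status is set on this path */
    }
    *cli = &the_pipe;
done:
    return status;
}

/* Fixed shape: the helper returns a status and the caller propagates it. */
static status_t get_creds_status(bool available, struct creds **out)
{
    if (!available) {
        return STATUS_NO_CREDS;
    }
    *out = &the_creds;
    return STATUS_OK;
}

status_t connect_fixed(bool creds_available, struct pipe_client **cli)
{
    struct creds *c = NULL;
    status_t status;

    assert(cli != NULL);
    *cli = NULL; /* make the failure state explicit for the caller */
    status = get_creds_status(creds_available, &c);
    if (status != STATUS_OK) {
        return status;
    }
    *cli = &the_pipe;
    return STATUS_OK;
}

/* Caller-side guard: returns the previous timeout, or -1 if cli is NULL. */
int set_timeout_checked(struct pipe_client *cli, int timeout_ms)
{
    int old;

    if (cli == NULL) {
        return -1;
    }
    old = cli->timeout_ms;
    cli->timeout_ms = timeout_ms;
    return old;
}
```

With the buggy variant, a caller that trusts the OK status and goes straight on to set a timeout on cli dereferences NULL, which matches the rpc_cli=0x0 frame in the backtrace. Propagating the helper's status removes that path; the NULL guard in the last function is only an extra safety net, not a substitute for the fix.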
Ok, ok :) But please look at the 2nd commit, this is really a huge problem for multiple schanneld connections.
Created attachment 5916 [details] Plain patch for 3.5.x.
Created attachment 5917 [details] Guenther's patch for 3.5.x to fix the allocation problem.
Created attachment 5918 [details] git am format patch for 3.5.x.
Unfortunately, gd's patch breaks the s4 WAF build due to:

../../libcli/auth/credentials.c: In function ‘netlogon_creds_copy’:
../../libcli/auth/credentials.c:485: error: implicit declaration of function ‘sid_dup_talloc’
../../libcli/auth/credentials.c:485: warning: assignment makes pointer from integer without a cast

Will post a fix shortly.

Jeremy.
Created attachment 5920 [details] git am format patch for 3.4.x.
Created attachment 5921 [details] git am fix for 3.5.x - part1
Created attachment 5922 [details] git am fix for 3.5.x - part2
David Woodhouse <dwmw2@infradead.org> - please apply the two attachments marked as "git am fix for 3.5.x - part1", and "part2". These should fix the issue for you. Please report back that they have fixed the problem. Guenther, if you're satisfied with the patches for 3.5.x and 3.4.x please re-assign to Karolin for inclusion. Jeremy
Thanks. Given that I didn't really know how to reliably reproduce it, is there something I should be looking for in the logs which tells me it *would* have happened?
Errr - no, sorry :-). It's the absence of a crash. Rather like one hand clapping :-). Jeremy.
Hm, then I'll tell you in a week or so if it hasn't crashed :)
Karolin, please add to 3.4 (that one patch) and to 3.5 (the two patches)
Pushed to both branches. Closing out bug report. Please feel free to reopen if it's still an issue. Thanks!
*** Bug 8062 has been marked as a duplicate of this bug. ***