We have gotten several crashes in winbind like the following: #0 0x00007fee31f411d7 in raise () from /lib64/libc.so.6 #1 0x00007fee31f428c8 in abort () from /lib64/libc.so.6 #2 0x00007fee3518463b in dump_core () at ../source3/lib/dumpcore.c:322 #3 0x00007fee35175cf7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814 #4 0x00007fee388a512f in smb_panic (why=why@entry=0x7fee388f26ab "internal error") at ../lib/util/fault.c:166 #5 0x00007fee388a5346 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83 #6 sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94 #7 <signal handler called> #8 cm_connect_lsa_tcp (domain=domain@entry=0x7fee3adffe60, cli=cli@entry=0x7ffde8610e88, mem_ctx=0x7fee3ae11090) at ../source3/winbindd/winbindd_cm.c:2982 #9 0x00007fee3a783724 in cm_connect_lsat (domain=domain@entry=0x7fee3adffe60, mem_ctx=mem_ctx@entry=0x7fee3ae11090, cli=cli@entry=0x7ffde8610e88, lsa_policy=lsa_policy@entry=0x7ffde8610e90) at ../source3/winbindd/winbindd_cm.c:3232 #10 0x00007fee3a7862b7 in winbindd_lookup_sids (mem_ctx=mem_ctx@entry=0x7fee3ae11090, domain=domain@entry=0x7fee3adffe60, num_sids=num_sids@entry=1, sids=0x7fee3ae1b340, domains=domains@entry=0x7ffde8610f10, names=names@entry=0x7ffde8610f18, types=types@entry=0x7ffde8610f20) at ../source3/winbindd/winbindd_msrpc.c:1103 #11 0x00007fee3a78665c in msrpc_sid_to_name (domain=0x7fee3adffe60, mem_ctx=0x7fee3ae11090, sid=<optimized out>, domain_name=0x7ffde8611040, name=0x7ffde8611048, type=0x7ffde861103c) at ../source3/winbindd/winbindd_msrpc.c:305 #12 0x00007fee3a78949e in sid_to_name (domain=<optimized out>, mem_ctx=<optimized out>, sid=<optimized out>, domain_name=<optimized out>, name=<optimized out>, type=<optimized out>) at ../source3/winbindd/winbindd_ads.c:586 #13 0x00007fee3a78925f in sid_to_name (domain=0x7fee3adffe60, mem_ctx=0x7fee3ae11090, sid=0x7fee3ae1b340, domain_name=0x7ffde8611040, name=0x7ffde8611048, type=0x7ffde861103c) at ../source3/winbindd/winbindd_reconnect_ads.c:126 #14 0x00007fee3a7721de in sid_to_name (domain=0x7fee3adffe60, mem_ctx=0x7fee3ae11090, sid=0x7fee3ae1b340, domain_name=0x7ffde8611040, name=0x7ffde8611048, type=0x7ffde861103c) at ../source3/winbindd/winbindd_cache.c:1989 #15 0x00007fee3a793507 in _wbint_LookupSid (p=p@entry=0x7ffde86110b0, r=r@entry=0x7fee3ae13050) at ../source3/winbindd/winbindd_dual_srv.c:65 #16 0x00007fee3a7dde13 in api_wbint_LookupSid (p=0x7ffde86110b0) at default/librpc/gen_ndr/srv_winbind.c:144 #17 0x00007fee3a793394 in winbindd_dual_ndrcmd (domain=<optimized out>, state=0x7ffde8611348) at ../source3/winbindd/winbindd_dual_ndr.c:322 #18 0x00007fee3a79012c in child_process_request (child=<optimized out>, child=<optimized out>, state=0x7ffde8611348) at ../source3/winbindd/winbindd_dual.c:513 #19 child_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=0x7ffde8611340) at ../source3/winbindd/winbindd_dual.c:1393 #20 0x00007fee37e0fc6b in epoll_event_loop (tvalp=0x7ffde8611260, epoll_ev=0x7fee3adffa50) at ../lib/tevent/tevent_epoll.c:728 #21 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../lib/tevent/tevent_epoll.c:926 #22 0x00007fee37e0e137 in std_event_loop_once (ev=0x7fee3adf47f0, location=0x7fee3a7fec70 "../source3/winbindd/winbindd_dual.c:1592") at ../lib/tevent/tevent_standard.c:114 #23 0x00007fee37e0a38d in _tevent_loop_once (ev=0x7fee3adf47f0, location=location@entry=0x7fee3a7fec70 "../source3/winbindd/winbindd_dual.c:1592") at ../lib/tevent/tevent.c:533 #24 0x00007fee3a7924f8 in fork_domain_child (child=0x7fee3ae000b0) at ../source3/winbindd/winbindd_dual.c:1592 #25 0x00007fee3a792bc5 in wb_child_request_trigger (req=0x7fee3ae08650, private_data=<optimized out>) at ../source3/winbindd/winbindd_dual.c:173 #26 0x00007fee37e0abb4 in tevent_common_loop_immediate (ev=ev@entry=0x7fee3adf47f0) at ../lib/tevent/tevent_immediate.c:135 #27 0x00007fee37e0fa2e in epoll_event_loop_once (ev=0x7fee3adf47f0, location=<optimized out>) at ../lib/tevent/tevent_epoll.c:907 #28 0x00007fee37e0e137 in std_event_loop_once (ev=0x7fee3adf47f0, location=0x7fee3a7e8c48 "../source3/winbindd/winbindd.c:1809") at ../lib/tevent/tevent_standard.c:114 #29 0x00007fee37e0a38d in _tevent_loop_once (ev=0x7fee3adf47f0, location=location@entry=0x7fee3a7e8c48 "../source3/winbindd/winbindd.c:1809") at ../lib/tevent/tevent.c:533 #30 0x00007fee3a758cfc in main (argc=<optimized out>, argv=<optimized out>) at ../source3/winbindd/winbindd.c:1809
Here is the code, it is the same in mainline: if (conn->lsa_pipe_tcp && conn->lsa_pipe_tcp->transport->transport == NCACN_IP_TCP && conn->lsa_pipe_tcp->auth->auth_level >= DCERPC_AUTH_LEVEL_INTEGRITY && rpccli_is_connected(conn->lsa_pipe_tcp)) { goto done; } and here is what conn->lsa_pipe_tcp looks like: (gdb) p *conn->lsa_pipe_tcp $4 = {prev = 0x0, next = 0x0, transport = 0x0, binding_handle = 0x7f15e8bf2220, abstract_syntax = {uuid = {time_low = 305420152, time_mid = 4660, time_hi_and_version = 43981, clock_seq = <incomplete sequence \357>, node = "\001#Eg\211\253"}, if_version = 0}, transfer_syntax = {uuid = { time_low = 2324192516, time_mid = 7403, time_hi_and_version = 4553, clock_seq = "\237", <incomplete sequence \350>, node = "\b\000+\020H`"},
In debugging this with Jeremy, I pointed out that we also see this in the log before the crash but I did not record how long before. [2017/06/13 17:48:16.110444, 0] ../source3/rpc_client/cli_pipe.c:3326(cli_rpc_pipe_open_schannel_with_creds) netlogon_creds_cli_check failed with NT_STATUS_RPC_SEC_PKG_ERROR We (Jeremy and I) noticed that cli_pipe.c:rpc_api_pipe_got_pdu has code like: } else if (NT_STATUS_EQUAL(status, NT_STATUS_RPC_SEC_PKG_ERROR)) { /* * TODO: do a real async disconnect ... * * For now do it sync... */ TALLOC_FREE(state->cli->transport); } Thus there seems like a code path into cm_connect_lsa_tcp where conn->lsa_pipe_tcp->transport is NULL and the fix is to add another term to the check. I have rebuilt our version of Samba and we are running a test now to see if we can still hit the issue.
(In reply to Richard Sharpe from comment #2) I guess we need to change the order and call rpccli_is_connected() first. if (rpccli_is_connected(conn->lsa_pipe_tcp) && conn->lsa_pipe_tcp->transport->transport == NCACN_IP_TCP && conn->lsa_pipe_tcp->auth->auth_level >= DCERPC_AUTH_LEVEL_INTEGRITY) { goto done; }
Good point. I will try that.
Created attachment 13297 [details] Patch for master This is a patch based on Metze's suggestion.
Created attachment 13304 [details] git-am fix for 4.6.next, 4.5.next. Cherry-picked from master.
Re-assigning to Karolin for inclusion in 4.6.next, 4.5.next.
(In reply to Jeremy Allison from comment #7) Pushed to autobuild-v4-{6,5}-test.
(In reply to Karolin Seeger from comment #8) Pushed to both branches. Closing out bug report. Thanks!