Bug 14839 - _tevent_schedule_immediate memory error coredump
Summary: _tevent_schedule_immediate memory error coredump
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Winbind (show other bugs)
Version: 4.11.7
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-18 08:28 UTC by zhangjingjun
Modified: 2021-09-19 02:54 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description zhangjingjun 2021-09-18 08:28:36 UTC
coredump stack

(gdb) bt
#0  0x00007f606336d1f7 in raise () from /lib64/libc.so.6
#1  0x00007f606336e8e8 in abort () from /lib64/libc.so.6
#2  0x00007f606a27a719 in dump_core () at ../../source3/lib/dumpcore.c:338
#3  0x00007f606a252518 in smb_panic_s3 (why=<optimized out>) at ../../source3/lib/util.c:847
#4  0x00007f60716b2fbd in smb_panic (why=why@entry=0x7f60717031f9 "internal error") at ../../lib/util/fault.c:174
#5  0x00007f60716b31fe in fault_report (sig=<optimized out>) at ../../lib/util/fault.c:88
#6  sig_fault (sig=11) at ../../lib/util/fault.c:99
#7  <signal handler called>
#8  0x00007f6072ce0923 in _tevent_schedule_immediate (im=0x55c681c28f70, ev=0x55c681bc8ba0, handler=0x7f6072ce1ec0 <tevent_req_trigger>, private_data=0x55c681c28e80,
    handler_name=0x7f6072ce81a3 "tevent_req_trigger", location=0x7f6072ce7ea0 "../../lib/tevent/tevent_req.c:258") at ../../lib/tevent/tevent.c:670
#9  0x00007f6072ce1d45 in tevent_req_post (req=req@entry=0x55c681c28e80, ev=<optimized out>) at ../../lib/tevent/tevent_req.c:257
#10 0x00007f6072ce1d66 in _tevent_req_notify_callback (req=0x55c681c28e80, location=<optimized out>) at ../../lib/tevent/tevent_req.c:136
#11 0x00007f6072ce1e28 in tevent_req_finish (req=<optimized out>, state=<optimized out>, location=<optimized out>) at ../../lib/tevent/tevent_req.c:193
#12 0x00007f6072ce1e72 in _tevent_req_error (req=req@entry=0x55c681c28e80, error=error@entry=125, location=location@entry=0x7f6065dbc2e0 "../../lib/async_req/async_sock.c:331")
    at ../../lib/tevent/tevent_req.c:211
#13 0x00007f6065dbad84 in writev_cancel (req=0x55c681c28e80) at ../../lib/async_req/async_sock.c:331
#14 0x00007f6067e7a16a in smbXcli_req_cancel_write_req (req=req@entry=0x55c681c97100) at ../../libcli/smb/smbXcli_base.c:908
#15 0x00007f6067e7da04 in smbXcli_req_unset_pending (req=req@entry=0x55c681c97100) at ../../libcli/smb/smbXcli_base.c:966
#16 0x00007f6067e7d6de in smbXcli_conn_disconnect (conn=0x55c681bcc660, status=...) at ../../libcli/smb/smbXcli_base.c:1280
#17 0x00007f6069fc7d00 in _cli_shutdown (cli=0x55c681bd82a0) at ../../source3/libsmb/clientgen.c:272
#18 0x00007f6069fc7d95 in cli_shutdown (cli=<optimized out>) at ../../source3/libsmb/clientgen.c:300
#19 0x000055c680a14117 in invalidate_cm_connection (domain=domain@entry=0x55c681c6e060) at ../../source3/winbindd/winbindd_cm.c:2114
#20 0x000055c680a2a828 in reset_cm_connection_on_error (domain=domain@entry=0x55c681c6e060, b=b@entry=0x0, status=status@entry=...) at ../../source3/winbindd/winbindd_dual_srv.c:63
#21 0x000055c680a1dc54 in reconnect_need_retry (status=status@entry=..., domain=domain@entry=0x55c681c6e060) at ../../source3/winbindd/winbindd_reconnect.c:74
#22 0x000055c680a1e239 in query_user_list (domain=0x55c681c6e060, mem_ctx=0x55c681bd05a0, rids=0x7ffda75de700) at ../../source3/winbindd/winbindd_reconnect.c:89
#23 0x000055c680a042af in wb_cache_query_user_list (domain=domain@entry=0x55c681c6e060, mem_ctx=0x55c681bd05a0, prids=0x55c683358d88) at ../../source3/winbindd/winbindd_cache.c:1556
#24 0x000055c680a2bd62 in _wbint_QueryUserRidList (p=p@entry=0x7ffda75de7b0, r=r@entry=0x55c6839887d0) at ../../source3/winbindd/winbindd_dual_srv.c:594
#25 0x000055c6809efc50 in api_wbint_QueryUserRidList (p=0x7ffda75de7b0) at librpc/gen_ndr/srv_winbind.c:1187
#26 0x000055c680a2a4a7 in winbindd_dual_ndrcmd (domain=0x55c681c6e060, state=0x7ffda75dea88) at ../../source3/winbindd/winbindd_dual_ndr.c:399
#27 0x000055c680a259a1 in child_process_request (child=<optimized out>, child=<optimized out>, state=0x7ffda75dea88) at ../../source3/winbindd/winbindd_dual.c:807
#28 child_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=0x7ffda75dea80) at ../../source3/winbindd/winbindd_dual.c:1748
#29 0x00007f6072ce12a5 in tevent_common_invoke_fd_handler (fde=fde@entry=0x55c681bd4d60, flags=<optimized out>, removed=removed@entry=0x0) at ../../lib/tevent/tevent_fd.c:110
#30 0x00007f6072ce7677 in epoll_event_loop (tvalp=0x7ffda75de990, epoll_ev=0x55c681bd6600) at ../../lib/tevent/tevent_epoll.c:736
#31 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../../lib/tevent/tevent_epoll.c:937
#32 0x00007f6072ce5667 in std_event_loop_once (ev=0x55c681bc8060, location=0x55c680aa4490 "../../source3/winbindd/winbindd_dual.c:1963") at ../../lib/tevent/tevent_standard.c:110
#33 0x00007f6072ce0a9d in _tevent_loop_once (ev=0x55c681bc8060, location=location@entry=0x55c680aa4490 "../../source3/winbindd/winbindd_dual.c:1963") at ../../lib/tevent/tevent.c:772
#34 0x000055c680a28aa4 in fork_domain_child (child=0x55c681c70060) at ../../source3/winbindd/winbindd_dual.c:1963
#35 0x000055c680a29285 in wb_child_request_waited (subreq=0x0) at ../../source3/winbindd/winbindd_dual.c:245
#36 0x00007f6072ce1579 in tevent_common_invoke_immediate_handler (im=0x55c681c74120, removed=removed@entry=0x0) at ../../lib/tevent/tevent_immediate.c:129
#37 0x00007f6072ce15b0 in tevent_common_loop_immediate (ev=ev@entry=0x55c681bc8060) at ../../lib/tevent/tevent_immediate.c:151
#38 0x00007f6072ce742d in epoll_event_loop_once (ev=0x55c681bc8060, location=<optimized out>) at ../../lib/tevent/tevent_epoll.c:918
#39 0x00007f6072ce5667 in std_event_loop_once (ev=0x55c681bc8060, location=0x55c680a89018 "../../source3/winbindd/winbindd.c:2008") at ../../lib/tevent/tevent_standard.c:110
#40 0x00007f6072ce0a9d in _tevent_loop_once (ev=0x55c681bc8060, location=location@entry=0x55c680a89018 "../../source3/winbindd/winbindd.c:2008") at ../../lib/tevent/tevent.c:772
#41 0x000055c6809eab88 in main (argc=<optimized out>, argv=<optimized out>) at ../../source3/winbindd/winbindd.c:2008

(gdb) f 8
#8  0x00007f6072ce0923 in _tevent_schedule_immediate (im=0x55c681c28f70, ev=0x55c681bc8ba0, handler=0x7f6072ce1ec0 <tevent_req_trigger>, private_data=0x55c681c28e80,
    handler_name=0x7f6072ce81a3 "tevent_req_trigger", location=0x7f6072ce7ea0 "../../lib/tevent/tevent_req.c:258") at ../../lib/tevent/tevent.c:670
670             ev->ops->schedule_immediate(im, ev, handler, private_data,
(gdb) p ev
$1 = (struct tevent_context *) 0x55c681bc8ba0
(gdb) p *(struct talloc_chunk*)((unsigned long long) 0x55c681bc8ba0 - ((sizeof(struct talloc_chunk) + 15) & ~15))
$2 = {flags = 2176617920, next = 0x5f696c6358626d73, prev = 0x72747365645f706e, parent = 0x63203a726f746375, child = 0x65736f6c635f696c, refs = 0x2064656c69616620, destructor = 0x2065706970206e6f,
  name = 0x202e63707261736c <Address 0x202e63707261736c out of bounds>, size = 7023117819775709765, limit = 0x4154535f544e2073, pool = 0x41564e495f535554}
(gdb)

The ev memory has been released but is still being used in the pending queue

I added some logs showing:
part1 list user
14505795:[2021/09/16 22:35:13.560412, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   enter req:0x55c681c97100
14505796:[2021/09/16 22:35:13.560422, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   set req:0x55c681c97100 num_pending:0 smbXcli_req_cleanup:0x7f6067e7fb20 smbXcli_req_cancel:0x7f6067e83870
14505797:[2021/09/16 22:35:13.560434, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending) [2021/09/16 22:35:13.678320,  4, pid=1697356, effective(0, 0), real(0, 0), class=winbind] 1697356 (child_handler)   child daemon request 14
14522824:[2021/09/16 22:35:23.568439,  5, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smb2_signing_sign_pdu)   signed SMB2 message
14522825:[2021/09/16 22:35:23.568457, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   enter req:0x55c681c97100
14522826:[2021/09/16 22:35:23.568466, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   set req:0x55c681c97100 num_pending:0 smbXcli_req_cleanup:0x7f6067e7fb20 smbXcli_req_cancel:0x7f6067e83870
14522827:[2021/09/16 22:35:23.568482, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   exit trueget_complete_frag failed: NT_STATUS_IO_TIMEOUT
14522828:[2021/09/16 22:35:23.568514, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_raw_call_done)   enter subreq:0x55c681c29580
14522829:[2021/09/16 22:35:23.568524, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_call_done)   enter subreq:0x55c681c93200
14522830:[2021/09/16 22:35:23.568531, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_raw_call_recv)   enter req:0x55c681c93200 mem_ctx:0x55c681c28930 out_data:0x55c681c28968 out_length:0x55c681c28970 out_flags:0x7ffda75de064
14522831:[2021/09/16 22:35:23.568540,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_raw_call_recv)   exit req:0x55c681c93200 status:NT_STATUS_IO_TIMEOUT
14522832:[2021/09/16 22:35:23.568549,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_call_done)   exit dcerpc_binding_handle_raw_call_recv status:NT_STATUS_IO_TIMEOUT
14522833:[2021/09/16 22:35:23.568558,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_raw_call_done)   exit raw_call_recv status:NT_STATUS_IO_TIMEOUT
14522834:[2021/09/16 22:35:23.568567, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_call)   exit subreq:0x55c681c28780 status:NT_STATUS_IO_TIMEOUT
14522835:[2021/09/16 22:35:23.568591,  5, pid=1688209, effective(0, 0), real(0, 0), class=winbind] 1688209 (reconnect_need_retry)   enter domain:TT status:NT_STATUS_IO_TIMEOUT
14522836:[2021/09/16 22:35:23.568602,  5, pid=1688209, effective(0, 0), real(0, 0), class=winbind] 1688209 (reset_cm_connection_on_error)   enter domain:TT b:(nil) status:NT_STATUS_IO_TIMEOUT
14522840:[2021/09/16 22:35:23.569070,  1, pid=1688209, effective(0, 0), real(0, 0), class=rpc_parse] 1688209 (ndr_print_function_debug)        samr_Close: struct samr_Close

part2
14522905:[2021/09/16 22:35:23.570890, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   enter req:0x55c681c97c00
14522906:[2021/09/16 22:35:23.570895, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   set req:0x55c681c97c00 num_pending:1 smbXcli_req_cleanup:0x7f6067e7fb20 smbXcli_req_cancel:0x7f6067e83870
14522907:[2021/09/16 22:35:23.570900, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   status pending[0] = 0x55c681c97100
14522908:[2021/09/16 22:35:23.570905, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_set_pending)   exit trueexit ev:0x55c681bc8ce0 set call back dcerpc_binding_handle_raw_call_done req:0x55c681c93200
14522909:[2021/09/16 22:35:23.570911, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_call_send)   exit set_callback dcerpc_binding_handle_call_done req:0x55c681c28780
14523114:[2021/09/16 22:35:33.580413, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_unset_pending)   enter req:0x55c681c97c00
14523115:[2021/09/16 22:35:33.580435,  5, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_cancel_write_req)   enter req:0x55c681c97c00 conn:0x55c681bcc660 num_pending:2 state_write_req:0x55c681c29900

part3 lsa_Close
14523389:[2021/09/16 22:35:43.590540,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_raw_call_recv)   exit req:0x55c681c93200 status:NT_STATUS_IO_TIMEOUT
14523390:[2021/09/16 22:35:43.590546,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_call_done)   exit dcerpc_binding_handle_raw_call_recv status:NT_STATUS_IO_TIMEOUT
14523391:[2021/09/16 22:35:43.590552,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_raw_call_done)   exit raw_call_recv status:NT_STATUS_IO_TIMEOUT
14523392:[2021/09/16 22:35:43.590558, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (dcerpc_binding_handle_call)   exit subreq:0x55c681c28780 status:NT_STATUS_IO_TIMEOUT
14523393:[2021/09/16 22:35:43.590600,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_conn_disconnect)   enter conn:0x55c681bcc660 status:NT_STATUS_OK
14523394:[2021/09/16 22:35:43.590608,  5, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_conn_disconnect)   get pending req:0x55c681c97100
14523395:[2021/09/16 22:35:43.590612, 10, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_unset_pending)   enter req:0x55c681c97100
14523396:[2021/09/16 22:35:43.590617,  5, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (smbXcli_req_cancel_write_req)   enter req:0x55c681c97100 conn:0x55c681bcc660 num_pending:1 state_write_req:0x55c681c28e80
14523397:[2021/09/16 22:35:43.590630,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (fault_report)   ===============================================================
14523398:[2021/09/16 22:35:43.590636,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (fault_report)   INTERNAL ERROR: Signal 11 in pid 1688209 (4.11.7)
14523400:[2021/09/16 22:35:43.590642,  0, pid=1688209, effective(0, 0), real(0, 0)] 1688209 (fault_report)   ===============================================================

dcerpc_binding_handle_call The execution of the operation timed out, exit the release ev was released when the frame was placed. But req:0x55c681c97100 did not unset and caused an error.

dcerpc_binding_handle_call need change

status = dcerpc_binding_handle_call_recv(subreq);
if (!NT_STATUS_IS_OK(status)) {
     talloc_move(h, ev);
     goto fail;
}
Comment 1 Ralph Böhme 2021-09-18 08:39:24 UTC
Thanks for the detailed analysis. One question: how do you trigger this?
Comment 2 zhangjingjun 2021-09-18 09:16:36 UTC
My environment often appears, AD domain has 60,000 users and 60,000 user groups.
Poor network and more clients
Comment 3 Andrew Bartlett 2021-09-18 10:11:29 UTC
This reminds me of bug #14710 which in turn fixes a regression in the attempts to avoid large NDR arrays taking a lot of time to process.  That was
Comment 4 Andrew Bartlett 2021-09-18 10:13:23 UTC
... not in Samba 4.11, but the slow processing of large NDR arrays was.

Of course the tevent code shouldn't segfault, but just to say that it might be reproducing because we are very slow in winbindd handling some of the packets in such a large domain.
Comment 5 zhangjingjun 2021-09-19 02:54:43 UTC
bug #14710 I have analyzed it. My environment has sufficient memory and there are no more than 65536 user groups and users.
The winfo -g command has been able to execute normally

From the log , I think that the exception handling of get_complete_frag() is not enough, and write_req is not unset at TIMEOUT. I normally check the logic of error in get_complete_frag, hoping to solve this problem from this piece