Created attachment 18763 [details]
Logs with a full stack trace.

2025-10-24T01:58:52.551756+00:00 addc.addom.samba.example.com winbindd[306061]: wbd_ping_dc_done: dcerpc_wbint_PingDc_recv failed for domain: TORTURE305 - NT_STATUS_DOMAIN_CONTROLLER_NOT_FOUND
2025-10-24T01:58:52.551854+00:00 addc.addom.samba.example.com winbindd[306061]: free_domain: Free updated domain[0x58ce4dc1a4d0] name[TORTURE305] S-1-5-21-97398-379795-305 replaced by domain[0x58ce4cdb7790] name[TORTURE305]
2025-10-24T01:58:52.558471+00:00 addc.addom.samba.example.com winbindd[306061]: Bad talloc magic value - unknown value
2025-10-24T01:58:52.558544+00:00 addc.addom.samba.example.com winbindd[306061]: ===============================================================
2025-10-24T01:58:52.558558+00:00 addc.addom.samba.example.com winbindd[306061]: INTERNAL ERROR: Bad talloc magic value - unknown value in winbindd () () pid 306061 (4.24.0pre1-DEVELOPERBUILD)
2025-10-24T01:58:52.558573+00:00 addc.addom.samba.example.com winbindd[306061]: If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
2025-10-24T01:58:52.558588+00:00 addc.addom.samba.example.com winbindd[306061]: ===============================================================
2025-10-24T01:58:52.558598+00:00 addc.addom.samba.example.com winbindd[306061]: PANIC (pid 306061): Bad talloc magic value - unknown value in 4.24.0pre1-DEVELOPERBUILD
2025-10-24T01:58:52.558772+00:00 addc.addom.samba.example.com winbindd[306061]: BACKTRACE: 16 stack frames:
 #0 bin/shared/private/libgenrand-private-samba.so(log_stack_trace+0x29) [0x7398e741ce59]
 #1 bin/shared/private/libgenrand-private-samba.so(smb_panic_log+0x256) [0x7398e741ce26]
 #2 bin/shared/private/libgenrand-private-samba.so(smb_panic+0x15) [0x7398e741cfe5]
 #3 bin/shared/private/libtalloc-private-samba.so(+0x9dca) [0x7398e7a60dca]
 #4 bin/shared/private/libtalloc-private-samba.so(+0x9d80) [0x7398e7a60d80]
 #5 bin/shared/private/libtalloc-private-samba.so(+0x497d) [0x7398e7a5b97d]
 #6 bin/shared/private/libtalloc-private-samba.so(+0x5ad5) [0x7398e7a5cad5]
 #7 bin/shared/private/libtalloc-private-samba.so(talloc_check_name+0x3c) [0x7398e7a5cb8c]
 #8 bin/shared/private/libtevent-private-samba.so(+0x1a7ac) [0x7398e83007ac]
 #9 bin/shared/private/libtevent-private-samba.so(+0x17e18) [0x7398e82fde18]
 #10 bin/shared/private/libtevent-private-samba.so(+0x16120) [0x7398e82fc120]
 #11 bin/shared/private/libtevent-private-samba.so(_tevent_loop_once+0x101) [0x7398e82f1861]
 #12 /data/samba/samba01/bin/winbindd(main+0x1b61) [0x58ce3a307ff1]
 #13 /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x7398e662a1ca]
 #14 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x7398e662a28b]
 #15 /data/samba/samba01/bin/winbindd(_start+0x25) [0x58ce3a27d945]
Running make TESTS="samba4.rpc.lsa" test in a loop will trigger the crash.

It appears to be a race condition between:

source3/winbindd/winbindd_util.c terminate_child, which kills the child process and frees the child monitor_fde:

    kill(c->pid, SIGTERM);
    c->pid = 0;
    if (c->sock != -1) {
        close(c->sock);
    // }
    // c->sock = -1;
    // DBG_ERR("Freed c->monitor_fde (%p), pid (%d)\n",
    //         c->monitor_fde, c->pid);
    // TALLOC_FREE(c->monitor_fde);

and lib/tevent/tevent_epoll.c epoll_event_loop line 632:

    struct tevent_fd *fde = talloc_get_type(events[i].data.ptr, struct tevent_fd);

The kill makes the child socket readable as the child process has gone away. The TALLOC_FREE(c->monitor_fde);
Sigh, let's try that again :-)

Running make TESTS="samba4.rpc.lsa" test in a loop will trigger the crash.

It appears to be a race condition between:

source3/winbindd/winbindd_util.c terminate_child, which kills the child process and frees the child monitor_fde:

    kill(c->pid, SIGTERM);
    c->pid = 0;
    if (c->sock != -1) {
        close(c->sock);
    }
    c->sock = -1;
    TALLOC_FREE(c->monitor_fde);

and lib/tevent/tevent_epoll.c epoll_event_loop line 632:

    struct tevent_fd *fde = talloc_get_type(events[i].data.ptr, struct tevent_fd);

The kill makes the child socket readable as the child process has gone away. That socket has source3/winbindd/winbindd_dual.c child_socket_readable registered as its handler, and events[i].data.ptr points to c->monitor_fde.
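To make the "Bad talloc magic value" panic a bit more concrete, here is a toy sketch of a header magic check of the kind that fires when a type-checked lookup is handed a stale pointer. None of this is talloc code: the toy_* names, the magic constants and the fixed pool are all made up for illustration, and real talloc's header layout and abort behaviour differ. The pool exists only so that a "freed" chunk can still be read safely in the demo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic values, not talloc's real constants. */
#define TOY_MAGIC_LIVE 0xC0FFEE01u
#define TOY_MAGIC_FREE 0xDEADBEEFu
#define TOY_POOL_SIZE 4

enum toy_status {
	TOY_OK,
	TOY_ACCESS_AFTER_FREE, /* header still carries the "freed" marker */
	TOY_UNKNOWN_MAGIC,     /* header was overwritten by something else */
	TOY_WRONG_TYPE
};

struct toy_chunk {
	uint32_t magic;   /* checked before anything else is trusted */
	const char *name; /* type name recorded at allocation time */
};

/* Fixed pool: "freeing" never returns memory, so the demo stays defined. */
static struct toy_chunk toy_pool[TOY_POOL_SIZE];
static size_t toy_used;

struct toy_chunk *toy_alloc(const char *name)
{
	struct toy_chunk *c;

	assert(name != NULL);
	if (toy_used == TOY_POOL_SIZE) {
		return NULL;
	}
	c = &toy_pool[toy_used++];
	c->magic = TOY_MAGIC_LIVE;
	c->name = name;
	return c;
}

/* Mark the chunk as freed; a stale pointer to it can now be detected. */
void toy_free(struct toy_chunk *c)
{
	c->magic = TOY_MAGIC_FREE;
}

/* Simulate the freed memory being reused and scribbled over. */
void toy_reuse(struct toy_chunk *c)
{
	memset(c, 0x5a, sizeof(*c));
}

/* Type-checked lookup: validate the header, then compare the type name. */
enum toy_status toy_check(const struct toy_chunk *c, const char *name)
{
	if (c->magic == TOY_MAGIC_FREE) {
		return TOY_ACCESS_AFTER_FREE;
	}
	if (c->magic != TOY_MAGIC_LIVE) {
		return TOY_UNKNOWN_MAGIC;
	}
	if (strcmp(c->name, name) != 0) {
		return TOY_WRONG_TYPE;
	}
	return TOY_OK;
}
```

In this toy, a pointer kept in an event's data.ptr after toy_free() yields TOY_ACCESS_AFTER_FREE, and after toy_reuse() it yields TOY_UNKNOWN_MAGIC, which is the rough analogue of the "unknown value" wording in the log above.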
Except this code is all synchronous, and the talloc destructor removes the FD from the epoll list.
But the epoll_wait is returning an event that points to freed memory. Need to find out where that's coming from.
https://gitlab.com/samba-team/samba/-/merge_requests/4283 also has some details...
Gary, what OS and kernel is this on?
6.14.0-35-generic #35~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Oct 14 13:55:17 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
But I did see a failure in CI, which is what started me down the rabbit hole.
I pretty often see samba-ad-dc-4b failing with WBC_ERR_WINBIND_NOT_AVAILABLE: https://gitlab.com/samba-team/devel/samba/-/jobs/12368740677
(In reply to Andreas Schneider from comment #9)
In case this bug lasts longer than the CI log, here are some relevant lines:

> Pulling docker image registry.gitlab.com/samba-team/devel/samba/samba-ci-ubuntu2204:336927a79f09b3eb729c64872bf4eca3e2f6761f
> Linux runner-xs6vzpvoq-project-6378020-concurrent-0 5.15.154+ #1 SMP Sat May 4 12:14:42 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
> ==> /builds/samba-team/devel/samba/samba-ad-dc-4b.stdout <==
> [149(913)/192 at 3m41s] samba4.blackbox.kinit_trust(fl2008r2dc:local)
>
> ==> /builds/samba-team/devel/samba/samba-ad-dc-4b.stderr <==
> 2025-12-09T09:53:01.701228+00:00 dc7.samba2008r2.example.com samba[574]: winbindd daemon died with exit status 6
> 2025-12-09T09:53:01.701276+00:00 dc7.samba2008r2.example.com samba[574]: task_server_terminate: task_server_terminate: [winbindd child process exited]
> 2025-12-09T09:53:01.702712+00:00 dc7.samba2008r2.example.com samba[559]: samba_terminate: samba_terminate of samba 559: winbindd child process exited
>
> ==> /builds/samba-team/devel/samba/samba-ad-dc-4b.stdout <==
> UNEXPECTED(failure): samba4.blackbox.kinit_trust.wbinfo check outgoing trust pw(fl2008r2dc:local)
> REASON: Exception: Exception: failed to call wbcCheckTrustCredentials: WBC_ERR_WINBIND_NOT_AVAILABLE
Created attachment 18792 [details] Proposed fix (version 1)
It looks to be a race condition between the child socket closing, and it being de-registered from epoll.
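For anyone reading along, here is a small standalone sketch of one way a close-before-deregister window can leave a live epoll registration behind. Whether this is the exact mechanism in winbindd is an assumption; the thread does not spell it out. It relies on the behaviour documented in epoll(7): closing a file descriptor only removes it from the interest list once every descriptor referring to the same open file description is closed. The helper name events_after_teardown and the dup() standing in for a descriptor inherited by a forked child are hypothetical, not Samba code.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not Samba code. Registers the read end of a pipe
 * with epoll, makes it readable, tears it down in one of two orders, and
 * returns how many events epoll_wait() still reports afterwards
 * (-1 on setup failure).
 */
int events_after_teardown(int deregister_first)
{
	static int watcher; /* stands in for the object behind data.ptr */
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event got;
	int pipefd[2] = { -1, -1 };
	int epfd = -1, dupfd = -1, fd, n = -1;

	if (pipe(pipefd) != 0) {
		return -1;
	}
	epfd = epoll_create1(0);
	/* Second reference to the same open file description. */
	dupfd = dup(pipefd[0]);
	if (epfd < 0 || dupfd < 0) {
		goto cleanup;
	}

	ev.data.ptr = &watcher;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0) {
		goto cleanup;
	}
	if (write(pipefd[1], "x", 1) != 1) { /* make the read end readable */
		goto cleanup;
	}

	fd = pipefd[0];
	pipefd[0] = -1;
	if (deregister_first) {
		epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
		close(fd);
	} else {
		close(fd);
		/* Fails with EBADF: the number is closed, the registration is not. */
		epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
	}

	/* Non-blocking poll: is anything still reported for the dead fd? */
	n = epoll_wait(epfd, &got, 1, 0);
	if (n == 1) {
		/* The stale event still carries the old data.ptr. */
		assert(got.data.ptr == &watcher);
	}

cleanup:
	if (pipefd[0] >= 0) {
		close(pipefd[0]);
	}
	close(pipefd[1]);
	if (dupfd >= 0) {
		close(dupfd);
	}
	if (epfd >= 0) {
		close(epfd);
	}
	return n;
}
```

Calling events_after_teardown(1) de-registers before closing and reports no further events, while events_after_teardown(0) closes first and still gets an event whose data.ptr is the old pointer. If the object behind that pointer has been freed in the meantime, the event loop ends up dereferencing freed memory.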
Created attachment 18796 [details]
Proposed fix (version 2)

Updated the commit title
(In reply to Gary Lockyer from comment #13) Looks good, but why did you close the merge request and not push the updated patch there?
(In reply to Stefan Metzmacher from comment #14) Ok, looked at the wrong MR
This bug was referenced in samba master: a3684a2284cdf421090d6064b720b81b05b6eae6
Created attachment 18798 [details]
patch for 4.23

Backport to 4.23 is trivial; beyond that looks tricky.
For 4.23.
This bug was referenced in samba v4-23-test: 36f0300cda5989c948801fba0f8b0b64066f54a9
This bug was referenced in samba v4-23-stable (Release samba-4.23.5): 36f0300cda5989c948801fba0f8b0b64066f54a9