The segfault happens in cleanupd_init(), which calls tevent_req_poll(); we crash when we try to dereference the event context. Looking at the two backtraces at https://bugzilla.redhat.com/show_bug.cgi?id=1375973, the segfaults indicate that the event context is freed in between.
The same problem occurs in the Mageia 6 development distro; the bug report is https://bugs.mageia.org/show_bug.cgi?id=19356. It occurs with both samba 4.4.5 and 4.5.0 when built against the latest libtevent, 0.9.30. After downgrading to libtevent 0.9.29, samba no longer crashes, so the problem is probably due to the latest changes (from a quick diff I spotted that a large amount of multithreading code was added). Can some libtevent expert have a look?
We need a valgrind log with debug symbols installed:

    valgrind --tool=memcheck -v --num-callers=20 --track-origins=yes --log-file=smbd-valgrind.log /usr/sbin/smbd

I already got one without debug symbols, which at least detects the error, but we need one with debug symbols to get line numbers!
The Red Hat bug contains a valgrind log:

==18185== Invalid read of size 8
==18185==    at 0x8B63168: tevent_timeval_is_zero (tevent_timed.c:107)
==18185==    by 0x8B63441: tevent_common_loop_timer_delay (tevent_timed.c:304)
==18185==    by 0x7583298: run_events_poll (events.c:199)
==18185==    by 0x7583436: s3_event_loop_once (events.c:303)
==18185==    by 0x8B5EA9C: _tevent_loop_once (tevent.c:680)
==18185==    by 0x8B5FE02: tevent_req_poll (tevent_req.c:264)
==18185==    by 0x1140D2: cleanupd_init (server.c:626)
==18185==    by 0x10E6A6: main (server.c:1860)
==18185==  Address 0x1780ce70 is 16 bytes after a block of size 112 alloc'd
==18185==    at 0x4C2DB8D: malloc (vg_replace_malloc.c:299)
==18185==    by 0x894CAFD: __talloc_with_prefix (talloc.c:675)
==18185==    by 0x894CAFD: __talloc (talloc.c:716)
==18185==    by 0x894CAFD: _talloc_named_const (talloc.c:873)
==18185==    by 0x894CAFD: _talloc_zero (talloc.c:2318)
==18185==    by 0x7582DC9: tevent_get_poll_private.part.1 (events.c:45)
==18185==    by 0x75834E4: tevent_get_poll_private (events.c:326)
==18185==    by 0x75834E4: s3_event_loop_once (events.c:297)
==18185==    by 0x8B5EA9C: _tevent_loop_once (tevent.c:680)
==18185==    by 0x8B5FE02: tevent_req_poll (tevent_req.c:264)
==18185==    by 0x1140D2: cleanupd_init (server.c:626)
==18185==    by 0x10E6A6: main (server.c:1860)
Is it possible that the absence of 110f9258ddf995a334280c42b98e6f15d4d947d8 in Samba 4.4 is the issue here?
(In reply to Andreas Schneider from comment #4) The problem is that source3/lib/events.c uses #include "lib/tevent/tevent_internal.h", and that private header does not match the tevent version actually in use at runtime. The fix will be to remove the usage of s3_tevent_context_init, run_events_poll, event_add_to_poll_args and dump_event_list.
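To illustrate why building against a private header from a different library version can produce out-of-bounds reads like the one valgrind reports, here is a minimal sketch. The struct and field names (app_ctx, lib_ctx, threaded_contexts) are hypothetical and are not the real tevent definitions; the only assumption is that the newer library inserted a member into a private structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout the application was compiled against
 * (taken from an older copy of a private header). */
struct app_ctx {
    void *fd_events;
    void *timer_events;
};

/* Hypothetical layout used inside the newer shared library:
 * one member was inserted before timer_events. */
struct lib_ctx {
    void *fd_events;
    void *threaded_contexts; /* assumed new member */
    void *timer_events;
};

/* The first member still agrees, so simple accesses appear to work. */
static_assert(offsetof(struct app_ctx, fd_events) ==
              offsetof(struct lib_ctx, fd_events),
              "first member is at the same offset in both layouts");

/* Number of bytes by which the two sides disagree about where
 * timer_events lives. */
size_t timer_offset_skew(void)
{
    return offsetof(struct lib_ctx, timer_events) -
           offsetof(struct app_ctx, timer_events);
}

/* Number of bytes the library may touch past the end of an object
 * that the application allocated with its own (smaller) sizeof. */
size_t bytes_past_allocation(void)
{
    return sizeof(struct lib_ctx) - sizeof(struct app_ctx);
}
```

In this sketch both functions return sizeof(void *): the library would look for timer_events one pointer further along than the application put it, which lands just past the application's allocation. Using only the public tevent API avoids depending on such layouts at all.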
Created attachment 12510 Patches for v4-5-test
Created attachment 12511 Patches for v4-4-test
Created attachment 12512 Patches for v4-3-test
As this breaks Samba on several distributions because libtevent was updated, I would argue that we need a new Samba release ASAP!
Yep. That together with 12045 is not nice.
I can confirm that samba 4.5.0 with the patches you posted for this bug and for bug #12045 no longer crashes with libtevent 0.9.30. So I think you can push for an official update to 4.5.1 (and 4.4.7).
One problem remains, probably unrelated: when samba starts and cups and cups-browsed are not installed and running, smbd eats 100% CPU. It could be related to the call smbd_reinit_after_fork(msg_ctx, ev, true, "lpqd"); in source3/printing/queue_process.c, but I can't be more precise.
Reported in Debian too: https://bugs.debian.org/840382 (and https://bugs.debian.org/840298)
Pushed to autobuild-v4-{5,4,3}-test. (There will be a 4.3 bugfix release including these fixes.) Figuring out appropriate release dates.
(In reply to Karolin Seeger from comment #14) Pushed to all branches. Closing out bug report. Thanks!