Bug 12283 - REGRESSION: smbd segfaults on startup, tevent context being freed
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.5.0
Hardware: All All
Importance: P5 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-21 07:54 UTC by Andreas Schneider
Modified: 2016-10-25 07:43 UTC
CC List: 7 users

See Also:


Attachments
Patches for v4-5-test (46.23 KB, patch)
2016-09-24 21:56 UTC, Stefan Metzmacher
vl: review+
jra: review+
Patches for v4-4-test (46.21 KB, text/plain)
2016-09-24 21:57 UTC, Stefan Metzmacher
vl: review+
jra: review+
Patches for v4-3-test (46.22 KB, text/plain)
2016-09-24 21:57 UTC, Stefan Metzmacher
vl: review+
jra: review+

Description Andreas Schneider 2016-09-21 07:54:55 UTC
The segfault happens in cleanupd_init(), which calls tevent_req_poll(); we crash when we try to dereference the event context.

The two backtraces at https://bugzilla.redhat.com/show_bug.cgi?id=1375973 indicate that the event context is freed in between.
Comment 1 Giuseppe Ghibò 2016-09-22 12:31:40 UTC
The same problem occurs in Mageia 6 development distro, whose bug report is this one:

https://bugs.mageia.org/show_bug.cgi?id=19356

The problem occurs with both samba 4.4.5 and 4.5.0 when using the latest libtevent 0.9.30; after downgrading to libtevent 0.9.29, samba no longer crashes, so the problem is probably due to the latest changes (in a quick diff I spotted a large amount of newly added multithreading code). Can a libtevent expert have a look?
Comment 2 Andreas Schneider 2016-09-22 14:03:20 UTC
We need a valgrind log with debug symbols installed.

valgrind --tool=memcheck -v --num-callers=20 --track-origins=yes --log-file=smbd-valgrind.log /usr/sbin/smbd

I already got one without debug symbols which at least detects the error, but we need one with debug symbols to get line numbers!
Comment 3 Andreas Schneider 2016-09-22 18:21:30 UTC
The Red Hat bug contains a valgrind log:

==18185== Invalid read of size 8
==18185==    at 0x8B63168: tevent_timeval_is_zero (tevent_timed.c:107)
==18185==    by 0x8B63441: tevent_common_loop_timer_delay (tevent_timed.c:304)
==18185==    by 0x7583298: run_events_poll (events.c:199)
==18185==    by 0x7583436: s3_event_loop_once (events.c:303)
==18185==    by 0x8B5EA9C: _tevent_loop_once (tevent.c:680)
==18185==    by 0x8B5FE02: tevent_req_poll (tevent_req.c:264)
==18185==    by 0x1140D2: cleanupd_init (server.c:626)
==18185==    by 0x10E6A6: main (server.c:1860)
==18185==  Address 0x1780ce70 is 16 bytes after a block of size 112 alloc'd
==18185==    at 0x4C2DB8D: malloc (vg_replace_malloc.c:299)
==18185==    by 0x894CAFD: __talloc_with_prefix (talloc.c:675)
==18185==    by 0x894CAFD: __talloc (talloc.c:716)
==18185==    by 0x894CAFD: _talloc_named_const (talloc.c:873)
==18185==    by 0x894CAFD: _talloc_zero (talloc.c:2318)
==18185==    by 0x7582DC9: tevent_get_poll_private.part.1 (events.c:45)
==18185==    by 0x75834E4: tevent_get_poll_private (events.c:326)
==18185==    by 0x75834E4: s3_event_loop_once (events.c:297)
==18185==    by 0x8B5EA9C: _tevent_loop_once (tevent.c:680)
==18185==    by 0x8B5FE02: tevent_req_poll (tevent_req.c:264)
==18185==    by 0x1140D2: cleanupd_init (server.c:626)
==18185==    by 0x10E6A6: main (server.c:1860)
Comment 4 Andreas Schneider 2016-09-23 05:46:32 UTC
Is it possible that the absence of 110f9258ddf995a334280c42b98e6f15d4d947d8 in Samba 4.4 is the issue here?
Comment 5 Stefan Metzmacher 2016-09-23 05:59:05 UTC
(In reply to Andreas Schneider from comment #4)

The problem is that source3/lib/events.c uses
#include "lib/tevent/tevent_internal.h", which is Samba's in-tree copy
and doesn't belong to the tevent version actually in use.

The fix will be to remove the usage of
s3_tevent_context_init, run_events_poll, event_add_to_poll_args
and dump_event_list.
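
To illustrate the failure mode described above, here is a minimal sketch, assuming two hypothetical struct layouts. The names ctx_v1 and ctx_v2 and their members are made up for illustration and are not the real tevent structures. It shows why code compiled against one version of an internal header disagrees with a library built from another version about where a member lives, so the caller ends up reading the wrong memory.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout the library was actually built with. */
struct ctx_v2 {
    const void *ops;
    void *threads;        /* assumed: member added in the newer release */
    void *fd_events;
    void *timer_events;
};

/* Hypothetical layout from a stale copy of the internal header. */
struct ctx_v1 {
    const void *ops;
    void *fd_events;
    void *timer_events;
};

/* Offset at which the library stores timer_events. */
static size_t timer_offset_library(void)
{
    return offsetof(struct ctx_v2, timer_events);
}

/* Offset at which the caller, built with the stale header, looks for it. */
static size_t timer_offset_caller(void)
{
    return offsetof(struct ctx_v1, timer_events);
}

/* Returns 1 if both sides agree on size and member offset, 0 otherwise. */
static int layouts_agree(void)
{
    return sizeof(struct ctx_v1) == sizeof(struct ctx_v2) &&
           timer_offset_caller() == timer_offset_library();
}

/* Prints both offsets so the disagreement is visible. */
static void report_layouts(void)
{
    printf("library offset: %zu, caller offset: %zu, agree: %d\n",
           timer_offset_library(), timer_offset_caller(), layouts_agree());
}
```

Calling report_layouts() from a test program prints two different offsets. Only the public tevent.h API is stable across library versions, which is why dropping the functions that depend on the internal header avoids this class of mismatch.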
Comment 6 Stefan Metzmacher 2016-09-24 21:56:47 UTC
Created attachment 12510 [details]
Patches for v4-5-test
Comment 7 Stefan Metzmacher 2016-09-24 21:57:22 UTC
Created attachment 12511 [details]
Patches for v4-4-test
Comment 8 Stefan Metzmacher 2016-09-24 21:57:54 UTC
Created attachment 12512 [details]
Patches for v4-3-test
Comment 9 Andreas Schneider 2016-09-25 01:53:40 UTC
As this breaks Samba on several distributions because libtevent was updated, I would argue that we need a new Samba release ASAP!
Comment 10 Volker Lendecke 2016-09-25 01:56:12 UTC
Yep. That together with 12045 is not nice.
Comment 11 Giuseppe Ghibò 2016-09-27 15:39:24 UTC
I can confirm that samba 4.5.0 with the patches you posted for this bug and for bug #12045 no longer crashes with libtevent 0.9.30. So I think you can push for an official update to 4.5.1 (and 4.4.7).
Comment 12 Giuseppe Ghibò 2016-09-29 17:37:47 UTC
One problem remains, probably unrelated: when samba starts and cups and cups-browsed are not installed and running, smbd eats 100% CPU. The problem could be related to the call of

smbd_reinit_after_fork(msg_ctx, ev, true, "lpqd");

in source3/printing/queue_process.c, but I can't be more precise.
Comment 13 Mathieu Parent 2016-10-12 03:21:48 UTC
Reported in Debian too: https://bugs.debian.org/840382 (and https://bugs.debian.org/840298)
Comment 14 Karolin Seeger 2016-10-19 06:49:31 UTC
Pushed to autobuild-v4-{5,4,3}-test.
(There will be a 4.3 bugfix release including these fixes.)
Figuring out appropriate release dates.
Comment 15 Karolin Seeger 2016-10-25 07:43:05 UTC
(In reply to Karolin Seeger from comment #14)
Pushed to all branches.
Closing out bug report.

Thanks!