12283 – REGRESSION: smbd segfaults on startup, tevent context being freed

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 4.1 and newer
Classification:	Unclassified
Component:	File services
Version:	4.5.0
Hardware:	All All

Importance:	P5 regression
Target Milestone:	---
Assignee:	Karolin Seeger
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-09-21 07:54 UTC by Andreas Schneider
Modified:	2016-10-25 07:43 UTC
CC List:	7 users

See Also:

Attachments
Patches for v4-5-test (46.23 KB, patch) 2016-09-24 21:56 UTC, Stefan Metzmacher	vl: review+ jra: review+
Patches for v4-4-test (46.21 KB, text/plain) 2016-09-24 21:57 UTC, Stefan Metzmacher	vl: review+ jra: review+
Patches for v4-3-test (46.22 KB, text/plain) 2016-09-24 21:57 UTC, Stefan Metzmacher	vl: review+ jra: review+

Description Andreas Schneider 2016-09-21 07:54:55 UTC

The segfault happens in cleanupd_init(), which calls tevent_req_poll(); we crash when we try to dereference the event context.

Looking at the two backtraces at https://bugzilla.redhat.com/show_bug.cgi?id=1375973, the segfaults indicate that the event context is freed in between.

Comment 1 Giuseppe Ghibò 2016-09-22 12:31:40 UTC

The same problem occurs in the Mageia 6 development distribution; the bug report is:

https://bugs.mageia.org/show_bug.cgi?id=19356

The problem occurs with both Samba 4.4.5 and 4.5.0 when the latest libtevent 0.9.30 is installed; after downgrading to libtevent 0.9.29 Samba no longer crashes, so the problem is probably due to the latest changes (from a quick diff I spotted a large amount of newly added multithreading code). Can a libtevent expert have a look?

Comment 2 Andreas Schneider 2016-09-22 14:03:20 UTC

We need a valgrind log with debug symbols installed.

valgrind --tool=memcheck -v --num-callers=20 --track-origins=yes --log-file=smbd-valgrind.log /usr/sbin/smbd

I already got one without debug symbols which at least detects the error, but we need one with debug symbols to get line numbers!

Comment 3 Andreas Schneider 2016-09-22 18:21:30 UTC

The Red Hat bug contains a valgrind log:

==18185== Invalid read of size 8
==18185==    at 0x8B63168: tevent_timeval_is_zero (tevent_timed.c:107)
==18185==    by 0x8B63441: tevent_common_loop_timer_delay (tevent_timed.c:304)
==18185==    by 0x7583298: run_events_poll (events.c:199)
==18185==    by 0x7583436: s3_event_loop_once (events.c:303)
==18185==    by 0x8B5EA9C: _tevent_loop_once (tevent.c:680)
==18185==    by 0x8B5FE02: tevent_req_poll (tevent_req.c:264)
==18185==    by 0x1140D2: cleanupd_init (server.c:626)
==18185==    by 0x10E6A6: main (server.c:1860)
==18185==  Address 0x1780ce70 is 16 bytes after a block of size 112 alloc'd
==18185==    at 0x4C2DB8D: malloc (vg_replace_malloc.c:299)
==18185==    by 0x894CAFD: __talloc_with_prefix (talloc.c:675)
==18185==    by 0x894CAFD: __talloc (talloc.c:716)
==18185==    by 0x894CAFD: _talloc_named_const (talloc.c:873)
==18185==    by 0x894CAFD: _talloc_zero (talloc.c:2318)
==18185==    by 0x7582DC9: tevent_get_poll_private.part.1 (events.c:45)
==18185==    by 0x75834E4: tevent_get_poll_private (events.c:326)
==18185==    by 0x75834E4: s3_event_loop_once (events.c:297)
==18185==    by 0x8B5EA9C: _tevent_loop_once (tevent.c:680)
==18185==    by 0x8B5FE02: tevent_req_poll (tevent_req.c:264)
==18185==    by 0x1140D2: cleanupd_init (server.c:626)
==18185==    by 0x10E6A6: main (server.c:1860)

Comment 4 Andreas Schneider 2016-09-23 05:46:32 UTC

Is it possible that the absence of 110f9258ddf995a334280c42b98e6f15d4d947d8 in Samba 4.4 is the issue here?

Comment 5 Stefan Metzmacher 2016-09-23 05:59:05 UTC

(In reply to Andreas Schneider from comment #4)

The problem is that source3/lib/events.c uses
#include "lib/tevent/tevent_internal.h", which doesn't belong
to the tevent version actually in use.

The fix will be to remove the usage of
s3_tevent_context_init, run_events_poll, event_add_to_poll_args
and dump_event_list.
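
To illustrate the general failure mode described above, here is a minimal sketch in C of how code compiled against one copy of a private header can disagree with a library built from another. Both struct definitions, all member names, and both helper functions are hypothetical; they are not the real contents of tevent_internal.h and only show how inserting a member shifts later offsets and grows the object.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL layout: what the application sees through the copy of the
 * private header it was compiled with. Not the real tevent structs. */
struct ctx_as_compiled {
	void *ops;
	void *fd_events;
	void *timer_events;
};

/* HYPOTHETICAL layout: what the installed shared library really uses,
 * after a release inserted a new member in the middle. */
struct ctx_at_runtime {
	void *ops;
	void *threaded_contexts; /* new member, shifts all later offsets */
	void *fd_events;
	void *timer_events;
};

/* Byte distance between where the two builds expect timer_events. */
size_t timer_events_skew(void)
{
	size_t compiled = offsetof(struct ctx_as_compiled, timer_events);
	size_t runtime = offsetof(struct ctx_at_runtime, timer_events);

	/* The runtime layout only grew, so the subtraction cannot wrap. */
	assert(runtime >= compiled);
	return runtime - compiled;
}

/* Returns 1 when the stale view of the struct is smaller than the object
 * the library expects, so library code handed such an object would read
 * past the end of the allocation. */
int stale_alloc_is_too_small(void)
{
	return sizeof(struct ctx_as_compiled) < sizeof(struct ctx_at_runtime);
}
```

With these hypothetical layouts, timer_events_skew() returns sizeof(void *) and stale_alloc_is_too_small() returns 1. Code that only goes through the library's public API never hard-codes such offsets, which is why dropping the internal header removes this class of mismatch.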

Comment 6 Stefan Metzmacher 2016-09-24 21:56:47 UTC

Created attachment 12510
Patches for v4-5-test

Comment 7 Stefan Metzmacher 2016-09-24 21:57:22 UTC

Created attachment 12511
Patches for v4-4-test

Comment 8 Stefan Metzmacher 2016-09-24 21:57:54 UTC

Created attachment 12512
Patches for v4-3-test

Comment 9 Andreas Schneider 2016-09-25 01:53:40 UTC

As this breaks Samba on several distributions because libtevent was updated, I would argue that we need a new Samba release ASAP!

Comment 10 Volker Lendecke 2016-09-25 01:56:12 UTC

Yep. That together with 12045 is not nice.

Comment 11 Giuseppe Ghibò 2016-09-27 15:39:24 UTC

I can confirm that Samba 4.5.0 with the patches you posted for this bug and for bug #12045 no longer crashes with libtevent 0.9.30. So I think you can push for an official update to 4.5.1 (and 4.4.7).

Comment 12 Giuseppe Ghibò 2016-09-29 17:37:47 UTC

One problem still remains, probably unrelated: when Samba starts and cups and cups-browsed are not installed and no such processes are running, smbd eats 100% CPU. The problem could be related to the call of

smbd_reinit_after_fork(msg_ctx, ev, true, "lpqd");

in source3/printing/queue_process.c, but I can't be more precise.

Comment 13 Mathieu Parent 2016-10-12 03:21:48 UTC

Reported in Debian too: https://bugs.debian.org/840382 (and https://bugs.debian.org/840298)

Comment 14 Karolin Seeger 2016-10-19 06:49:31 UTC

Pushed to autobuild-v4-{5,4,3}-test.
(There will be a 4.3 bugfix release including these fixes.)
Figuring out appropriate release dates.

Comment 15 Karolin Seeger 2016-10-25 07:43:05 UTC

(In reply to Karolin Seeger from comment #14)
Pushed to all branches.
Closing out bug report.

Thanks!