Bug 14652 - core dumps when restarting smbd when using dbwrap_tdb_mutexes:* = yes
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.13.4
Hardware: All FreeBSD
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-01 21:53 UTC by Peter Eriksson
Modified: 2021-03-04 14:16 UTC
CC List: 0 users

See Also:


Attachments
gdb stack trace (8.17 KB, text/plain)
2021-03-01 21:53 UTC, Peter Eriksson

Description Peter Eriksson 2021-03-01 21:53:30 UTC
Created attachment 16481
gdb stack trace

When I restart smbd on some of our more busy servers (ca 1500 smbd processes), sometimes I get a number of core dumps from the terminating smbd:s.

Using gdb to get a stack trace (I recompiled Samba with the bundled tdb library and without optimization), I noticed that it was crashing in pthread_mutex_unlock() while doing some cleanup before exiting.

(See attached file).

This sounds _very_ similar to:
   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237195
so it might be a FreeBSD bug but who knows...

With "dbwrap_tdb_mutexes:* = no" set, no more core dumps have been seen and things
seem to work fine. This might be good to know for other FreeBSD users if they run into the same problem.

(Since it was during restarts of samba it wasn't generally that big an issue, until it started filling up my /var/cores directory :-)
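
For reference, the workaround described above could look like this in smb.conf. This is a minimal sketch: only the option line itself comes from this report, and placing it in the [global] section is an assumption.

```
[global]
    # Workaround from this report: disable tdb mutexes for all databases
    dbwrap_tdb_mutexes:* = no
```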
Comment 1 Volker Lendecke 2021-03-02 07:34:59 UTC
https://bugs.openldap.org/show_bug.cgi?id=9278 is also an issue with robust mutexes on FreeBSD. Just for reference, not sure there are any other similarities.
Comment 2 Volker Lendecke 2021-03-04 12:30:50 UTC
(In reply to Peter Eriksson from comment #0)
> When I restart smbd on some of our more busy servers (ca 1500 smbd
> processes), sometimes I get a number of core dumps from the terminating
> smbd:s.

As discussed in bug 14636: Is it possible that the new smbd parent has already started while other smbds are still shutting down?
Comment 3 Peter Eriksson 2021-03-04 13:04:44 UTC
Hmm. Very possible, if not guaranteed. I'll run some tests with the "wait-for-all-to-die-when-restarting" fix in place and with dbwrap mutexes enabled.
Comment 4 Peter Eriksson 2021-03-04 14:16:16 UTC
Ok, ran a short (10 minutes) test of continuous restarts-with-wait-for-all-smbd-to-exit with 100 clients on two of my test servers with dbwrap mutexes enabled, and saw no core dumps. So looking good.
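
The "wait for all smbd to exit before starting the new parent" idea could be sketched roughly as follows. This is not the actual fix used in the test above, only an illustration of the approach. The helper names (smbd_pids, wait_for_exit), the timeout values, and the use of pgrep to find processes are all assumptions.

```python
import subprocess
import time


def smbd_pids():
    """Return PIDs of running smbd processes (assumes pgrep is installed)."""
    result = subprocess.run(
        ["pgrep", "-x", "smbd"], capture_output=True, text=True
    )
    # pgrep prints one PID per line; empty output means no matching process.
    return [int(pid) for pid in result.stdout.split()]


def wait_for_exit(list_pids=smbd_pids, timeout=120.0, poll_interval=1.0):
    """Poll until list_pids() returns no PIDs.

    Returns True once no processes remain, or False if the timeout
    expires while some are still running.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not list_pids():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A restart script would stop the service, call wait_for_exit(), and start the new smbd parent only if it returns True. The stop and start commands depend on the local setup and are left out here.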