13014 – Panic in allrecord_mutex_lock on shutdown because it seems the mutexes shared segment has been unmapped


Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 4.1 and newer
Classification:	Unclassified
Component:	Winbind
Version:	4.5.13
Hardware:	All All

Importance:	P5 normal
Target Milestone:	---
Assignee:	Volker Lendecke
QA Contact:	Samba QA Contact

Reported:	2017-09-05 15:28 UTC by Richard Sharpe
Modified:	2019-07-31 12:39 UTC
CC List:	5 users

Description Richard Sharpe 2017-09-05 15:28:50 UTC

We have seen this twice now during tests. It seems to hit only when winbindd is being shut down.

[2017/08/20 17:11:01.927595,  0] ../source3/winbindd/winbindd.c:279(winbindd_sig_term_handler)
  Got sig[15] terminate (is_parent=0)
[2017/08/20 17:11:01.936107,  0] ../lib/util/fault.c:78(fault_report)
  ===============================================================
[2017/08/20 17:11:01.936136,  0] ../lib/util/fault.c:79(fault_report)
  INTERNAL ERROR: Signal 7 in pid 18845 (4.5.11)
  Please read the Trouble-Shooting section of the Samba HOWTO
[2017/08/20 17:11:01.936151,  0] ../lib/util/fault.c:81(fault_report)
  ===============================================================
[2017/08/20 17:11:01.936162,  0] ../source3/lib/util.c:791(smb_panic_s3)
  PANIC (pid 18845): internal error
[2017/08/20 17:11:01.936479,  0] ../source3/lib/util.c:902(log_stack_trace)
  BACKTRACE: 25 stack frames:
   #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f18351eabba]
   #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f18351eac90]
   #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f183891b12f]

Here is the more complete stack trace:

(gdb) where
#0  0x00007f7ff59c91d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f7ff59ca8c8 in __GI_abort () at abort.c:90
#2  0x00007f7ff8c0c63b in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f7ff8bfdcf7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f7ffc32e12f in smb_panic (why=why@entry=0x7f7ffc37b6ab "internal error") at ../lib/util/fault.c:166
#5  0x00007f7ffc32e346 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  __pthread_mutex_trylock (mutex=0x7f7ffdfbc0a8) at pthread_mutex_trylock.c:33
#9  0x00007f7ffc1098d5 in allrecord_mutex_lock (m=0x7f7ffdfbc000, waitflag=<optimized out>) at ../lib/tdb/common/mutex.c:203
#10 0x00007f7ffc109dfc in tdb_mutex_allrecord_lock (tdb=tdb@entry=0x7f7fff7e1c20, ltype=ltype@entry=1,
    flags=flags@entry=TDB_LOCK_NOWAIT) at ../lib/tdb/common/mutex.c:374
#11 0x00007f7ffc102d8e in tdb_allrecord_lock (tdb=0x7f7fff7e1c20, ltype=1, flags=TDB_LOCK_NOWAIT, upgradable=<optimized out>)
    at ../lib/tdb/common/lock.c:646
#12 0x00007f7ffc102e7e in tdb_lockall_nonblock (tdb=<optimized out>) at ../lib/tdb/common/lock.c:771
#13 0x00007f7ff8c13b6f in gencache_stabilize () at ../source3/lib/gencache.c:667
#14 0x00007f7ffe1f06b9 in terminate (is_parent=<optimized out>) at ../source3/winbindd/winbindd.c:247
#15 0x00007f7ffe1f079a in winbindd_sig_term_handler (ev=<optimized out>, se=<optimized out>, signum=15, count=<optimized out>,
    siginfo=<optimized out>, private_data=<optimized out>) at ../source3/winbindd/winbindd.c:280
#16 0x00007f7ffb896ef7 in tevent_common_check_signal (ev=0x7f7fff7e18e0) at ../lib/tevent/tevent_signal.c:461
#17 0x00007f7ffb898bec in epoll_event_loop (tvalp=0x7fff021cebc0, epoll_ev=0x7f7fff7e0bb0) at ../lib/tevent/tevent_epoll.c:647
#18 epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../lib/tevent/tevent_epoll.c:926
#19 0x00007f7ffb897137 in std_event_loop_once (ev=0x7f7fff7e18e0,
    location=0x7f7ffe287c70 "../source3/winbindd/winbindd_dual.c:1592") at ../lib/tevent/tevent_standard.c:114
#20 0x00007f7ffb89338d in _tevent_loop_once (ev=0x7f7fff7e18e0,
    location=location@entry=0x7f7ffe287c70 "../source3/winbindd/winbindd_dual.c:1592") at ../lib/tevent/tevent.c:533
#21 0x00007f7ffe21b4f8 in fork_domain_child (child=0x7f7ffe4d2d00 <static_idmap_child>)
    at ../source3/winbindd/winbindd_dual.c:1592
#22 0x00007f7ffe21bbc5 in wb_child_request_trigger (req=0x7f7fff7f5540, private_data=<optimized out>)
    at ../source3/winbindd/winbindd_dual.c:173
#23 0x00007f7ffb893bb4 in tevent_common_loop_immediate (ev=ev@entry=0x7f7fff7e18e0) at ../lib/tevent/tevent_immediate.c:135
#24 0x00007f7ffb898a2e in epoll_event_loop_once (ev=0x7f7fff7e18e0, location=<optimized out>) at ../lib/tevent/tevent_epoll.c:907
#25 0x00007f7ffb897137 in std_event_loop_once (ev=0x7f7fff7e18e0, location=0x7f7ffe271c48 "../source3/winbindd/winbindd.c:1809")
    at ../lib/tevent/tevent_standard.c:114
#26 0x00007f7ffb89338d in _tevent_loop_once (ev=0x7f7fff7e18e0,
    location=location@entry=0x7f7ffe271c48 "../source3/winbindd/winbindd.c:1809") at ../lib/tevent/tevent.c:533
#27 0x00007f7ffe1e1cfc in main (argc=<optimized out>, argv=<optimized out>) at ../source3/winbindd/winbindd.c:1809
(gdb) frame 10
#10 0x00007f7ffc109dfc in tdb_mutex_allrecord_lock (tdb=tdb@entry=0x7f7fff7e1c20, ltype=ltype@entry=1,
    flags=flags@entry=TDB_LOCK_NOWAIT) at ../lib/tdb/common/mutex.c:374
374             ret = allrecord_mutex_lock(m, waitflag);
(gdb) p m
$1 = (struct tdb_mutexes *) 0x7f7ffdfbc000
(gdb) p *m
Cannot access memory at address 0x7f7ffdfbc000
(gdb) frame 11
#11 0x00007f7ffc102d8e in tdb_allrecord_lock (tdb=0x7f7fff7e1c20, ltype=1, flags=TDB_LOCK_NOWAIT, upgradable=<optimized out>)
    at ../lib/tdb/common/lock.c:646
646                     ret = tdb_mutex_allrecord_lock(tdb, ltype, flags);
(gdb) p tdb
$2 = (struct tdb_context *) 0x7f7fff7e1c20
(gdb) p *tdb
$3 = {name = 0x7f7fff7f1f40 "/var/lib/samba/lock/gencache_notrans.tdb", map_ptr = 0x7f7ffe1a8000, fd = 35, map_size = 12288,
  read_only = 0, traverse_read = 0, traverse_write = 0, allrecord_lock = {off = 0, count = 0, ltype = 0}, num_lockrecs = 0,
  lockrecs = 0x0, lockrecs_array_length = 0, hdr_ofs = 8192, mutexes = 0x7f7ffdfbc000, ecode = TDB_ERR_LOCK, hash_size = 131,
  feature_flags = 1, flags = 6721, travlocks = {next = 0x0, off = 0, hash = 0, lock_rw = 0}, next = 0x7f7fff7e30a0,
  device = 64770, inode = 33578901, log = {log_fn = 0x7f7ff36b8e80 <tdb_wrap_log>, log_private = 0x0},
  hash_fn = 0x7f7ffc107cc0 <tdb_jenkins_hash>, open_flags = 66, methods = 0x7f7ffc310a80 <io_methods>, transaction = 0x0,
  page_size = 4096, max_dead_records = 0, interrupt_sig_ptr = 0x0}

This suggests that the memory region containing the mutexes has been unmapped.
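
For readers unfamiliar with the pattern, here is a minimal sketch of a process-shared pthread mutex living in a shared mapping, with a non-blocking lock attempt like the one in frame #8. It is an illustration only, not tdb's code: the function name shared_mutex_trylock_demo is hypothetical, and the anonymous mapping and the errorcheck mutex type are assumptions chosen to keep the example self-contained. The point is that pthread_mutex_trylock() has to dereference the mapped memory, so the pointer is only usable while the mapping is intact.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of tdb): place a process-shared mutex in
 * a shared mapping, take it without blocking, then tear everything down.
 * Returns 0 on success, -1 on any failure.
 */
int shared_mutex_trylock_demo(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t *m;
	int ret, busy;

	/* Assumption: an anonymous shared mapping stands in for the real segment. */
	m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED) {
		return -1;
	}

	if (pthread_mutexattr_init(&attr) != 0) {
		munmap(m, sizeof(*m));
		return -1;
	}
	/* The mutex must be usable from every process that maps the segment. */
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	ret = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	if (ret != 0) {
		munmap(m, sizeof(*m));
		return -1;
	}

	/* Non-blocking attempt: this reads and writes the mapped memory. */
	ret = pthread_mutex_trylock(m);
	if (ret != 0) {
		pthread_mutex_destroy(m);
		munmap(m, sizeof(*m));
		return -1;
	}

	/* A second attempt while the mutex is held reports EBUSY at once. */
	busy = pthread_mutex_trylock(m);
	assert(busy == EBUSY);

	pthread_mutex_unlock(m);
	pthread_mutex_destroy(m);

	/* After munmap() the pointer is stale; any further use of m would fault. */
	munmap(m, sizeof(*m));

	return (busy == EBUSY) ? 0 : -1;
}
```

If something removes or invalidates the mapping while a pointer like m is still stored in a long-lived structure (compare the mutexes field printed from the tdb_context above), the next lock attempt faults inside the pthread call, which matches the shape of the backtrace.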

Comment 1 Stefan Metzmacher 2019-07-31 10:17:54 UTC

Volker, do you think current releases could still have such a bug?

Comment 2 Volker Lendecke 2019-07-31 10:57:02 UTC

(In reply to Stefan Metzmacher from comment #1)
> Volker, do you think current releases could still have such a bug?

Well, I have never seen this bug myself, and I see a lot of heavily used tdb files. Also, some versions had problems with gencache and gencache_notrans messing each other up. As gencache no longer has a _notrans file, any problem specific to that should be gone.