The Samba-Bugzilla – Bug 9879
Deadlock while shutting down smbd in main():smbd_parent_loop() -> pthread_kill()
Last modified: 2014-06-16 13:18:58 UTC
Created attachment 8880
gdb backtrace of a deadlock smbd process
samba4 (pdc, ad) on freebsd 9.1 (amd64) with zfs hangs once in a while with serious side effects.
that's the problem:
once in a while an smbd process goes to 100% cpu usage and stops responding to any smbcontrol command. unfortunately it still holds all its locks and its inotify/kqueue watch events. when i kill the process ungracefully (kill -9), windows complains about locked files and/or the whole explorer task hangs along with the smbd process, and neither of them fails or quits cleanly. it would probably loop forever.
what have i done so far:
compiled samba4 with DEBUG
attached gdb to the misbehaving smbd process and inspected the backtrace/threads
what's my finding:
when samba tries to shut that smbd process down cleanly, it hangs in pthread_kill() (see the attached backtrace for details).
since the whole thing seems to be related to freebsd-specific stuff (the inotify wrapper for kqueue, and pthread), i've reached the limit of my (mostly linux) knowledge...
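for illustration, here is a minimal sketch in c of the pattern i believe is involved: a worker thread blocked in an event wait, and a parent that wants to stop it. it is not taken from the samba or libinotify sources. all names (struct worker, worker_loop, shutdown_demo) are made up, and a portable pipe + poll() stands in for kqueue/kevent(). instead of signalling the worker with pthread_kill(), the sketch wakes it through the pipe and then joins it:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* hypothetical names throughout; this is not samba or libinotify code */
struct worker {
    pthread_t tid;
    int wake_rd;   /* the worker waits on this end of the pipe */
    int wake_wr;   /* the parent writes here to request shutdown */
    int exited;    /* set by the worker just before it returns */
};

/* stands in for an event loop that blocks in kevent() */
static void *worker_loop(void *arg)
{
    struct worker *w = arg;
    struct pollfd pfd = { .fd = w->wake_rd, .events = POLLIN };

    for (;;) {
        int n = poll(&pfd, 1, -1);      /* block until woken */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: wait again */
            break;                      /* any other error: give up */
        }
        if (pfd.revents & (POLLIN | POLLHUP))
            break;                      /* shutdown requested */
    }
    w->exited = 1;
    return NULL;
}

/* returns 0 if the worker was woken and joined cleanly, -1 otherwise */
int shutdown_demo(void)
{
    struct worker w = { .exited = 0 };
    int fds[2];
    int rc = 0;
    char byte = 'q';

    if (pipe(fds) != 0)
        return -1;
    w.wake_rd = fds[0];
    w.wake_wr = fds[1];

    if (pthread_create(&w.tid, NULL, worker_loop, &w) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* wake the worker through the pipe instead of signalling it */
    if (write(w.wake_wr, &byte, 1) != 1)
        rc = -1;
    close(w.wake_wr);                   /* closing the write end wakes it too */

    /* join reports "the thread is gone" without probing via pthread_kill() */
    if (pthread_join(w.tid, NULL) == 0)
        assert(w.exited);               /* safe to read after a successful join */
    else
        rc = -1;

    close(w.wake_rd);
    return rc;
}
```

calling shutdown_demo() from a main() should return 0 once the worker has been woken and joined. whether the real shutdown path can be restructured like this is something i can't judge from the backtrace alone.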
any clue what's going on?
when i force frame #1 of thread 1 to return, that thread exits and at least the process no longer uses 100% cpu:
(gdb) thr 1
[Switching to thread 1 (Thread 811809800 (LWP 100993/smbd))]#0 0x0000000803b7a37c in kevent () from /lib/libc.so.7
(gdb) f 1
#1 0x0000000808eed50b in worker_thread () from /usr/local/lib/libinotify.so.0
(gdb) return
Make selected stack frame return now? (y or n) y
#0 0x00000008036700a4 in pthread_getprio () from /lib/libthr.so.3
[Thread 811809800 (LWP 100993/smbd) exited]
[New Thread 811809800 (LWP 100993/smbd)]
Program received signal SIGINT, Interrupt.
[Switching to Thread 811807400 (LWP 100694/smbd)]
0x000000080367787c in pthread_kill () from /lib/libthr.so.3
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/local/sbin/smbd, process 50662
but "ps" now shows the process waiting on a userland condition variable, state "ucond" (???)
PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
50662 root 1 20 0 541M 88236K ucond 1 18:25 0.00% smbd
the locked files reported by smbstatus and lsof are the same as before (except /var/log/samba4/... and /dev/random)
does anyone know where this strange behaviour comes from?