Created attachment 8880
gdb backtrace of a deadlocked smbd process

The situation: samba4 (PDC, AD) on FreeBSD 9.1 (amd64) with ZFS hangs once in a while, with serious side effects.

The problem: once in a while an smbd process takes 100% CPU and does not respond to any smbcontrol command. Unfortunately it still holds all its locks and its inotify (kqueue-backed) watch events. When I kill the process ungracefully (kill -9), Windows complains about locked files and/or the whole Explorer task hangs along with the smbd process, and neither of them fails or quits cleanly. It will probably loop forever.

What I have done so far:
- compiled samba4 with DEBUG
- attached gdb to the misbehaving smbd process and investigated the backtrace and threads

What I found: when Samba tries to end that smbd process cleanly, it hangs while executing pthread_kill() (see details in the attachment).

Since the whole thing seems to be related to FreeBSD-specific code (the inotify wrapper for kqueue, and pthreads), my mostly Linux knowledge doesn't get me any further. Any clue what's going on?
When I force frame #1 of thread 1 to return, that thread exits and at least the process no longer uses 100% CPU:

(gdb) thr 1
[Switching to thread 1 (Thread 811809800 (LWP 100993/smbd))]#0  0x0000000803b7a37c in kevent () from /lib/libc.so.7
(gdb) f 1
#1  0x0000000808eed50b in worker_thread () from /usr/local/lib/libinotify.so.0
(gdb) return
Make selected stack frame return now? (y or n) y
#0  0x00000008036700a4 in pthread_getprio () from /lib/libthr.so.3
(gdb) c
Continuing.
[Thread 811809800 (LWP 100993/smbd) exited]
[New Thread 811809800 (LWP 100993/smbd)]
^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 811807400 (LWP 100694/smbd)]
0x000000080367787c in pthread_kill () from /lib/libthr.so.3
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/local/sbin/smbd, process 50662

But "ps" shows me that the process is now waiting on a userland condition variable (state "ucond"):

  PID USERNAME THR PRI NICE  SIZE    RES STATE   C   TIME   CPU COMMAND
50662 root       1  20    0  541M 88236K ucond   1  18:25 0.00% smbd

All locked files (as shown by smbstatus and lsof) are the same as before, except /var/log/samba4/... and /dev/random.

DOES ANYONE KNOW WHERE THIS STRANGE BEHAVIOUR COMES FROM?
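For anyone reasoning about the hang: a thread blocked in a system call such as kevent() is commonly stopped by setting a flag, waking the thread with pthread_kill() and then joining it. The C sketch below illustrates only that generic pattern. It is not libinotify's code, and whether libinotify's worker_thread() is actually shut down this way is an assumption. All names (worker_thread_sketch, stop_blocked_worker_demo, wake_pipe) are hypothetical, and a pipe read() stands in for kevent() so the sketch also builds on Linux (compile with -pthread).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names throughout: a generic sketch, not libinotify code. */
static int wake_pipe[2];          /* never written, so read() blocks like kevent() */
static atomic_int stop_requested; /* set by the thread that wants a shutdown */
static atomic_int worker_done;    /* set by the worker just before it returns */
static atomic_int saw_eintr;      /* records that a signal interrupted read() */

/* Empty handler: its only job is to make the blocked read() fail with EINTR. */
static void wake_handler(int sig) { (void)sig; }

static void *worker_thread_sketch(void *arg) {
    (void)arg;
    char c;
    while (!atomic_load(&stop_requested)) {
        ssize_t n = read(wake_pipe[0], &c, 1);
        if (n < 0 && errno == EINTR)
            atomic_store(&saw_eintr, 1); /* woken by pthread_kill(); re-check flag */
    }
    atomic_store(&worker_done, 1);
    return NULL;
}

/* Returns 0 once the worker has been woken and joined, -1 on setup failure. */
static int stop_blocked_worker_demo(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART, so the blocked read() returns EINTR */
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(wake_pipe) != 0)
        return -1;

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker_thread_sketch, NULL) != 0)
        return -1;

    /* Give the worker a moment to block in read(), then ask it to stop. */
    struct timespec pause = {0, 10 * 1000 * 1000}; /* 10 ms */
    nanosleep(&pause, NULL);
    atomic_store(&stop_requested, 1);

    /* Keep re-sending: a signal that arrives just before the worker enters
     * read() is handled without waking anything, so one pthread_kill() can
     * be lost and the worker would then block forever. */
    while (!atomic_load(&worker_done)) {
        pthread_kill(tid, SIGUSR1);
        nanosleep(&pause, NULL);
    }

    pthread_join(tid, NULL);
    close(wake_pipe[0]);
    close(wake_pipe[1]);
    return 0;
}
```

The design point the sketch tries to make visible: if a shutdown path sends a single signal and then joins, a wakeup lost in that window leaves the worker blocked and the joining thread waiting indefinitely. Whether anything like that happens inside smbd/libinotify here is not established by the backtrace above.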