Bug 9879 - Deadlock while shutting down smbd in main():smbd_parent_loop() -> pthread_kill()
Summary: Deadlock while shutting down smbd in main():smbd_parent_loop() -> pthread_kill()
Status: NEW
Alias: None
Product: Samba 4.0
Classification: Unclassified
Component: File services
Version: 4.0.4
Hardware: x64 FreeBSD
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-13 10:48 UTC by sascha
Modified: 2014-06-16 13:18 UTC (History)

See Also:


Attachments
gdb backtrace of a deadlock smbd process (29.64 KB, text/plain)
2013-05-13 10:48 UTC, sascha

Description sascha 2013-05-13 10:48:39 UTC
Created attachment 8880
gdb backtrace of a deadlock smbd process

The situation:
Samba 4 (PDC, AD) on FreeBSD 9.1 (amd64) with ZFS hangs once in a while, with serious side effects.

The problem:
Once in a while an smbd process goes to 100% CPU usage and stops responding to any smbcontrol command. Unfortunately it still holds all its locks and its inotify/kqueue watch events. When I kill the process ungracefully (kill -9), Windows complains about locked files, and/or the whole Explorer task waits along with the smbd process; neither of them fails or quits cleanly. It will probably loop forever.

What I have done so far:
Compiled Samba 4 with DEBUG.
Attached gdb to the stuck smbd process and inspected the backtrace and threads.

My finding:
When Samba tries to shut that smbd process down gracefully, it hangs while executing pthread_kill() (see the attachment for details).
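
For readers unfamiliar with the pattern, here is a minimal sketch of a parent that stops a blocked worker thread via pthread_kill(). It is not smbd's or libinotify's actual code: read() on a pipe stands in for the blocking kevent() call, and every name in it (demo_shutdown, worker, stop_requested, worker_done, wake_pipe, SHUTDOWN_DEMO_MAIN) is hypothetical. The parent bounds its retries; a loop without such a bound would spin forever if the worker never noticed the stop request, which is one possible way to end up with the symptom described here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static atomic_int stop_requested; /* set by the parent before it signals */
static atomic_int worker_done;    /* set by the worker just before it exits */
static int wake_pipe[2];          /* never written to; read() simply blocks */

/* Empty handler: its only job is to make a blocking call fail with EINTR. */
static void on_wakeup(int signo) { (void)signo; }

static void *worker(void *arg)
{
    char byte;
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        /* Stand-in for the blocking kevent() call (assumption). */
        ssize_t n = read(wake_pipe[0], &byte, 1);
        if (n < 0 && errno == EINTR)
            continue;             /* interrupted: re-check the stop flag */
        if (n <= 0)
            break;                /* EOF or a real error */
    }
    atomic_store(&worker_done, 1);
    return NULL;
}

/* Returns 0 on a clean shutdown, -1 on setup failure, -2 if the worker
 * never reacted within the retry budget. */
int demo_shutdown(void)
{
    pthread_t tid;
    struct sigaction sa;
    struct timespec pause_1ms = { 0, 1000000L };
    int tries;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_wakeup;    /* no SA_RESTART, so read() returns EINTR */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(wake_pipe) != 0)
        return -1;

    atomic_store(&stop_requested, 0);
    atomic_store(&worker_done, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    atomic_store(&stop_requested, 1);
    /* Re-send the signal: a single one can arrive between the worker's
     * flag check and its read(), and would then be lost. */
    for (tries = 0; tries < 1000 && !atomic_load(&worker_done); tries++) {
        if (pthread_kill(tid, SIGUSR1) != 0)
            break;
        nanosleep(&pause_1ms, NULL);
    }
    if (!atomic_load(&worker_done))
        return -2;                /* give up instead of spinning forever */

    pthread_join(tid, NULL);
    close(wake_pipe[0]);
    close(wake_pipe[1]);
    return 0;
}

#ifdef SHUTDOWN_DEMO_MAIN
int main(void) { return demo_shutdown() == 0 ? 0 : 1; }
#endif
```

To try it standalone, compile with -pthread and -DSHUTDOWN_DEMO_MAIN; an exit status of 0 means the worker stopped and was joined.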

Since the whole thing seems to be related to FreeBSD-specific code (the inotify wrapper for kqueue, and pthread), I have reached the limit of my mostly Linux-based knowledge.

Any clue what's going on?
Comment 1 sascha 2013-05-13 15:18:10 UTC
When I force frame #1 of thread 1 to return, the thread exits and at least the process no longer uses 100% CPU.

(gdb) thr 1
[Switching to thread 1 (Thread 811809800 (LWP 100993/smbd))]#0  0x0000000803b7a37c in kevent () from /lib/libc.so.7
(gdb) f 1
#1  0x0000000808eed50b in worker_thread () from /usr/local/lib/libinotify.so.0
(gdb) return
Make selected stack frame return now? (y or n) y
#0  0x00000008036700a4 in pthread_getprio () from /lib/libthr.so.3
(gdb) c
Continuing.
[Thread 811809800 (LWP 100993/smbd) exited]
[New Thread 811809800 (LWP 100993/smbd)]
^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 811807400 (LWP 100694/smbd)]
0x000000080367787c in pthread_kill () from /lib/libthr.so.3
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/local/sbin/smbd, process 50662

But "ps" shows that the process is now waiting on a userland condition variable (state ucond):

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    CPU COMMAND
50662 root             1  20    0   541M 88236K ucond   1  18:25  0.00% smbd

All locked files (per smbstatus and lsof) are the same (except /var/log/samba4/... and /dev/random).

Does anyone know where this strange behaviour comes from?