At some point while running a test in which multiple users log in, copy some files, etc., winbind stops responding. We see one winbind daemon stuck in sigchld_handler:

(gdb) bt
#0  0x420b4769 in wait4 () from /lib/i686/libc.so.6
#1  0x4213030c in __DTOR_END__ () from /lib/i686/libc.so.6
#2  0x0806cda9 in sigchld_handler ()
#3  <signal handler called>
#4  0x420daca2 in read () from /lib/i686/libc.so.6
#5  0x00000020 in ?? ()
#6  0x0806d20a in winbind_client_read ()
#7  0x0806d99b in process_loop ()
#8  0x0806e03c in main ()
#9  0x42017499 in __libc_start_main () from /lib/i686/libc.so.6
(gdb)

The other is stuck in read:

(gdb) bt
#0  0x420daca4 in read () from /lib/i686/libc.so.6
#1  0x00000009 in ?? ()
#2  0x0806d20a in winbind_client_read ()
#3  0x08080723 in do_dual_daemon ()
#4  0x0806dfa6 in main ()
#5  0x42017499 in __libc_start_main () from /lib/i686/libc.so.6

An strace of wbinfo in this state shows that the connect to the pipe is unsuccessful:

connect(3, {sin_family=AF_UNIX, path="/tmp/.winbindd/pipe"}, 110) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, {2, 0}) = 0
connect(3, {sin_family=AF_UNIX, path="/tmp/.winbindd/pipe"}, 110) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, <unfinished ...>

Restarting winbind returns everything to working order.

The environment is a w2k DC with hundreds of shares (probably not relevant to the bug) and several users who each belong to hundreds of groups. The failure happened after the test had been running for about 30 minutes, but winbind had already been running for ~3 days at that time.

Is there any information we should collect for the next time it happens? Attaching the winbind log anyhow.
Created attachment 748: winbindd log before the problem
Please retest against 3.0.11. A lot of work has been done since 3.0.6. Thanks.
Sorry for the same, cleaning up the database to prevent unnecessary reopens of bugs.