Bug 1982 - Winbind stops responding to requests from wbinfo and rest of system
Status: CLOSED FIXED
Product: Samba 3.0
Classification: Unclassified
Component: winbind
Version: 3.0.6
Hardware: All Linux
Importance: P3 normal
Target Milestone: none
Assigned To: Samba Bugzilla Account
QA Contact: Samba QA Contact
Depends on:
Blocks:
Reported: 2004-11-01 03:41 UTC by yuval yeret
Modified: 2005-08-24 10:23 UTC

Attachments
winbindd log before the problem (14.29 KB, text/plain)
2004-11-01 03:44 UTC, yuval yeret

Description yuval yeret 2004-11-01 03:41:44 UTC
At some point while running a test from multiple users who log in, copy some
files, etc., winbind stops responding. 

We see one winbind daemon stuck in sigchld_handler:
(gdb) bt
#0  0x420b4769 in wait4 () from /lib/i686/libc.so.6
#1  0x4213030c in __DTOR_END__ () from /lib/i686/libc.so.6
#2  0x0806cda9 in sigchld_handler ()
#3  <signal handler called>
#4  0x420daca2 in read () from /lib/i686/libc.so.6
#5  0x00000020 in ?? ()
#6  0x0806d20a in winbind_client_read ()
#7  0x0806d99b in process_loop ()
#8  0x0806e03c in main ()
#9  0x42017499 in __libc_start_main () from /lib/i686/libc.so.6
(gdb)


The other is stuck in read:
(gdb) bt
#0  0x420daca4 in read () from /lib/i686/libc.so.6
#1  0x00000009 in ?? ()
#2  0x0806d20a in winbind_client_read ()
#3  0x08080723 in do_dual_daemon ()
#4  0x0806dfa6 in main ()
#5  0x42017499 in __libc_start_main () from /lib/i686/libc.so.6

An strace of wbinfo in this state shows that connect() to the pipe fails:

connect(3, {sin_family=AF_UNIX, path="/tmp/.winbindd/pipe"}, 110) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, {2, 0})               = 0
connect(3, {sin_family=AF_UNIX, path="/tmp/.winbindd/pipe"}, 110) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0},  <unfinished ...>
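
On Linux, a non-blocking connect() to a UNIX-domain socket returns EAGAIN when the connection cannot be completed immediately, which typically means the listener's backlog is full because nothing is calling accept(). That would be consistent with a daemon that is stuck elsewhere, though the trace alone does not prove it. The sketch below makes one such probe from a separate process. The helper name probe_unix_socket is hypothetical and is not part of wbinfo or Samba.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Try one non-blocking connect() to a UNIX-domain stream socket.
 * Returns 0 on success, otherwise the errno of the failing call. */
static int probe_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd, flags, err = 0;

    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* O_NONBLOCK so that a full backlog yields EAGAIN, not a hang. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        err = errno;
        close(fd);
        return err;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);   /* length checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;

    close(fd);
    return err;
}
```

Calling probe_unix_socket("/tmp/.winbindd/pipe") and comparing the result with EAGAIN, ENOENT or ECONNREFUSED helps separate "daemon alive but not accepting" from "socket missing" or "nobody listening".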

Restarting winbind returns everything to working order.

The environment is a w2k DC with hundreds of shares (probably not relevant to the
bug) and several users who each belong to hundreds of groups.

The failure happened after the test had been running for about 30 minutes, but winbind
had already been running for ~3 days at that point.

Is there any information we should collect the next time it happens?
Attaching the winbind log anyway.
Comment 1 yuval yeret 2004-11-01 03:45:01 UTC
Created attachment 748 [details]
winbindd log before the problem
Comment 2 Gerald (Jerry) Carter 2005-02-11 08:38:43 UTC
Please retest against 3.0.11. A lot of work has been done since 3.0.6. Thanks.
Comment 3 Gerald (Jerry) Carter 2005-08-24 10:23:49 UTC
sorry for the same, cleaning up the database to prevent unnecessary reopens of bugs.