Sep 5 04:56:16 valencay winbindd[5300]: [2008/09/05 04:56:16, 0] lib/fault.c:fault_report(41) Sep 5 04:56:16 valencay winbindd[5300]: =============================================================== Sep 5 04:56:16 valencay winbindd[5300]: [2008/09/05 04:56:16, 0] lib/fault.c:fault_report(42) Sep 5 04:56:16 valencay winbindd[5300]: INTERNAL ERROR: Signal 6 in pid 5300 (3.0.28-1.el5_2.1) Sep 5 04:56:16 valencay winbindd[5300]: Please read the Trouble-Shooting section of the Samba3-HOWTO Sep 5 04:56:16 valencay winbindd[5300]: [2008/09/05 04:56:16, 0] lib/fault.c:fault_report(44) Sep 5 04:56:16 valencay winbindd[5300]: Sep 5 04:56:16 valencay winbindd[5300]: From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf Sep 5 04:56:16 valencay winbindd[5300]: [2008/09/05 04:56:16, 0] lib/fault.c:fault_report(45) Sep 5 04:56:16 valencay winbindd[5300]: =============================================================== Sep 5 04:56:16 valencay winbindd[5300]: [2008/09/05 04:56:16, 0] lib/util.c:smb_panic(1655) Sep 5 04:56:16 valencay winbindd[5300]: PANIC (pid 5300): internal error Sep 5 04:56:16 valencay winbindd[5300]: [2008/09/05 04:56:16, 0] lib/util.c:log_stack_trace(1759) Sep 5 04:56:16 valencay winbindd[5300]: BACKTRACE: 15 stack frames: Sep 5 04:56:16 valencay winbindd[5300]: #0 winbindd(log_stack_trace+0x2d) [0x683bad] Sep 5 04:56:16 valencay winbindd[5300]: #1 winbindd(smb_panic+0x5d) [0x683cdd] Sep 5 04:56:16 valencay winbindd[5300]: #2 winbindd [0x66e86a] Sep 5 04:56:16 valencay winbindd[5300]: #3 [0x110420] Sep 5 04:56:16 valencay winbindd[5300]: #4 [0x110402] Sep 5 04:56:16 valencay winbindd[5300]: #5 /lib/libc.so.6(gsignal+0x50) [0xa9fd10] Sep 5 04:56:16 valencay winbindd[5300]: #6 /lib/libc.so.6(abort+0x101) [0xaa1621] Sep 5 04:56:16 valencay winbindd[5300]: #7 winbindd(talloc_get_name+0x3e) [0x66715e] Sep 5 04:56:16 valencay winbindd[5300]: #8 winbindd(talloc_check_name+0x2c) [0x6673cc] Sep 5 04:56:16 valencay winbindd[5300]: #9 winbindd(talloc_check_name_abort+0x2c) [0x683e6c] Sep 5 04:56:16 valencay winbindd[5300]: #10 winbindd [0x61a5d8] Sep 5 04:56:16 valencay winbindd[5300]: #11 winbindd [0x5eb508] Sep 5 04:56:16 valencay winbindd[5300]: #12 winbindd(main+0x94d) [0x5ebf5d] Sep 5 04:56:16 valencay winbindd[5300]: #13 /lib/libc.so.6(__libc_start_main+0xdc) [0xa8cdec] Sep 5 04:56:16 valencay winbindd[5300]: #14 winbindd [0x5ea201] Sep 5 04:56:16 valencay winbindd[5300]: [2008/09/05 04:56:16, 0] lib/fault.c:dump_core(181) Sep 5 04:56:16 valencay winbindd[5300]: dumping core in /var/log/samba/cores/winbindd Sep 5 04:56:16 valencay winbindd[5300]: Sep 5 04:56:16 valencay winbindd[8934]: [2008/09/05 04:56:16, 0] lib/util_sock.c:read_data(536) Sep 5 04:56:16 valencay winbindd[8934]: read_data: read failure for 2088. Error = Connection reset by peer Sep 5 04:56:16 valencay winbindd[4967]: [2008/09/05 04:56:16, 0] lib/fault.c:fault_report(41) Sep 5 04:56:16 valencay winbindd[4967]: ===============================================================
Winbind works fine during 1 month and this night the process died. There is bellow the error message in /var/log/messages. I'm using Redhat Enterprise 5.1 on a DELL Poweredge 1950. The samba RPM package is 3.0.28-1.el5_2.1 and was installed RHN site. The samba server connected to AD domain ( windows 2003 server ) and we have 290 users which connect to it.
Can you try the patch from http://git.samba.org/?p=samba.git;a=commitdiff;h=c93d4296945 Thanks, Volker
In our setup too we are facing this issue. We are using Samba-3.0.28. We did apply the patch given comment #2, http://git.samba.org/?p=samba.git;a=commitdiff;h=c93d4296945 but still we are seeing the issue. We are seeing this issue when touch 200+ simultaneous connections. Copying the log file entries below. winbindd: Exceeding 200 client connections, no idle connection found [2008/09/13 17:25:10, 0] nsswitch/winbindd.c:process_loop(916) winbindd: Exceeding 200 client connections, no idle connection found [2008/09/13 17:25:10, 0] lib/fault.c:fault_report(41) =============================================================== [2008/09/13 17:25:10, 0] lib/fault.c:fault_report(42) INTERNAL ERROR: Signal 6 in pid 25570 (3.0.28-0-1902-SUSE-CODE10) Please read the Trouble-Shooting section of the Samba3-HOWTO [2008/09/13 17:25:10, 0] lib/fault.c:fault_report(44) From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf [2008/09/13 17:25:10, 0] lib/fault.c:fault_report(45) =============================================================== [2008/09/13 17:25:10, 0] lib/util.c:smb_panic(1633) PANIC (pid 25570): internal error [2008/09/13 17:25:10, 0] lib/util.c:log_stack_trace(1737) BACKTRACE: 13 stack frames: #0 /usr/sbin/winbindd(log_stack_trace+0x2d) [0x800d5a0d] #1 /usr/sbin/winbindd(smb_panic+0x5d) [0x800d5b3d] #2 /usr/sbin/winbindd [0x800c0b9a] #3 [0xffffe420] #4 /lib/libc.so.6(abort+0x103) [0xb7c91ff3] #5 /usr/sbin/winbindd(talloc_get_name+0x3e) [0x800b963e] #6 /usr/sbin/winbindd(talloc_check_name+0x2c) [0x800b98ac] #7 /usr/sbin/winbindd(talloc_check_name_abort+0x2c) [0x800d5ccc] #8 /usr/sbin/winbindd [0x8006f1d2] #9 /usr/sbin/winbindd [0x80040298] #10 /usr/sbin/winbindd(main+0x92d) [0x80040ccd] #11 /lib/libc.so.6(__libc_start_main+0xdc) [0xb7c7d8ac] #12 /usr/sbin/winbindd [0x8003ed41] [2008/09/13 17:25:10, 0] lib/fault.c:dump_core(181) dumping core in /var/log/samba/cores/winbindd Please help us in resolving the issue.
Did you manually patch you RPM? The backtrace still says 3.0.28-0-1902-SUSE-CODE10.
We are able to locate the root cause of the problem. Please correct us if we are wrong here. My analysis is as follows: In our setup, the winbindd connections are reaching 1000+ due to some entity not responding (or responding slowly), as this is a stress environment. (We are working on finding the culprit). Now in this case if we get one more request, we are forking a child (fork_domain_child() in schedule_async_request()), where we are calling socketpair(). This call fails with an error "Could not open child pipe: Too many open files". Thus we call the continuation function for all the requests present in the child->requests in the error path. This set the "state->finished" flag to "True" and comes out. Now in the next iteration of "process_loop()" we are freeing all the states but the "child->request" is not removed from the queue. Now if async_main_request_sent() is called, the "talloc_check_name_abort()" does "smb_panic()". Or in case of a new request, the addition to child->requests using DLIST_ADD_END() in "async_request()" is causing SIGSEGV. As a fix, can we call the async_request_fail() in this failure path of schedule_async_request() after calling fork_domain_child(), similar to Volker's patch in above comment ? Also let us know should we call this for all of the child->requests or only the current one is enough ?
Created attachment 3604 [details] The winbindd core dump seen in case of "fd" exhaustions The file descriptor are exhausted by the too many socket connections of winbindd (1000+). If a request times out now, the fork_domain_child() fails to create the socketpair() with error "Too many open files". Now a call to the continuation function doesn't free the child->requests. In the fix calling the async_request_fail() in the failure path of fork_domain_child() in schedule_async_request(). This call will free the "state" from child->request and fd_event.
please ignore the above patch.
Created attachment 3605 [details] fixing the core dump in the case of file descritor exhaustion
I will evaluate this for inclusion early next week - thanks ! Jeremy.
Created attachment 3619 [details] Patch I've applied. Your first patch was closer, but not quite correct. This is the patch I've applied to all branches. Please test ! Thanks, Jeremy.
Jeremy, Thanks for the reply. I tested your patch. It is working fine.
We are using 3.0.28 version of samba. Did you checkin the patch in this branch ? How do I get this patch ?
I checked this into all git trees, including 3_0. Talk to Jim McDonnough for details on getting this patch. Jeremy.