In my development environment, I was testing how long winbind takes to notice that a domain controller cannot be reached, and how long it then takes to find and connect to another one, to make sure 3.5.6 handles this properly. To do so, I used iptables to block connections to one of the DCs, and then had to run an errand. When I came back an hour later, I logged into one of the test machines as a local user (the ssh connection had timed out), and winbind crashed:

[2010/10/14 15:15:33.224948, 1] winbindd/winbindd_ads.c:126(ads_cached_connection)
  ads_connect for domain AWESOME failed: No logon servers
[2010/10/14 18:20:03.076780, 0] lib/fault.c:46(fault_report)
  ===============================================================
[2010/10/14 18:20:03.076935, 0] lib/fault.c:47(fault_report)
  INTERNAL ERROR: Signal 11 in pid 29111 (3.5.6)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2010/10/14 18:20:03.077034, 0] lib/fault.c:49(fault_report)
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2010/10/14 18:20:03.077194, 0] lib/fault.c:50(fault_report)
  ===============================================================
[2010/10/14 18:20:03.077259, 0] lib/util.c:1465(smb_panic)
  PANIC (pid 29111): internal error
[2010/10/14 18:20:03.158872, 0] lib/util.c:1569(log_stack_trace)
  BACKTRACE: 17 stack frames:
   #0 winbindd(log_stack_trace+0x1c) [0x2ac25c60102c]
   #1 winbindd(smb_panic+0x2b) [0x2ac25c6010fb]
   #2 winbindd [0x2ac25c5f108e]
   #3 /lib64/libc.so.6 [0x2ac25e9b52d0]
   #4 winbindd(winbindd_getdcname_recv+0xab) [0x2ac25c58421b]
   #5 winbindd [0x2ac25c53585c]
   #6 winbindd [0x2ac25c57da54]
   #7 winbindd [0x2ac25c560b86]
   #8 winbindd [0x2ac25c56049b]
   #9 winbindd [0x2ac25c58752d]
   #10 winbindd [0x2ac25c587ce1]
   #11 winbindd(run_events+0x182) [0x2ac25c6104c2]
   #12 winbindd [0x2ac25c610741]
   #13 winbindd(_tevent_loop_once+0x90) [0x2ac25c610b10]
   #14 winbindd(main+0x97e) [0x2ac25c536c0e]
   #15 /lib64/libc.so.6(__libc_start_main+0xf4) [0x2ac25e9a2994]
   #16 winbindd [0x2ac25c534609]
[2010/10/14 18:20:03.232683, 0] lib/fault.c:326(dump_core)
  dumping core in /var/log/samba/cores/winbindd

In gdb, "where" shows:

(gdb) where
#0  0x00002ac25e9b5265 in raise () from /lib64/libc.so.6
#1  0x00002ac25e9b6d10 in abort () from /lib64/libc.so.6
#2  0x00002ac25c5f0b5d in dump_core () at lib/fault.c:337
#3  0x00002ac25c601139 in smb_panic (why=<value optimized out>) at lib/util.c:1481
#4  0x00002ac25c5f108e in fault_report (sig=1) at lib/fault.c:52
#5  sig_fault (sig=1) at lib/fault.c:75
#6  <signal handler called>
#7  0x00002ac25c58421b in winbindd_getdcname_recv (req=0x2ac25ce60a10, response=0x2ac25cea8930) at winbindd/winbindd_getdcname.c:86
#8  0x00002ac25c53585c in wb_request_done (req=0x2ac25ce60a10) at winbindd/winbindd.c:651
#9  0x00002ac25c57da54 in wb_dsgetdcname_done (subreq=0x2ac25cebd900) at winbindd/wb_dsgetdcname.c:100
#10 0x00002ac25c560b86 in wb_ndr_dispatch_done (subreq=0x2ac25ce5d490) at winbindd/winbindd_dual_ndr.c:135
#11 0x00002ac25c56049b in wb_child_request_done (subreq=0x2ac25ce5d550) at winbindd/winbindd_dual.c:170
#12 0x00002ac25c58752d in wb_simple_trans_read_done (subreq=0x2ac25ceced90) at ../nsswitch/libwbclient/wb_reqtrans.c:432
#13 0x00002ac25c587ce1 in wb_resp_read_done (subreq=0x2ac25ce82260) at ../nsswitch/libwbclient/wb_reqtrans.c:275
#14 0x00002ac25c6104c2 in run_events (ev=0x2ac25ce53330, selrtn=1, read_fds=0x7fffce6bdb20, write_fds=0x7fffce6bdaa0) at lib/events.c:148
#15 0x00002ac25c610741 in s3_event_loop_once (ev=0x2ac25ce53330, location=<value optimized out>) at lib/events.c:211
#16 0x00002ac25c610b10 in _tevent_loop_once (ev=0x2ac25ce53330, location=0x2ac25c97d8d7 "winbindd/winbindd.c:1275") at ../lib/tevent/tevent.c:497
#17 0x00002ac25c536c0e in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at winbindd/winbindd.c:1275
Created attachment 6011: Patch for 3.5

Can you try the attached patch?

Thanks, Volker
Sure thing, I have built it with the new patch and will let you know what happens. Thanks!
Assigning to Volker...
So far, in all of my testing, I no longer see this issue; the behavior is exactly what I would expect.

Setup: two Windows Server 2008 domain controllers and two Linux servers (RHEL 5.4) running Samba 3.5.6 with the patch you supplied.

wbinfo --getdcname shows winbind connected to one of the DCs. Using iptables, I DROP all connections from the DC it reports. After a few minutes (fewer than 10), wbinfo --getdcname shows it connected to the other DC, and ID lookups work as expected, with no hangs. winbind no longer crashes when a login is attempted while it cannot contact the DC.

Very awesome; the patch seems to have fixed my issue completely! Thank you!
Comment on attachment 6011: Patch for 3.5

Looks good.
Karolin, can you please merge this for 3.5.7?

Thanks, Volker
BTW: v3-6-test already has this as commit 0060b1ebac0960d95b5a24c7611d1f1568d29551.
Yes, I did check that, and also all the other rpccli_*_recv call sites in winbind 3.5.

Volker
Pushed to v3-5-test. Closing out bug report. Thanks!