Bug 7730 - INTERNAL ERROR: Signal 11 logging in when no DCs could be reached.
Summary: INTERNAL ERROR: Signal 11 logging in when no DCs could be reached.
Alias: None
Product: Samba 3.5
Classification: Unclassified
Component: Winbind
Version: 3.5.6
Hardware: x64 Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
Depends on:
Reported: 2010-10-14 13:41 UTC by Andrew Tranquada
Modified: 2010-11-11 05:09 UTC

Patch for 3.5 (767 bytes, patch)
2010-10-15 09:39 UTC, Volker Lendecke
metze: review+

Description Andrew Tranquada 2010-10-14 13:41:56 UTC
In my development environment, I was testing how long winbind takes to notice that a domain controller cannot be reached, and how long it takes to find and connect to another one, to make sure 3.5.6 works properly.
To do so, I used iptables to block connections to one of the DCs, then had to run an errand. I came back an hour later and logged into one of the test machines as a local user (the ssh connection timed out), and winbind crashed.

[2010/10/14 15:15:33.224948,  1] winbindd/winbindd_ads.c:126(ads_cached_connection)
  ads_connect for domain AWESOME failed: No logon servers
[2010/10/14 18:20:03.076780,  0] lib/fault.c:46(fault_report)
[2010/10/14 18:20:03.076935,  0] lib/fault.c:47(fault_report)
  INTERNAL ERROR: Signal 11 in pid 29111 (3.5.6)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2010/10/14 18:20:03.077034,  0] lib/fault.c:49(fault_report)
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2010/10/14 18:20:03.077194,  0] lib/fault.c:50(fault_report)
[2010/10/14 18:20:03.077259,  0] lib/util.c:1465(smb_panic)
  PANIC (pid 29111): internal error
[2010/10/14 18:20:03.158872,  0] lib/util.c:1569(log_stack_trace)
  BACKTRACE: 17 stack frames:
   #0 winbindd(log_stack_trace+0x1c) [0x2ac25c60102c]
   #1 winbindd(smb_panic+0x2b) [0x2ac25c6010fb]
   #2 winbindd [0x2ac25c5f108e]
   #3 /lib64/libc.so.6 [0x2ac25e9b52d0]
   #4 winbindd(winbindd_getdcname_recv+0xab) [0x2ac25c58421b]
   #5 winbindd [0x2ac25c53585c]
   #6 winbindd [0x2ac25c57da54]
   #7 winbindd [0x2ac25c560b86]
   #8 winbindd [0x2ac25c56049b]
   #9 winbindd [0x2ac25c58752d]
   #10 winbindd [0x2ac25c587ce1]
   #11 winbindd(run_events+0x182) [0x2ac25c6104c2]
   #12 winbindd [0x2ac25c610741]
   #13 winbindd(_tevent_loop_once+0x90) [0x2ac25c610b10]
   #14 winbindd(main+0x97e) [0x2ac25c536c0e]
   #15 /lib64/libc.so.6(__libc_start_main+0xf4) [0x2ac25e9a2994]
   #16 winbindd [0x2ac25c534609]
[2010/10/14 18:20:03.232683,  0] lib/fault.c:326(dump_core)
  dumping core in /var/log/samba/cores/winbindd

Running where in gdb shows:
(gdb) where
#0  0x00002ac25e9b5265 in raise () from /lib64/libc.so.6
#1  0x00002ac25e9b6d10 in abort () from /lib64/libc.so.6
#2  0x00002ac25c5f0b5d in dump_core () at lib/fault.c:337
#3  0x00002ac25c601139 in smb_panic (why=<value optimized out>) at lib/util.c:1481
#4  0x00002ac25c5f108e in fault_report (sig=1) at lib/fault.c:52
#5  sig_fault (sig=1) at lib/fault.c:75
#6  <signal handler called>
#7  0x00002ac25c58421b in winbindd_getdcname_recv (req=0x2ac25ce60a10, response=0x2ac25cea8930) at winbindd/winbindd_getdcname.c:86
#8  0x00002ac25c53585c in wb_request_done (req=0x2ac25ce60a10) at winbindd/winbindd.c:651
#9  0x00002ac25c57da54 in wb_dsgetdcname_done (subreq=0x2ac25cebd900) at winbindd/wb_dsgetdcname.c:100
#10 0x00002ac25c560b86 in wb_ndr_dispatch_done (subreq=0x2ac25ce5d490) at winbindd/winbindd_dual_ndr.c:135
#11 0x00002ac25c56049b in wb_child_request_done (subreq=0x2ac25ce5d550) at winbindd/winbindd_dual.c:170
#12 0x00002ac25c58752d in wb_simple_trans_read_done (subreq=0x2ac25ceced90) at ../nsswitch/libwbclient/wb_reqtrans.c:432
#13 0x00002ac25c587ce1 in wb_resp_read_done (subreq=0x2ac25ce82260) at ../nsswitch/libwbclient/wb_reqtrans.c:275
#14 0x00002ac25c6104c2 in run_events (ev=0x2ac25ce53330, selrtn=1, read_fds=0x7fffce6bdb20, write_fds=0x7fffce6bdaa0) at lib/events.c:148
#15 0x00002ac25c610741 in s3_event_loop_once (ev=0x2ac25ce53330, location=<value optimized out>) at lib/events.c:211
#16 0x00002ac25c610b10 in _tevent_loop_once (ev=0x2ac25ce53330, location=0x2ac25c97d8d7 "winbindd/winbindd.c:1275") at ../lib/tevent/tevent.c:497
#17 0x00002ac25c536c0e in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at winbindd/winbindd.c:1275
Comment 1 Volker Lendecke 2010-10-15 09:39:05 UTC
Created attachment 6011
Patch for 3.5

Can you try the attached patch?


Comment 2 Andrew Tranquada 2010-10-15 13:44:45 UTC
Sure thing. I have built it with the new patch and will let you know what happens.
Comment 3 Michael Adam 2010-10-16 04:08:02 UTC
Assigning to Volker...
Comment 4 Andrew Tranquada 2010-10-18 08:57:54 UTC
So far, in all of my testing, I no longer see this issue. The behavior I am seeing is exactly what I would expect.
Setup: two Windows Server 2008 domain controllers, two Linux servers (RHEL 5.4), Samba 3.5.6 with the patch you supplied.
wbinfo --getdcname shows it connected to one of the DCs.
Via iptables, I DROP all connections from the DC it reports being connected to.
After a few minutes (< 10), wbinfo --getdcname shows it connected to the other DC; id lookups work as expected, with no hangs.
winbind no longer crashes when a login is attempted while it cannot contact the DC.

Very awesome, the patch seems to have fixed my issue completely!
Thank you!
Comment 5 Stefan Metzmacher 2010-10-18 10:48:17 UTC
Comment on attachment 6011
Patch for 3.5

Looks good.
Comment 6 Volker Lendecke 2010-10-18 10:50:47 UTC
Karolin, can you please merge this for 3.5.7?


Comment 7 Stefan Metzmacher 2010-10-18 10:53:30 UTC
BTW: v3-6-test has this as 0060b1ebac0960d95b5a24c7611d1f1568d29551 already
Comment 8 Volker Lendecke 2010-10-18 10:59:52 UTC
Yes, I did check that and also all the other rpccli_*_recv places in winbind 3.5.
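
The attached patch is not reproduced in this report, so the following is only a sketch of the general class of bug that an audit of *_recv functions would look for, not the actual fix. It assumes the crash came from reading a reply structure that is only filled in on success. Every name here (status_t, struct fake_call, getdcname_recv_checked) is hypothetical and is not Samba's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes; Samba itself uses NTSTATUS. */
typedef enum {
    STATUS_OK = 0,
    STATUS_NO_LOGON_SERVERS,
    STATUS_PIPE_BROKEN,
    STATUS_INVALID_REPLY
} status_t;

/* Hypothetical reply payload: only allocated when the call succeeded. */
struct dc_info {
    const char *dc_name;
};

/* Hypothetical completed request, as seen by a *_recv function. */
struct fake_call {
    status_t transport_status; /* did the request round-trip at all? */
    status_t result;           /* what the operation itself returned */
    struct dc_info *dc;        /* NULL unless result == STATUS_OK */
};

/*
 * Copy the DC name out of a completed call.
 *
 * Checking only transport_status is not enough: when no DC can be
 * reached, the round trip can succeed while result carries an error
 * and dc stays NULL. Dereferencing dc at that point would fault, so
 * both statuses and the pointer are checked before any read.
 */
status_t getdcname_recv_checked(const struct fake_call *call,
                                char *out, size_t outlen)
{
    assert(call != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    if (call->transport_status != STATUS_OK) {
        return call->transport_status;
    }
    if (call->result != STATUS_OK) {
        return call->result;
    }
    if (call->dc == NULL || call->dc->dc_name == NULL) {
        return STATUS_INVALID_REPLY;
    }

    snprintf(out, outlen, "%s", call->dc->dc_name);
    return STATUS_OK;
}
```

In this sketch, a call that completes with result set to STATUS_NO_LOGON_SERVERS and dc left NULL returns the error to the caller and leaves the output buffer empty, rather than touching the missing reply.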

Comment 9 Karolin Seeger 2010-11-11 05:09:59 UTC
Pushed to v3-5-test.
Closing out bug report.