Bug 5572 - winbind freezes after request when timeout is received
Summary: winbind freezes after request when timeout is received
Status: NEW
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: winbind
Version: 3.0.30
Hardware: x86 Linux
Importance: P3 major
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
Depends on:
Reported: 2008-06-30 11:23 UTC by mark.cave-ayland (dead mail address)
Modified: 2008-07-03 07:41 UTC
CC List: 0 users

See Also:

smb.conf file for the server (2.41 KB, text/plain)
2008-06-30 11:28 UTC, mark.cave-ayland (dead mail address)

Description mark.cave-ayland (dead mail address) 2008-06-30 11:23:46 UTC
Hi there,

I am seeing winbind freeze on 3.0.30 after wbinfo hits a timeout. In this setup, the local domain EU.COMPANY.LOCAL trusts a remote domain, ASIA.COMPANY.LOCAL, which is reached over a VPN link. When "wbinfo -u" is issued after winbind starts, winbind contacts the ASIA.COMPANY.LOCAL domain and begins downloading its user list, which takes about 10 minutes in total.

After about 60s, the "wbinfo -u" process times out while winbind continues to download the users in the background. Unfortunately, every subsequent attempt to reach winbind with wbinfo fails, even after waiting for the background user download to complete: wbinfo cannot contact winbind at all, and even "wbinfo -p" fails with a timeout.

The only way I can get wbinfo working again is to kill all winbind processes with kill -9; a standard kill (SIGTERM) is not enough to terminate the server process.
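
The report does not establish why a plain kill is ignored. One common daemon pattern, assumed here rather than checked against the winbindd 3.0.30 source, is for the SIGTERM handler to set a flag that the main loop examines on its next iteration; if the loop is stuck in a call that never returns, the flag is never read, whereas SIGKILL cannot be caught and so always ends the process. The C sketch below models that pattern; `got_sigterm`, `install_sigterm_handler` and `main_loop_should_exit` are hypothetical names, not winbindd functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag written by the handler and read by the main loop. */
static volatile sig_atomic_t got_sigterm = 0;

/* Async-signal-safe handler: it only records that SIGTERM arrived. */
static void on_sigterm(int signum)
{
    (void)signum;
    got_sigterm = 1;
}

/* Install on_sigterm for SIGTERM; returns 0 on success, -1 on error. */
int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* One step of a hypothetical main loop. blocked == 1 models a loop that is
 * stuck inside a call that never returns, so it never reaches the point
 * where got_sigterm is examined. Returns 1 if the daemon would exit. */
int main_loop_should_exit(int blocked)
{
    assert(blocked == 0 || blocked == 1);
    if (blocked)
        return 0;
    return got_sigterm ? 1 : 0;
}
```

After `install_sigterm_handler()` and `raise(SIGTERM)`, the process keeps running with the flag set: `main_loop_should_exit(1)` (loop blocked) still returns 0, and only `main_loop_should_exit(0)` returns 1.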

When winbind is in this state, there are two active winbind processes:

uk01:~# ps -ef | grep winbin
root      6605     1  0 16:24 ?        00:00:01 /usr/local/samba/sbin/winbindd
root      6606  6605  0 16:24 ?        00:00:00 /usr/local/samba/sbin/winbindd

And here are the relevant backtraces from gdb:

uk01:~# gdb -p 6605
(gdb) bt
#0  0xb7fe5792 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb7e221fe in pthread_exit () from /lib/tls/i686/cmov/libc.so.6
#2  0xb74dc04d in sys_gethostbyname () from /lib/libnss_wins.so.2
#3  0xb74ef32f in interpret_addr () from /lib/libnss_wins.so.2
#4  0xb74ef41d in interpret_addr2 () from /lib/libnss_wins.so.2
#5  0xb74e1f73 in wins_srv_count () from /lib/libnss_wins.so.2
#6  0xb74e214b in wins_srv_tags () from /lib/libnss_wins.so.2
#7  0xb74cc45a in resolve_wins () from /lib/libnss_wins.so.2
#8  0xb7485e7b in _nss_wins_gethostbyname_r () from /lib/libnss_wins.so.2
#9  0xb7e2ae9b in gethostbyname_r () from /lib/tls/i686/cmov/libc.so.6
#10 0xb7e2a7ee in gethostbyname () from /lib/tls/i686/cmov/libc.so.6
#11 0x081145d2 in sys_gethostbyname (name=0xbff808a0 "ROOTDC02") at lib/system.c:701
#12 0x081940f4 in resolve_hosts (name=0xbff808a0 "ROOTDC02", name_type=32, return_iplist=0xbff806dc, return_count=0xbff806d8) at libsmb/namequery.c:1031
#13 0x08194ad8 in internal_resolve_name (name=0xbff808a0 "ROOTDC02", name_type=32, sitename=0x83f57c8 "EUCOMPANY", return_iplist=0xbff806dc, return_count=0xbff806d8,
    resolve_order=0x83f54c0 "lmhosts wins host bcast") at libsmb/namequery.c:1222
#14 0x081950cb in resolve_name (name=0xbff808a0 "ROOTDC02", return_ip=0xbff8089c, name_type=32) at libsmb/namequery.c:1323
#15 0x080a3cff in get_dc_name_via_netlogon (domain=0x83acd88, dcname=0xbff808a0 "ROOTDC02", dc_ip=0xbff8089c) at nsswitch/winbindd_cm.c:581
#16 0x080a5c76 in get_dcs (mem_ctx=0x83f8290, domain=0x83acd88, dcs=0xbff80a00, num_dcs=0xbff809fc) at nsswitch/winbindd_cm.c:1168
#17 0x080a60a4 in find_new_dc (mem_ctx=0x83f8290, domain=0x83acd88, dcname=0x83ad0f8 "", addr=0x83ad1f8, fd=0xbff80b60) at nsswitch/winbindd_cm.c:1256
#18 0x080a6969 in cm_open_connection (domain=0x83acd88, new_conn=0x83ad214) at nsswitch/winbindd_cm.c:1412
#19 0x080a6eee in init_dc_connection_network (domain=0x83acd88) at nsswitch/winbindd_cm.c:1562
#20 0x080a6f6b in init_dc_connection (domain=0x83acd88) at nsswitch/winbindd_cm.c:1578
#21 0x0808d0d9 in find_domain_from_name (domain_name=0x83a9e08 "company.local") at nsswitch/winbindd_util.c:583
#22 0x0808d255 in find_root_domain () at nsswitch/winbindd_util.c:648
#23 0x080b7fe1 in lookupname_recv (mem_ctx=0x83f7460, success=1, response=0x83d2594, c=0x808af06, private_data=0x834ea18) at nsswitch/winbindd_async.c:846
#24 0x080b58f9 in do_async_recv (private_data=0x83d1d68, success=1) at nsswitch/winbindd_async.c:57
#25 0x080b32f5 in async_reply_recv (private_data=0x834eeb0, success=1) at nsswitch/winbindd_dual.c:279
#26 0x08081f64 in rw_callback (event=0x83aa4f0, flags=1) at nsswitch/winbindd.c:405
#27 0x080830e0 in process_loop () at nsswitch/winbindd.c:861
#28 0x08083d1a in main (argc=1, argv=0xbff81cf4, envp=0xbff81cfc) at nsswitch/winbindd.c:1121
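
Frames #10 to #2 of the backtrace above show libc's gethostbyname() calling into the nss_wins module, which in turn calls sys_gethostbyname() again. One possible reading, which this report does not confirm, is that the nested call waits on a non-recursive lock already held by the outer gethostbyname() call on the same thread, so the thread can never proceed (the `pthread_exit` label in frame #1 would then be a mis-resolved symbol). The sketch below reproduces that re-entrancy with a pthread mutex; `resolver_init`, `lookup_host`, `resolve` and `backend_lookup` are hypothetical. It uses an error-checking mutex so the nested lock fails with EDEADLK instead of hanging; relocking a default mutex is undefined behaviour and in practice usually blocks forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the single lock a non-reentrant,
 * gethostbyname()-style resolver uses to protect its static result. */
static pthread_mutex_t resolver_lock;

/* ERRORCHECK makes a relock by the owning thread return EDEADLK
 * instead of waiting forever, so this sketch terminates. */
void resolver_init(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&resolver_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

static int resolve(const char *name, int depth);

/* Hypothetical NSS-style backend that, like frame #2, calls back into
 * the top-level resolver while the outer call still holds the lock. */
static int backend_lookup(const char *name, int depth)
{
    return resolve(name, depth + 1);
}

/* Returns 0 on success, or the error reported by pthread_mutex_lock. */
static int resolve(const char *name, int depth)
{
    int rc = pthread_mutex_lock(&resolver_lock);
    if (rc != 0)
        return rc;                  /* EDEADLK: this thread already holds it */

    int result = 0;
    if (depth == 0)
        result = backend_lookup(name, depth);   /* re-entrant call */

    pthread_mutex_unlock(&resolver_lock);
    return result;
}

/* Public entry point: look up a host name through the resolver. */
int lookup_host(const char *name)
{
    assert(name != NULL);
    return resolve(name, 0);
}
```

After `resolver_init()`, `lookup_host("ROOTDC02")` returns EDEADLK: the outer call holds the lock when the backend re-enters `resolve()`.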

uk01:~# gdb -p 6606
(gdb) bt
#0  0xb7fe5792 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb7e0e0fd in select () from /lib/tls/i686/cmov/libc.so.6
#2  0x08132e5c in sys_select (maxfd=14, readfds=0xbff7fe6c, writefds=0x0, errorfds=0x0, tval=0x0) at lib/select.c:93
#3  0x080b555f in fork_domain_child (child=0x83aa0e0) at nsswitch/winbindd_dual.c:1054
#4  0x080b3349 in schedule_async_request (child=0x83aa0e0) at nsswitch/winbindd_dual.c:296
#5  0x080b2b59 in async_request (mem_ctx=0x83a9358, child=0x83aa0e0, request=0x83acdb8, response=0x83ad618, continuation=0x808c9a6 <init_child_recv>,
    private_data=0x83ae2f8) at nsswitch/winbindd_dual.c:137
#6  0x0808c787 in init_child_connection (domain=0x83a9c08, continuation=0x80b3633 <domain_init_recv>, private_data=0x831d298) at nsswitch/winbindd_util.c:374
#7  0x080b3512 in async_domain_request (mem_ctx=0x83ab7c8, domain=0x83a9c08, request=0x83ab878, response=0x83ac0d8, continuation=0x808c059 <trustdom_recv>,
    private_data_data=0x831d350) at nsswitch/winbindd_dual.c:358
#8  0x0808c053 in add_trusted_domains (domain=0x83a9c08) at nsswitch/winbindd_util.c:218
#9  0x0808c4ad in rescan_trusted_domains () at nsswitch/winbindd_util.c:312
#10 0x08082dcd in process_loop () at nsswitch/winbindd.c:779
#11 0x08083d1a in main (argc=1, argv=0xbff81cf4, envp=0xbff81cfc) at nsswitch/winbindd.c:1121
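
In the backtrace above, sys_select() is called with tval=0x0. Under POSIX, a NULL timeout makes select() block until a descriptor becomes ready, so if the peer this process is waiting on never sends anything, the wait has no end. Whether that is the root cause here is not established by the report. The sketch below calls select() directly rather than Samba's sys_select() wrapper and only contrasts a bounded wait with an unbounded one; `make_idle_pipe` and `wait_readable` are hypothetical helpers, and an idle pipe is assumed as a stand-in for a peer that never replies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: a pipe nobody writes to stands in for a peer
 * that never sends a reply. Returns 0 on success, -1 on error. */
int make_idle_pipe(int fds[2])
{
    return pipe(fds);
}

/* Wait at most timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * Passing NULL as the last select() argument instead (tval=0x0 in the
 * backtrace) would remove the upper bound on the wait entirely. */
int wait_readable(int fd, long timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

On an idle pipe, `wait_readable(fds[0], 50)` returns 0 after about 50 ms; once a byte is written to `fds[1]` it returns 1 immediately.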

I've attached the smb.conf configuration file to this bug report for reference.

On a separate note, the "ldap timeout" parameter is set so high because debugging showed that, although the documentation describes it as a connection timeout, it is also used by ldap_search_with_timeout() as the timeout for downloading the remote user list. Since the user list from asia.company.local cannot be downloaded within the default of 15s, ldap_search_with_timeout() returned after 15s, before the complete list had arrived. Winbind then got stuck in a loop: it contacted asia.company.local, downloaded 15s worth of users over LDAP, timed out, and repeated indefinitely. I wasn't sure whether this is a documentation bug or an implementation bug, but it seemed important enough to mention.
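
The loop described above follows from the timeout covering the whole search rather than only the connection: if each attempt is cut off after "ldap timeout" seconds and the next attempt begins again from the first entry, no number of retries helps once the full download takes longer than the timeout. The toy model below is hypothetical and not Samba code; it makes that arithmetic explicit. The `resume` flag models an imagined client that keeps its progress between attempts and is included only for contrast, not as a description of what winbind or LDAP does.

```c
#include <assert.h>

/* Toy model: an enumeration needs needed_secs of transfer time; each
 * attempt is cut off after timeout_secs. If resume == 0, every attempt
 * restarts from the first entry. Returns the attempt number that
 * completes, or -1 if none of max_attempts attempts finishes. */
int attempts_to_complete(int needed_secs, int timeout_secs,
                         int max_attempts, int resume)
{
    assert(needed_secs >= 0 && timeout_secs > 0 && max_attempts > 0);

    int progress = 0;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (!resume)
            progress = 0;            /* restart from the first entry */

        int budget = timeout_secs;   /* this attempt is cut off here */
        while (progress < needed_secs && budget > 0) {
            progress++;              /* one second of transfer */
            budget--;
        }
        if (progress >= needed_secs)
            return attempt;
    }
    return -1;
}
```

Taking the roughly 10 minutes from the description as an illustrative 600 s, `attempts_to_complete(600, 15, 1000, 0)` returns -1 with the 15s default, while a timeout above 600 s completes on attempt 1, which is consistent with raising "ldap timeout" working around the loop. The resuming variant, `attempts_to_complete(600, 15, 1000, 1)`, would finish on attempt 40.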


Comment 1 mark.cave-ayland (dead mail address) 2008-06-30 11:28:14 UTC
Created attachment 3376
smb.conf file for the server
Comment 2 mark.cave-ayland (dead mail address) 2008-07-03 07:41:48 UTC
Okay, I've determined what the issue is here. There is no WINS server on this Windows network, and someone had helpfully added the following line to /etc/nsswitch.conf:

hosts: files dns wins

Apparently, if there is no WINS server defined in smb.conf (or WINS is unavailable on the PDC), winbind hangs indefinitely. I've also verified the LDAP bug, which I'll re-report separately.
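
Given this finding, a quick way to spot the problematic configuration is to check whether the `hosts:` line in /etc/nsswitch.conf lists the `wins` source. The C helper below is a hypothetical sketch of such a check: it ignores anything after `#`, returns -1 for lines that are not `hosts:` lines, and otherwise reports whether any whitespace-separated source is exactly `wins`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if an nsswitch.conf "hosts:" line lists
 * the "wins" source, 0 if it does not, and -1 if the line is not a
 * "hosts:" line. Text after '#' is treated as a comment. Lines longer
 * than 511 characters are truncated before parsing. */
int hosts_line_has_wins(const char *line)
{
    assert(line != NULL);

    char buf[512];
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    char *hash = strchr(buf, '#');
    if (hash)
        *hash = '\0';                    /* drop trailing comment */

    char *p = buf + strspn(buf, " \t");  /* skip leading whitespace */
    if (strncmp(p, "hosts:", 6) != 0)
        return -1;
    p += 6;

    for (char *tok = strtok(p, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        if (strcmp(tok, "wins") == 0)
            return 1;
    }
    return 0;
}
```

For the line quoted above, `hosts_line_has_wins("hosts: files dns wins")` returns 1; for `hosts: files dns` it returns 0.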