Samba 3.0.9 is configured as a W2k3 domain member. Winbind and Kerberos are used for user validation against the AD. The W2k3 realm has two domain controllers: 192.168.100.100 and 192.168.100.101.

The global section of smb.conf looks like this:

    # Global parameters
    [global]
        workgroup = NH-HOTELES
        realm = NH-HOTELES.COM
        server string = %h server (Samba %v)
        security = ADS
        password server = 192.168.100.100 192.168.100.101
        ldap timeout = 3
        log file = /var/log/samba/%m.log
        socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
        printcap cache time = 600
        domain master = No
        idmap uid = 10000-20000
        idmap gid = 10000-20000
        template homedir = /data/hom/%U
        template shell = /bin/bash
        printer admin = root, "@NH-HOTELES.COM\Domain Admins", @NH-HOTELES.COM\DEP_ADMIN_GERMANY
        oplocks = No
        level2 oplocks = No

When contact with DC1 (192.168.100.100) is lost, "wbinfo -u" reports "Error looking up domain users" within 5 minutes, and XP clients lose their shares on the Samba server. A level 10 winbind log shows the problem:

    [2004/12/21 12:33:13, 3] nsswitch/winbindd_user.c:winbindd_list_users(592)
      [ 974]: list users
    [2004/12/21 12:33:13, 10] nsswitch/winbindd_cache.c:fetch_cache_seqnum(287)
      fetch_cache_seqnum: timeout [NH-HOTELES][18495042 @ 1103628090]
    [2004/12/21 12:33:13, 3] nsswitch/winbindd_ads.c:sequence_number(792)
      ads: fetch sequence_number for NH-HOTELES
    [2004/12/21 12:33:13, 7] nsswitch/winbindd_ads.c:ads_cached_connection(48)
      Current tickets expire at 1103663094, time is now 1103628793
    [2004/12/21 12:49:21, 3] libads/ldap.c:ads_do_paged_search(477)
      ldap_search_ext_s((objectclass=*)) -> Can't contact LDAP server
    [2004/12/21 12:49:21, 3] libads/ldap_utils.c:ads_do_search_retry(66)
      Reopening ads connection to realm 'NH-HOTELES.COM' after error Can't contact LDAP server
    [2004/12/21 12:49:21, 6] libads/ldap.c:ads_find_dc(176)
      ads_find_dc: looking for realm 'NH-HOTELES.COM'
    [2004/12/21 12:49:21, 8] libsmb/namequery.c:get_sorted_dc_list(1434)
      get_sorted_dc_list: attempting lookup using [ads]
    [2004/12/21 12:49:21, 10] libsmb/conncache.c:check_negative_conn_cache(72)
      check_negative_conn_cache: cache entry expired for NH-HOTELES.COM, 192.168.100.100
    [2004/12/21 12:49:21, 10] libsmb/namequery.c:remove_duplicate_addrs2(320)
      remove_duplicate_addrs2: looking for duplicate address/port pairs
    [2004/12/21 12:49:21, 4] libsmb/namequery.c:get_dc_list(1407)
      get_dc_list: returning 2 ip addresses in an ordered list
    [2004/12/21 12:49:21, 4] libsmb/namequery.c:get_dc_list(1408)
      get_dc_list: 192.168.100.100:389 192.168.100.101:389
    [2004/12/21 12:49:21, 5] libads/ldap.c:ads_try_connect(85)
      ads_try_connect: trying ldap server '192.168.100.100' port 389
    [2004/12/21 12:49:24, 10] libsmb/conncache.c:add_failed_connection_entry(132)
      add_failed_connection_entry: added domain NH-HOTELES.COM (192.168.100.100) to failed conn cache
    [2004/12/21 12:49:24, 5] libads/ldap.c:ads_try_connect(85)
      ads_try_connect: trying ldap server '192.168.100.101' port 389
    [2004/12/21 12:49:24, 3] libads/ldap.c:ads_connect(247)
      Connected to LDAP server 192.168.100.101

In my opinion, this part causes the problem:

    [2004/12/21 12:33:13, 7] nsswitch/winbindd_ads.c:ads_cached_connection(48)
      Current tickets expire at 1103663094, time is now 1103628793
    [2004/12/21 12:49:21, 3] libads/ldap.c:ads_do_paged_search(477)
      ldap_search_ext_s((objectclass=*)) -> Can't contact LDAP server
    [2004/12/21 12:49:21, 3] libads/ldap_utils.c:ads_do_search_retry(66)
      Reopening ads connection to realm 'NH-HOTELES.COM' after error Can't contact LDAP server

It seems that a timeout of 16 minutes must expire before winbind detects that the primary LDAP server is down! Once this timeout has expired, "wbinfo -u" gives the appropriate output. In a follow-up message I'll upload a zipped level 10 log of the complete test I made.
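The 16-minute figure comes from subtracting the timestamps of the last entry before the stall and the first LDAP error. A minimal Python sketch of that arithmetic, assuming the Samba 3.0 log header format `[YYYY/MM/DD HH:MM:SS, level]` shown above (`log_time` is a hypothetical helper, not part of Samba):

```python
from datetime import datetime, timedelta

def log_time(line: str) -> datetime:
    """Parse the '[YYYY/MM/DD HH:MM:SS, level]' prefix of a Samba 3.0 log line."""
    stamp = line[1:line.index(",")]  # drop the leading '[' and everything from ', level]' on
    return datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")

before = log_time("[2004/12/21 12:33:13, 7] nsswitch/winbindd_ads.c:ads_cached_connection(48)")
after = log_time("[2004/12/21 12:49:21, 3] libads/ldap.c:ads_do_paged_search(477)")
print(after - before)  # 0:16:08

# Remaining Kerberos ticket lifetime, using the two epoch values from the same log entry.
print(timedelta(seconds=1103663094 - 1103628793))  # 9:31:41
```

With roughly 9.5 hours of ticket lifetime left, the stall does not appear to be a Kerberos ticket expiry; the search on the cached connection simply blocks until the LDAP library reports "Can't contact LDAP server".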
Created attachment 853 (level 10 log winbind)

The attached ZIP contains the level 10 winbind log of the following test:

    12:05  started winbindd
    12:12  wbinfo -u: output OK
    12:21  wbinfo -u: output OK
    12:23  removed the connection to DC1 (192.168.100.100) by pulling the UTP plug from the NIC
    12:33  wbinfo -u: "Error looking up domain users"
    13:06  wbinfo -u: output OK
    13:11  wbinfo -u: output OK
    13:13  re-established the connection to DC1 (192.168.100.100)

The problem resides here:

    [2004/12/21 12:33:13, 7] nsswitch/winbindd_ads.c:ads_cached_connection(48)
      Current tickets expire at 1103663094, time is now 1103628793
    [2004/12/21 12:49:21, 3] libads/ldap.c:ads_do_paged_search(477)
      ldap_search_ext_s((objectclass=*)) -> Can't contact LDAP server
    [2004/12/21 12:49:21, 3] libads/ldap_utils.c:ads_do_search_retry(66)
      Reopening ads connection to realm 'NH-HOTELES.COM' after error Can't contact LDAP server

The problem is reproducible; a run on 2004/12/20 shows the same pattern:

    [2004/12/20 17:22:44, 3] nsswitch/winbindd_user.c:winbindd_list_users(592)
      [14142]: list users
    [2004/12/20 17:22:44, 10] nsswitch/winbindd_cache.c:fetch_cache_seqnum(287)
      fetch_cache_seqnum: timeout [NH-HOTELES][18492523 @ 1103559273]
    [2004/12/20 17:22:44, 3] nsswitch/winbindd_ads.c:sequence_number(792)
      ads: fetch sequence_number for NH-HOTELES
    [2004/12/20 17:22:44, 7] nsswitch/winbindd_ads.c:ads_cached_connection(48)
      Current tickets expire at 1103595267, time is now 1103559764
    [2004/12/20 17:38:29, 3] libads/ldap.c:ads_do_paged_search(477)
      ldap_search_ext_s((objectclass=*)) -> Can't contact LDAP server
    [2004/12/20 17:38:29, 3] libads/ldap_utils.c:ads_do_search_retry(66)
      Reopening ads connection to realm 'NH-HOTELES.COM' after error Can't contact LDAP server
    [2004/12/20 17:38:29, 6] libads/ldap.c:ads_find_dc(176)
      ads_find_dc: looking for realm 'NH-HOTELES.COM'
    [2004/12/20 17:38:29, 8] libsmb/namequery.c:get_sorted_dc_list(1434)
      get_sorted_dc_list: attempting lookup using [ads]
    [2004/12/20 17:38:29, 10] libsmb/conncache.c:check_negative_conn_cache(72)
      check_negative_conn_cache: cache entry expired for NH-HOTELES.COM, 192.168.100.100
    [2004/12/20 17:38:29, 10] libsmb/namequery.c:remove_duplicate_addrs2(320)
      remove_duplicate_addrs2: looking for duplicate address/port pairs
    [2004/12/20 17:38:29, 4] libsmb/namequery.c:get_dc_list(1407)
      get_dc_list: returning 2 ip addresses in an ordered list
    [2004/12/20 17:38:29, 4] libsmb/namequery.c:get_dc_list(1408)
      get_dc_list: 192.168.100.100:389 192.168.100.101:389
    [2004/12/20 17:38:29, 5] libads/ldap.c:ads_try_connect(85)
      ads_try_connect: trying ldap server '192.168.100.100' port 389
    [2004/12/20 17:38:31, 10] libsmb/conncache.c:add_failed_connection_entry(132)
      add_failed_connection_entry: added domain NH-HOTELES.COM (192.168.100.100) to failed conn cache
    [2004/12/20 17:38:31, 5] libads/ldap.c:ads_try_connect(85)
      ads_try_connect: trying ldap server '192.168.100.101' port 389
    [2004/12/20 17:38:31, 3] libads/ldap.c:ads_connect(247)
      Connected to LDAP server 192.168.100.101
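The failover sequence that both logs show (ordered DC list from get_dc_list, negative-cache check, connection attempt, failed entry added, next DC tried) can be modeled roughly as follows. This is a simplified Python sketch, not Samba's C implementation; `pick_dc` and `NEG_CACHE_TTL` are hypothetical names, and the 60-second cache lifetime is an assumption.

```python
import time
from typing import Callable, Dict, List, Optional

NEG_CACHE_TTL = 60.0  # hypothetical lifetime of a negative-cache entry, in seconds

def pick_dc(dcs: List[str], failed: Dict[str, float],
            try_connect: Callable[[str], bool],
            now: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Return the first reachable DC from an ordered list (hypothetical model).

    `failed` maps a DC address to the time it was added to the negative cache.
    """
    for dc in dcs:
        added = failed.get(dc)
        if added is not None:
            if now() - added < NEG_CACHE_TTL:
                continue          # still marked as failed: skip this DC
            del failed[dc]        # entry expired: this DC gets another try
        if try_connect(dc):
            return dc
        failed[dc] = now()        # remember the failure, then try the next DC
    return None

# Simulate the test: DC1 is unplugged, DC2 is reachable.
failed: Dict[str, float] = {}
dcs = ["192.168.100.100", "192.168.100.101"]
print(pick_dc(dcs, failed, lambda ip: ip == "192.168.100.101"))  # 192.168.100.101
print(sorted(failed))  # ['192.168.100.100']
```

Once this path is reached, failover is quick: in the logs it takes 2 to 3 seconds (12:49:21 to 12:49:24, and 17:38:29 to 17:38:31). The 16 minutes are spent before that, waiting for the search on the cached connection to fail.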
DC failover didn't work because of LDAP timeouts in winbind. This is fixed in rev. 4655 of ads.h and ldap.c.
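One common way to avoid this kind of stall is to bound every blocking network wait, so that a dead server is detected after a few seconds and the caller can reconnect to the next DC. The Python sketch below only illustrates that general idea with plain sockets; it is not the rev. 4655 change, it does not speak LDAP, and `request_with_timeout` is a hypothetical helper.

```python
import socket
from typing import Optional

def request_with_timeout(sock: socket.socket, request: bytes, timeout: float) -> Optional[bytes]:
    """Send a request and wait at most `timeout` seconds for a reply.

    Returns None if the peer does not answer in time, so the caller can
    drop the connection and fail over instead of blocking indefinitely.
    """
    sock.settimeout(timeout)
    sock.sendall(request)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None

# A socket pair whose far end never answers stands in for the unplugged DC.
client, silent_server = socket.socketpair()
print(request_with_timeout(client, b"search", timeout=0.5))  # None, after about 0.5 s
client.close()
silent_server.close()
```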
Sorry for the spam; cleaning up the database to prevent unnecessary reopening of bugs.