While doing some stress tests on today's winbind (Thu Sep 11 16:42:49 CEST 2003) I noticed two things that happen to winbind when the DC suddenly disappears (e.g. on reboot).

My setup: SuSE Linux 8.2, security = ads, one Win2k DC.

- winbindd segfaults on getpwnam operations while the DC is not available:

libads/ldap_utils.c:ads_do_search_retry(68) ads_search_retry: failed to reconnect (Transport endpoint is not connected)
libads/ads_ldap.c:ads_name_to_sid(58) name_to_sid ads_search: Transport endpoint is not connected
nsswitch/winbindd_cache.c:wcache_save_name_to_sid(602) wcache_save_name_to_sid: ADMINISTRATOR -> S-0-0
nsswitch/winbindd_user.c:winbindd_getpwnam(147) user 'administrator' does not exist
...
nsswitch/winbindd.c:process_request(305) process_request: request fn GETPWNAM
nsswitch/winbindd_user.c:winbindd_getpwnam(112) [ 2370]: getpwnam my.domain\administrator
nsswitch/winbindd_cache.c:refresh_sequence_number(342) refresh_sequence_number: MYDOMAIN time ok
nsswitch/winbindd_cache.c:refresh_sequence_number(367) refresh_sequence_number: MYDOMAIN seq number is now 18510
nsswitch/winbindd_cache.c:name_to_sid(958) name_to_sid: [Cached] - doing backend query for name for domain MYDOMAIN
nsswitch/winbindd_ads.c:name_to_sid(312) ads: name_to_sid
lib/fault.c:fault_report(36) ===============================================================
lib/fault.c:fault_report(37) INTERNAL ERROR: Signal 11 in pid 6762 (CVS 3.0.1pre1-SuSE) Please read the appendix Bugs of the Samba HOWTO collection
lib/fault.c:fault_report(39) ===============================================================
lib/util.c:smb_panic(1400) PANIC: internal error
lib/util.c:smb_panic(1407) BACKTRACE: 15 stack frames:
 #0 /usr/sbin/winbindd(smb_panic+0x1ab) [0x80b9bbb]
 #1 /usr/sbin/winbindd [0x80a89cd]
 #2 /usr/sbin/winbindd [0x80a8a2e]
 #3 /lib/libc.so.6 [0x4023f5c8]
 #4 /usr/sbin/winbindd(ads_name_to_sid+0x45) [0x8157ff9]
 #5 /usr/sbin/winbindd [0x807f44b]
 #6 /usr/sbin/winbindd [0x8076a28]
 #7 /usr/sbin/winbindd(winbindd_lookup_sid_by_name+0x73) [0x8073122]
 #8 /usr/sbin/winbindd(winbindd_getpwnam+0x298) [0x806e536]
 #9 /usr/sbin/winbindd(strftime+0x1340) [0x806cdf4]
 #10 /usr/sbin/winbindd(winbind_process_packet+0x1f) [0x806d0e0]
 #11 /usr/sbin/winbindd(strftime+0x1ec4) [0x806d978]
 #12 /usr/sbin/winbindd(main+0x51b) [0x806df76]
 #13 /lib/libc.so.6(__libc_start_main+0xce) [0x4022b8ae]
 #14 /usr/sbin/winbindd(ldap_msgfree+0x7d) [0x806c711]
nsswitch/winbindd.c:winbind_client_read(455) client_read: read 0 bytes. Need 1568 more for a full request.
nsswitch/winbindd.c:winbind_client_read(462) read failed on sock 9, pid 6762: EOF

- and sometimes winbindd dies silently while the DC is rebooting:

nsswitch/winbindd_ads.c:sequence_number(778) ads: fetch sequence_number for MYDOMAIN
libads/ldap.c:ads_do_paged_search(451) ldap_search_ext_s((objectclass=*)) -> Can't contact LDAP server
libads/ldap_utils.c:ads_do_search_retry(60) Reopening ads connection to realm 'MY.DOMAIN' after error Can't contact LDAP server
libads/ldap.c:ads_find_dc(147) ads_find_dc: looking for realm 'MY.DOMAIN'
libsmb/namequery.c:get_sorted_dc_list(1215) get_sorted_dc_list: attempting lookup using [hosts]
libsmb/namequery.c:internal_resolve_name(989) internal_resolve_name: looking up ads#20
lib/gencache.c:gencache_get(264) Returning valid cache entry: key = NBT/ADS#20, value = 10.60.4.2:0, timeout = Thu Sep 11 16:47:27 2003
libsmb/namecache.c:namecache_fetch(201) name ads#20 found.
libsmb/namequery.c:remove_duplicate_addrs2(312) remove_duplicate_addrs2: looking for duplicate address/port pairs
libsmb/namequery.c:get_dc_list(1350) get_dc_list: returning 1 ip addresses in an ordered list
libsmb/namequery.c:get_dc_list(1351) get_dc_list: 10.60.4.2:389
libsmb/conncache.c:check_negative_conn_cache(83) check_negative_conn_cache: returning negative entry for MY.DOMAIN, 10.60.4.2
libads/ldap_utils.c:ads_do_search_retry(68) ads_search_retry: failed to reconnect (Transport endpoint is not connected)
winbindd: unbind.c:40: ldap_unbind_ext: Assertion `( (ld)->ld_options.ldo_valid == 0x2 )' failed.

Nothing more is logged; winbind is simply dead after that. Although disappearing DCs are not very common, I think winbind should deal with this more gracefully and survive until the DC is available again.
Created attachment 137
Backtrace handler for samba

Guenther, have you considered installing a panic action handler on your system? You are coming up with some really good bugs, and it would certainly make things faster if we could get a stack backtrace from you early on. Tridge has written a neat little script that should produce a stack trace in the log file when a Samba program crashes. If you copy it to /usr/local/bin and add the following to your smb.conf file, you will be able to provide a stack backtrace in your initial reports:

[global]
    panic action = /usr/local/bin/backtrace %d
cc me
Guenther, I can't reproduce this using the latest SAMBA_3_0 CVS. Is it still a problem for you?
Never mind, I can reproduce it. I have a crash in winbindd_ads.c:sequence_number() where we are trying to destroy an LDAP connection. Probably a double free or something.
Found it. We were calling ads_destroy twice on the same structure. Fix checked into SAMBA_3_0.
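For anyone who runs into the same class of crash: below is a minimal sketch of the usual guard against destroying a connection twice. It is not the actual Samba fix and does not use Samba's ADS_STRUCT or ads_destroy; struct demo_conn, demo_conn_open, demo_conn_destroy and demo_free_calls are hypothetical names made up for illustration. The idea is that the destroy routine takes the address of the caller's pointer and clears it, so a second call on the same variable becomes a no-op instead of a double free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an LDAP-backed connection handle.
 * This is NOT Samba's ADS_STRUCT; it exists only for this sketch. */
struct demo_conn {
    int fd; /* pretend socket descriptor */
};

/* Counts how many times memory was really released (illustration only). */
static int demo_free_calls = 0;

/* Allocate a connection handle; returns NULL on allocation failure. */
struct demo_conn *demo_conn_open(void)
{
    struct demo_conn *c = calloc(1, sizeof(*c));
    if (c != NULL) {
        c->fd = 3; /* arbitrary placeholder value */
    }
    return c;
}

/* Destroy the connection and clear the caller's pointer.
 * Calling this again on the same variable does nothing. */
void demo_conn_destroy(struct demo_conn **connp)
{
    assert(connp != NULL);
    if (*connp == NULL) {
        return; /* already destroyed, nothing to free */
    }
    free(*connp);
    demo_free_calls++;
    *connp = NULL; /* prevents a second free through this pointer */
}
```

A caller would pass &conn to demo_conn_destroy() on every error path. Note that this only protects callers that share the same pointer variable; two separate copies of the pointer can still be freed twice.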
Sorry for going off topic, but who is the author of attachment 137, the simple script that calls gdb? We want to include it in the example/scripts/ directory of SuSE's package, and as usual I want to add the author and copyright.
It's already in ./testsuite/build_farm/backtrace. Tridge wrote it, IIRC.
Originally reported against one of the 3.0.0rc[1-4] releases. Cleaning up non-production versions.
Sorry for the same, cleaning up the database to prevent unnecessary reopens of bugs.