Hi,

sometimes we get an internal error from a winbindd child. The parent is still running.

The versions of the dependencies are:
- libiconv (1.8)
- MIT kerberos v5 (1.4.2)
- openssl (0.9.7g)
- openldap (2.2.28)
- samba (3.0.20)

Everything else (sasl for openldap) is the default version as it comes with Solaris 10. Those explicit dependencies were linked statically (we didn't generate shared libs for them for some reason).

The global section of the smb.conf is:
---snip---
[global]
    show add printer wizard = no
    server string = schiller
    workgroup = publications
    encrypt passwords = yes
    load printers = no
    password server = <ip>
    add user script = /usr/sbin/useradd -g smbusers -s /bin/false %u
    local master = no
    dns proxy = no
    realm = publications.win
    log level = 2
    wins server = <ip, same as above>
    log file = /opt/OPsamba/var/log.%m
    security = ads
    preferred master = false
    netbios name = schiller
    domain master = false
    idmap uid = 30000 - 40000
---snip---

log.winbindd:
---snip---
[2005/09/08 11:21:41, 1] nsswitch/winbindd.c:main(935)
  winbindd version 3.0.20 started.
  Copyright The Samba Team 2000-2004
[2005/09/08 11:21:41, 2] param/loadparm.c:do_section(3559)
  Processing section "[alpha]"
[2005/09/08 11:21:41, 2] param/loadparm.c:do_section(3559)
  Processing section "[test0815]"
[2005/09/08 11:21:41, 2] param/loadparm.c:do_section(3559)
  Processing section "[truc]"
[2005/09/08 11:21:41, 2] param/loadparm.c:do_section(3559)
  Processing section "[test]"
[2005/09/08 11:21:41, 2] lib/interface.c:add_interface(81)
  added interface ip=<own ip> bcast=<bcast> nmask=<mask>
[2005/09/08 11:21:41, 2] lib/interface.c:add_interface(81)
  added interface ip=<own ip> bcast=<bcast> nmask=<mask>
[2005/09/08 11:21:41, 0] nsswitch/winbindd_util.c:winbindd_param_init(766)
  winbindd: idmap uid range missing or invalid
[2005/09/08 11:21:41, 0] nsswitch/winbindd_util.c:winbindd_param_init(767)
  winbindd: cannot continue, exiting.
[2005/09/08 11:21:41, 1] nsswitch/winbindd.c:main(968)
  Could not init idmap -- netlogon proxy only
[2005/09/08 11:21:41, 2] lib/tallocmsg.c:register_msg_pool_usage(56)
  Registered MSG_REQ_POOL_USAGE
[2005/09/08 11:21:41, 2] lib/dmallocmsg.c:register_dmalloc_msgs(71)
  Registered MSG_REQ_DMALLOC_MARK and LOG_CHANGED
[2005/09/08 11:21:41, 2] nsswitch/winbindd_util.c:add_trusted_domain(166)
  Added domain PUBLICATIONS PUBLICATIONS.WIN S-1-5-21-117609710-1229272821-839522115
[2005/09/08 11:21:41, 2] nsswitch/winbindd_util.c:add_trusted_domain(166)
  Added domain BUILTIN S-1-5-32
[2005/09/08 11:21:41, 2] nsswitch/winbindd_util.c:add_trusted_domain(166)
  Added domain SCHILLER S-1-5-21-308816121-94223975-3382285697
---snip---

log.wb-PUBLICATIONS contains:
---snip---
[2005/09/08 11:21:41, 0] lib/fault.c:fault_report(37)
  INTERNAL ERROR: Signal 11 in pid 9260 (3.0.20)
  Please read the appendix Bugs of the Samba HOWTO collection
[2005/09/08 11:21:41, 0] lib/fault.c:fault_report(39)
  ===============================================================
[2005/09/08 11:21:41, 0] lib/util.c:smb_panic2(1548)
  PANIC: internal error
[2005/09/08 11:21:42, 2] libsmb/cliconnect.c:cli_session_setup_kerberos(532)
  Doing kerberos session setup
[2005/09/08 11:21:42, 0] lib/fault.c:fault_report(36)
  ===============================================================
[2005/09/08 11:21:42, 0] lib/fault.c:fault_report(37)
  INTERNAL ERROR: Signal 11 in pid 9262 (3.0.20)
  Please read the appendix Bugs of the Samba HOWTO collection
[2005/09/08 11:21:42, 0] lib/fault.c:fault_report(39)
  ===============================================================
[2005/09/08 11:21:42, 0] lib/util.c:smb_panic2(1548)
  PANIC: internal error
---snip---

The gdb backtrace is:
---snip---
(gdb) bt
#0  0xff13d5ec in setitimer () from /lib/libc.so.1
#1  0xff0dd88c in putspent () from /lib/libc.so.1
#2  0xff0bde40 in abort () from /lib/libc.so.1
#3  0x000cac9c in smb_panic2 (why=0x3c1980 "internal error", decrement_pid_count=4506464) at lib/util.c:1614
#4  0x000caae4 in smb_panic (why=0x3c1980 "internal error") at lib/util.c:1500
#5  0x000b7314 in fault_report (sig=11) at lib/fault.c:41
#6  0x000b7378 in sig_fault (sig=11) at lib/fault.c:64
#7  0xff13c534 in __csigsetjmp () from /lib/libc.so.1
#8  0xff1319a0 in call_user_handler () from /lib/libc.so.1
#9  0xff1135ec in _ndoprnt () from /lib/libc.so.1
#10 0xff115c6c in vfprintf () from /lib/libc.so.1
#11 0x000d1010 in talloc_vasprintf (t=0x44c760, fmt=0x3b3688 "%s\\%s\\%s", ap=0xffbfe884) at lib/talloc.c:953
#12 0x000d1078 in talloc_asprintf (t=0x44c760, fmt=0x3b3688 "%s\\%s\\%s") at lib/talloc.c:976
#13 0x0007266c in winbindd_dual_list_trusted_domains (domain=0x44dd18, state=0xffbfe990) at nsswitch/winbindd_misc.c:133
#14 0x0007d68c in child_process_request (domain=0x44dd18, state=0xffbfe990) at nsswitch/winbindd_dual.c:361
#15 0x0007dbd8 in fork_domain_child (child=0x44e104) at nsswitch/winbindd_dual.c:490
#16 0x0007d2b4 in schedule_async_request (child=0x44e104) at nsswitch/winbindd_dual.c:198
#17 0x0007d878 in winbind_child_died (pid=9146) at nsswitch/winbindd_dual.c:416
#18 0x00061e08 in process_loop () at nsswitch/winbindd.c:860
#19 0x00062424 in main (argc=4387840, argv=0x42f400) at nsswitch/winbindd.c:1032
(gdb) up 11
#11 0x000d1010 in talloc_vasprintf (t=0x44c760, fmt=0x3b3688 "%s\\%s\\%s", ap=0xffbfe894) at lib/talloc.c:953
953             len = vsnprintf(NULL, 0, fmt, ap2);
(gdb) print fmt
$1 = 0x3b3688 "%s\\%s\\%s"
(gdb) print ap2
No symbol "ap2" in current context.
(gdb) list
948             char *ret;
949             va_list ap2;
950
951             VA_COPY(ap2, ap);
952
953             len = vsnprintf(NULL, 0, fmt, ap2);
954
955             ret = _talloc(t, len+1);
956             if (ret) {
957                     VA_COPY(ap2, ap);
(gdb) print ap
$2 = 0xffbfe894
(gdb) print *ap
Attempt to dereference a generic pointer.
(gdb) print *(char *)ap
$3 = 0 '\0'
(gdb) up 1
#12 0x000d1078 in talloc_asprintf (t=0x44c760, fmt=0x3b3688 "%s\\%s\\%s") at lib/talloc.c:976
976             ret = talloc_vasprintf(t, fmt, ap);
(gdb) list
971     {
972             va_list ap;
973             char *ret;
974
975             va_start(ap, fmt);
976             ret = talloc_vasprintf(t, fmt, ap);
977             va_end(ap);
978             return ret;
979     }
980
(gdb) print t
$4 = (const void *) 0x44c760
(gdb) print fmt
$5 = 0x3b3688 "%s\\%s\\%s"
(gdb) up 1
#13 0x0007266c in winbindd_dual_list_trusted_domains (domain=0x44dd08, state=0xffbfe9a0) at nsswitch/winbindd_misc.c:133
133                     extra_data = talloc_asprintf(state->mem_ctx, "%s\\%s\\%s",
(gdb) list
128                                                  &alt_names, &sids);
129
130             extra_data = talloc_strdup(state->mem_ctx, "");
131
132             if (num_domains > 0)
133                     extra_data = talloc_asprintf(state->mem_ctx, "%s\\%s\\%s",
134                                                  names[0], alt_names[0],
135                                                  sid_string_static(&sids[0]));
136
137             for (i=1; i<num_domains; i++)
(gdb) print state->mem_ctx
$6 = (TALLOC_CTX *) 0x44c760
(gdb) print names[0]
$7 = 0x44c3a0 "OPOCE.DOM"
(gdb) print alt_names[0]
$8 = 0x0
(gdb) print sids[0]
$9 = {sid_rev_num = 1 '\001', num_auths = 4 '\004', id_auth = "\0\0\0\0\0\005", sub_auths = {21, 1692678069, 581711446, 178676651, 0 <repeats 11 times>}}
(gdb) print &sids[0]
$10 = (DOM_SID *) 0x44c270
---snip---

In case you want more debugging output or more information, just ask.

Bye,
Alexander.
Hi, I've recompiled everything with dynamic libs and without kerberos support in openssl (samba is still compiled with MIT-kerberos). It still dumps core. Bye, Alexander.
Alexander, please retest the current SAMBA_3_0_RELEASE tree:

svn co svn://svnanon.samba.org/samba/branches/SAMBA_3_0_RELEASE samba-3.0.20a

This should be fixed now.
Hi,

I've tested with 3.0.20a, but unfortunately the bug isn't fixed. Here's the part which differs from the backtrace with 3.0.20:
---snip---
#13 0x0004adc4 in winbindd_dual_list_trusted_domains (domain=0x1e2b60, state=0xffbfe7f8) at nsswitch/winbindd_misc.c:133
#14 0x00055e38 in child_process_request (domain=0x1e2b60, state=0xffbfe7f8) at nsswitch/winbindd_dual.c:353
#15 0x00056398 in fork_domain_child (child=0x1e2f4c) at nsswitch/winbindd_dual.c:483
#16 0x00055a60 in schedule_async_request (child=0x1e2f4c) at nsswitch/winbindd_dual.c:197
#17 0x00056024 in winbind_child_died (pid=15651) at nsswitch/winbindd_dual.c:408
#18 0x0003a504 in process_loop () at nsswitch/winbindd.c:860
#19 0x0003ab20 in main (argc=1867776, argv=0x1c8000) at nsswitch/winbindd.c:1032
---snip---

In case you need more debugging output, just ask.

Bye,
Alexander.
Created attachment 1467: Possible fix

Could you try the attached patch?

Thanks,
Volker
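The attachment itself is not reproduced in this thread, so the following is only a sketch of the kind of guard the backtrace suggests, not the actual patch. In the gdb session, alt_names[0] is 0x0 and is passed to a %s conversion; handing a null pointer to %s is undefined behaviour in C, and while some C libraries print a placeholder, the libc in the trace appears to fault instead. The names safe_str and format_domain_entry are hypothetical and are not Samba functions, and plain malloc stands in for talloc.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: never hand a NULL pointer to a %s conversion. */
static const char *safe_str(const char *s)
{
    return s ? s : "";
}

/*
 * Hypothetical stand-in for the string built at winbindd_misc.c:133.
 * Builds "name\alt_name\sid"; the caller frees the result.
 * Returns NULL on formatting or allocation failure.
 */
static char *format_domain_entry(const char *name, const char *alt_name,
                                 const char *sid)
{
    const char *fmt = "%s\\%s\\%s";
    char *buf;

    /* First pass only measures the required length. */
    int len = snprintf(NULL, 0, fmt,
                       safe_str(name), safe_str(alt_name), safe_str(sid));
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;

    /* Second pass writes the string, including the terminating NUL. */
    snprintf(buf, (size_t)len + 1, fmt,
             safe_str(name), safe_str(alt_name), safe_str(sid));
    return buf;
}
```

With the values from the gdb session (name "OPOCE.DOM", a NULL alternate name), the middle field simply comes out empty rather than the formatter dereferencing a null pointer.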
Hi,

it seems to fix the problem. No immediate core dump and no core dump every 5 minutes so far (it has been running for ~1 hour now). Additionally, there's a new trusted domain showing up in the logfile.

Thanks,
Alexander.
Applied patch to SVN. Thanks, Jeremy.