I believe I have found a bug in smbd. I've never experienced this before, nor has anyone else, but looking at the source code (see below), I'm really surprised this hasn't arisen before.

Since I'm using this on Gentoo, I've filed a similar report there:
http://bugs.gentoo.org/show_bug.cgi?id=283919

I've also been discussing this on the mailing list:
http://lists.samba.org/archive/samba/2009-September/150351.html

smbd crashes with signal 11 the moment a client attempts to connect. Here's the relevant bit of the log:

[2009/09/06 22:24:44, 0] smbd/server.c:main(1274)
  smbd version 3.3.7 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2009
[2009/09/06 22:24:44, 0] printing/print_cups.c:cups_connect(103)
  Unable to connect to CUPS server /var/run/cups/cups.sock:631 - No such file or directory
[2009/09/06 22:24:44, 0] printing/print_cups.c:cups_connect(103)
  Unable to connect to CUPS server /var/run/cups/cups.sock:631 - No such file or directory
[2009/09/06 22:26:09, 0] smbd/server.c:main(1274)
  smbd version 3.3.7 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2009
[2009/09/06 22:26:09, 0] printing/print_cups.c:cups_connect(103)
  Unable to connect to CUPS server /var/run/cups/cups.sock:631 - No such file or directory
[2009/09/06 22:26:09, 0] printing/print_cups.c:cups_connect(103)
  Unable to connect to CUPS server /var/run/cups/cups.sock:631 - No such file or directory
[2009/09/06 22:26:43, 0] lib/fault.c:fault_report(40)
  ===============================================================
[2009/09/06 22:26:43, 0] lib/fault.c:fault_report(41)
  INTERNAL ERROR: Signal 11 in pid 16066 (3.3.7)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2009/09/06 22:26:43, 0] lib/fault.c:fault_report(43)
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2009/09/06 22:26:43, 0] lib/fault.c:fault_report(44)
  ===============================================================
[2009/09/06 22:26:43, 0] lib/util.c:smb_panic(1673)
  PANIC (pid 16066): internal error
[2009/09/06 22:26:43, 0] lib/util.c:log_stack_trace(1777)
  BACKTRACE: 8 stack frames:
   #0 /usr/sbin/smbd(log_stack_trace+0x1c) [0x7f4fdfff6b10]
   #1 /usr/sbin/smbd(smb_panic+0x5b) [0x7f4fdfff6c1d]
   #2 /usr/sbin/smbd [0x7f4fdffe3e71]
   #3 /lib/libpthread.so.0 [0x7f4fde09bef0]
   #4 /usr/sbin/smbd(dns_register_smbd_reply+0x1c) [0x7f4fdfe59e3b]
   #5 /usr/sbin/smbd(main+0x16e8) [0x7f4fe01f05cc]
   #6 /lib/libc.so.6(__libc_start_main+0xe6) [0x7f4fdca49a26]
   #7 /usr/sbin/smbd [0x7f4fdfde1339]
[2009/09/06 22:26:43, 0] lib/fault.c:dump_core(231)
  dumping core in /var/log/samba/cores/smbd

Note that I have since fixed the CUPS issue, but that didn't help.

Using gdb, this is the stack trace:

#0  0x00007f4fdca5d645 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) where
#0  0x00007f4fdca5d645 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f4fdca5eb63 in abort () at abort.c:88
#2  0x00007f4fdffe38db in dump_core () at lib/fault.c:242
#3  0x00007f4fdfff6d3b in smb_panic (why=<value optimized out>) at lib/util.c:1689
#4  0x00007f4fdffe3e71 in sig_fault (sig=11) at lib/fault.c:46
#5  <signal handler called>
#6  dns_register_smbd_reply (dns_state=0x0, lfds=0x7ffff4963ed0, timeout=0x7ffff4964060) at smbd/dnsregister.c:171
#7  0x00007f4fe01f05cc in main (argc=<value optimized out>, argv=<value optimized out>) at smbd/server.c:689

Frame #6 shows dns_state=0x0, i.e. the function was called with a NULL pointer.

To make a long story short, in server.c, there's this code:

static bool open_sockets_smbd(bool is_daemon, bool interactive, const char *smb_ports)
{
        ...
        struct dns_reg_state * dns_reg = NULL;
        ...
        nothing that modifies dns_reg
        ...
        /* process pending nDNS responses */
        if (dns_register_smbd_reply(dns_reg, &r_fds, &idle_timeout)) {
                --num;
        }
        ...
}

Then the function dns_register_smbd_reply (dnsregister.c) blindly dereferences the first argument:

bool dns_register_smbd_reply(struct dns_reg_state *dns_state,
                fd_set *lfds, struct timeval *timeout)
{
        int mdnsd_conn_fd = -1;

        if (dns_state->srv_ref == NULL) {
                return false;
        }
        ...
}

I was hoping someone could help me figure out why this is happening now and didn't before, so I can work around it.

Thanks.
This "patch" solves the problem:

 bool dns_register_smbd_reply(struct dns_reg_state *dns_state,
                 fd_set *lfds, struct timeval *timeout)
 {
         int mdnsd_conn_fd = -1;

+        if (!dns_state) return false;
         if (dns_state->srv_ref == NULL) {
                 return false;
         }
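For readers who want to try the guard in isolation, here is a minimal standalone sketch. It is not Samba code: `struct dns_reg_state` is reduced to the single `srv_ref` field mentioned in the report, and `reply_pending` and `check_reply_pending` are hypothetical stand-ins used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for Samba's struct dns_reg_state:
 * only the srv_ref field named in the report is modelled. */
struct dns_reg_state {
        void *srv_ref;
};

/* Simplified stand-in for dns_register_smbd_reply(): reports whether
 * there is registration state holding a service reference. */
bool reply_pending(const struct dns_reg_state *dns_state)
{
        if (dns_state == NULL) {
                /* The added guard: without it, the next test
                 * dereferences NULL and the process gets SIGSEGV. */
                return false;
        }
        if (dns_state->srv_ref == NULL) {
                return false;
        }
        return true;
}

/* Exercises the three cases: no state, empty state, populated state. */
void check_reply_pending(void)
{
        int token = 0; /* placeholder object standing in for a service ref */
        struct dns_reg_state empty = { NULL };
        struct dns_reg_state live = { &token };

        assert(!reply_pending(NULL));
        assert(!reply_pending(&empty));
        assert(reply_pending(&live));
}
```

Calling `check_reply_pending()` from a `main` should return silently; removing the first `if` makes the `NULL` case crash, which mirrors the backtrace above.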
Created attachment 4657
Patch in git-format-patch form

Thanks a lot for that patch! I never got around to diagnosing this; I always recommend using the native avahi support.

Volker
Comment on attachment 4657
Patch in git-format-patch form

The patch is obviously correct. => ACK
Karolin, please pick this for the next 3.3 bugfix release.

Cheers - Michael
BTW, should we try to figure out why samba, for me, suddenly started taking this code path when apparently it never had before? This could uncover another bug.
Pushed to v3-3-test and v3-4-test (will be included in 3.4.1).
(In reply to comment #5)
> BTW, should we try to figure out why samba, for me, suddenly started taking
> this code path when apparently it never had before? This could uncover another
> bug.

Jeremy, should anyone investigate? If not, the bug report can be closed as the patch has been pushed.

Thanks!
The patch is correct but essentially disables the mDNS code. There is a missing chunk of code from 3.2.x that should be in the 3.3.x main loop. I'll add this code back in. Jeremy.
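To illustrate the point that the guard alone leaves the mDNS path inactive, here is a toy sketch. It is not the 3.2.x code referred to above and makes no claim about what that code looks like: `register_once`, `reply_pending` and `count_live_passes` are hypothetical names, and the "main loop" is just a counter. It only shows that if nothing ever assigns the state pointer, the guarded check returns false on every pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for Samba's struct dns_reg_state. */
struct dns_reg_state {
        void *srv_ref;
};

static int fake_service; /* placeholder for a real service reference */

/* Hypothetical registration step: allocates the state on first use.
 * This is an assumption made for illustration, not Samba's code. */
bool register_once(struct dns_reg_state **state)
{
        if (*state != NULL) {
                return true;
        }
        *state = calloc(1, sizeof(**state));
        if (*state == NULL) {
                return false;
        }
        (*state)->srv_ref = &fake_service;
        return true;
}

/* Guarded check, as in the first patch. */
bool reply_pending(const struct dns_reg_state *state)
{
        return state != NULL && state->srv_ref != NULL;
}

/* Runs a toy loop and counts the passes on which the reply path
 * would be reachable at all. */
int count_live_passes(int iterations, bool do_register)
{
        struct dns_reg_state *state = NULL;
        int live = 0;

        assert(iterations >= 0);
        for (int i = 0; i < iterations; i++) {
                if (do_register) {
                        register_once(&state);
                }
                if (reply_pending(state)) {
                        live++;
                }
        }
        free(state); /* free(NULL) is a no-op */
        return live;
}
```

With `do_register` set to false the pointer stays NULL, so no pass reaches the reply path; with it set to true every pass does. That is the sense in which a NULL check by itself avoids the crash but leaves the feature switched off.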
Created attachment 4665
Second part of fix for 3.3.x.

Fix in git-format-patch format.

Jeremy.
(In reply to comment #8)
> The patch is correct but essentially disables the mDNS code. There is a missing
> chunk of code from 3.2.x that should be in the 3.3.x main loop.
> I'll add this code back in.
> Jeremy.

Does this partially answer why I'm the only one to experience this problem?
*** Bug 6734 has been marked as a duplicate of this bug. ***
Is there a chance to get the second patch into 3.3.8, too?
Well, I don't really know. I don't have the environment at hand to actually test this. Someone from Apple (James?) needs to ack it, everybody else should go with the native avahi interface. Volker
Still present in Samba 3.3.8.
What shall we do with this one? Lower the severity?
Well, I'd say just put it in. It can't get worse than it is now. If Apple does not react, there's not much we can do. Volker
Comment on attachment 4665
Second part of fix for 3.3.x.

Just put it in.

Volker
Pushed both patches to v3-3-test. Will be included in Samba 3.3.10.

Closing out bug report. Please reopen if it's still an issue.

Thanks!