I've been having problems with Samba segfaulting recently (apparently during printer-related operations, although it seems to also show up during "print-to-PDF" from windows), and so I've been running it under valgrind to attempt to isolate the problem. The configuration was working for a long while, but after a software update to several packages (NSS, samba, and openldap included) I started having this segfault issue. Unfortunately the Samba "panic-action" script is usually unable to intercept the process before it disappears. I usually get: > The Samba 'panic action' script, /usr/share/samba/panic-action, > was called for PID 7460 (). > > This means there was a problem with the program, such as a segfault. > However, the executable could not be found for process 7460. > It may have died unexpectedly, or you may not have permission to debug > the process. For reference, the system is a Debian testing/unstable box running samba "2:3.2.3-1" and slapd "2.4.10-3". /etc/nsswitch.conf has "ldap" configured for passwd, group, networks, and netgroup, backed by the libnss-ldapd package (version 0.6.5). The guts of the configuration is: > security = user > encrypt passwords = true > obey pam restrictions = yes > passdb backend = ldapsam:ldapi://%2fvar%2frun%2fslapd%2fldapi > enable privileges = Yes > ldap suffix = dc=moffett,dc=home > ldap machine suffix = ou=host > ldap user suffix = ou=user > ldap group suffix = ou=group > ldap admin dn = dc=samba0,ou=samba,dc=moffett,dc=home > ldapsam:trusted = yes > ldapsam:editposix = yes > ldap delete dn = yes > idmap domains = MOFFETT.HOME > idmap config MOFFETT.HOME:backend = ldap > idmap config MOFFETT.HOME:readonly = no > idmap config MOFFETT.HOME:default = yes > idmap config MOFFETT.HOME:ldap_base_dn = ou=idmap,dc=moffett,dc=home > idmap config MOFFETT.HOME:ldap_user_dn = dc=samba0,ou=samba,dc=moffett,dc=home > idmap config MOFFETT.HOME:ldap_url = ldapi://%2fvar%2frun%2fslapd%2fldapi > idmap config MOFFETT.HOME:range = 10000-99999 > idmap alloc backend = ldap > idmap alloc config:ldap_base_dn = ou=idmap,dc=moffett,dc=home > idmap alloc config:ldap_user_dn = dc=samba0,ou=samba,dc=moffett,dc=home > idmap alloc config:ldap_url = ldapi://%2fvar%2frun%2fslapd%2fldapi > idmap alloc config:range = 10000-99999 > map acl inherit = Yes > domain logins = yes > logon path = \\%N\%U\Library\Windows\Profile > logon drive = H: > logon home = \\%N\%U > logon script = scripts\logon.bat > load printers = yes > printing = cups > printcap name = cups Here's the first couple errors I got (before I installed debugging symbols): > 0x00002ac80370c465 in waitpid () from /lib/libc.so.6 > #0 0x00002ac80370c465 in waitpid () from /lib/libc.so.6 > #1 0x00002ac8036ad401 in ?? () from /lib/libc.so.6 > #2 0x000000000060d11b in smb_panic () > #3 0x0000000000611f11 in ?? () > #4 0x00000000004b858e in ?? () > #5 0x00000000004b890b in set_sec_ctx () > #6 0x00000000004ae60f in change_to_user () > #7 0x00000000004c62e4 in ?? () > #8 0x00000000004c7872 in smbd_process () > #9 0x00000000006bce1d in main () and: > 00007f4d61b43a55 in waitpid () from /lib/libc.so.6 > #0 0x00007f4d61b43a55 in waitpid () from /lib/libc.so.6 > #1 0x00007f4d61ae3e31 in do_system () from /lib/libc.so.6 > #2 0x000000000060df2b in smb_panic () > #3 0x00000000005fbd62 in ?? () > #4 <signal handler called> > #5 0x00007f4d5b5a0204 in ?? () from /lib/libnss_ldap.so.2 > #6 0x0000000000000000 in ?? () The only other usable error I got was: > 0x00007f9f09e2ca55 in waitpid () from /lib/libc.so.6 > #0 0x00007f9f09e2ca55 in waitpid () from /lib/libc.so.6 > #1 0x00007f9f09dcce31 in do_system () from /lib/libc.so.6 > #2 0x00000000005ae5c8 in smb_panic () > #3 0x000000000059aee4 in sig_fault () > #4 <signal handler called> > #5 0x00007f9f09e7a524 in free_memory () from /lib/libc.so.6 > #6 0x00007f9f09e7a844 in innetgr () from /lib/libc.so.6 > #7 0x0000000000483553 in user_in_netgroup () > #8 0x0000000000774ad5 in token_contains_name_in_list () > #9 0x0000000000488722 in change_to_user () > #10 0x00000000004a6be6 in make_connection_snum () > #11 0x00000000004a7896 in make_connection () > #12 0x0000000000791fe0 in reply_tcon_and_X () > #13 0x00000000004a264e in switch_message () > #14 0x00000000004a40f8 in smbd_process () > #15 0x0000000000471843 in main () This is the first valgrind error I have run into: > ==00:00:07:36.006 7274== Syscall param write(buf) points to uninitialised byte(s) > ==00:00:07:36.006 7274== at 0x6281730: __write_nocancel (in /usr/lib/debug/libpthread-2.7.so) > ==00:00:07:36.006 7274== by 0x59E6D2: sys_write (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x5B1DD2: write_data (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x4A3C3D: srv_send_smb (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x784689: send_nt_replies (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x784A68: call_nt_transact_ioctl (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x787A5B: handle_nttrans (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x788F86: reply_nttrans (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x4A264D: switch_message (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== by 0x471842: main (in /usr/sbin/smbd) > ==00:00:07:36.006 7274== Address 0x8d765d3 is 747 bytes inside a block of size 8,288 alloc'd > ==00:00:07:36.006 7274== at 0x4C20FEB: malloc (vg_replace_malloc.c:207) > ==00:00:07:36.006 7274== by 0x779CE23: talloc_pool (in /usr/lib/libtalloc.so.1.2.0) > ==00:00:07:36.006 7274== by 0x58E837: talloc_stackframe_internal (in /usr/sbin/smbd) > ==00:00:07:36.007 7274== by 0x58E89B: talloc_stackframe_pool (in /usr/sbin/smbd) > ==00:00:07:36.007 7274== by 0x4A3DAF: smbd_process (in /usr/sbin/smbd) > ==00:00:07:36.007 7274== by 0x471842: main (in /usr/sbin/smbd)
The valgrind error is a bug, but it does not lead to segfaults. This VERY much looks like a fedora bug that we have been hit by for a while, now seemingly in debian. If you have "valid users = @group", please exchange this by "valid users = +group" (assuming you're not using netgroups). Do the segfaults still happen with this +group setting? Volker
Unfortunately the only way I've been able to make it crash has stopped working under valgrind (no other errors), so I can only assume it's either a race condition or something related to the quotactl() syscall (which is unimplemented in valgrind). Even when I can make it crash, the crash trigger isn't 100% reliable, it sometimes takes a while after the trigger before smbd actually segfaults. It's entirely possible that it's some kind of race condition. I didn't have any "valid users" lines, just "invalid users = root". I did, however, have a bunch of "admin users = @domadmins". I changed every occurrence of "@GROUPNAME" to "+GROUPNAME" and restarted, now I'll have to wait and see if I can trigger it again. Cheers, Kyle Moffett
After a couple more weeks of testing, I can report that changing: admin users = @domadmins to: admin users = +domadmins has not changed the crashing behavior at all. We are now going to attempt to live with the significant slowdown for a week while I run it under valgrind to attempt to isolate the problem. I've identified a few issues with valgrind not handling the capget, capset, and quotactl syscalls under x86_64. The valgrind bugreport is here: http://bugs.kde.org/show_bug.cgi?id=173177 I have noticed a few other errors, I'm not sure if they're false positives triggered by the lack of support for those 3 syscalls or actual bugs. I've trimmed the bug list to eliminate the very-similar backtraces, although virtually all of them seem to come from the tdb code. It also looks like perhaps cap_free() is called on an uninitialized cap_t in certain cases if capget fails (buggy cap_get_proc?): ==00:06:16:14.333 28206== 1 errors in context 4 of 100: ==00:06:16:14.333 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.333 28206== at 0x4031137: cap_free (in /lib/libcap.so.1.10) ==00:06:16:14.333 28206== by 0x403124E: cap_get_proc (in /lib/libcap.so.1.10) ==00:06:16:14.333 28206== by 0x59DEB7: set_process_capability (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x59E06D: set_effective_capability (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x5DCFA3: linux_setlease (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x4AFC5C: vfswrap_linux_setlease (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x5DCD50: linux_set_kernel_oplock (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x5DA586: set_file_oplock (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x4907CA: open_file_ntcreate (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x491529: create_file_unixpath (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x491D7F: create_file (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x783622: reply_ntcreate_and_X (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.333 28206== by 0x471842: main (in /usr/sbin/smbd) A sampling of the other valgrind errors: ==00:06:16:14.332 28206== 1 errors in context 1 of 100: ==00:06:16:14.332 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.332 28206== at 0x836F02: tdb_allocate (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x834183: tdb_store (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x593550: db_tdb_store (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x47FDD6: claim_connection (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x4A69F0: make_connection_snum (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x4A7895: make_connection (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x791FDF: reply_tcon_and_X (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.332 28206== 1 errors in context 2 of 100: ==00:06:16:14.332 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.332 28206== at 0x837C14: tdb_oob (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x836A52: tdb_rec_free_read (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x836F24: tdb_allocate (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x834183: tdb_store (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x593550: db_tdb_store (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x47F815: store_pipe_opendb (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x7829D6: nt_open_pipe (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x7833DF: reply_ntcreate_and_X (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.332 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.335 28206== 9 errors in context 9 of 100: ==00:06:16:14.335 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.335 28206== at 0x833DC3: tdb_delete_hash (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x834345: tdb_delete (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x5934DC: db_tdb_delete (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x47FB6F: yield_connection (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x4A441F: close_cnum (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x78D4E4: reply_tdis (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.335 28206== 9 errors in context 10 of 100: ==00:06:16:14.335 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.335 28206== at 0x833976: tdb_parse_record (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x593992: db_tdb_fetch_locked (in /usr/sbin/smbd) ==00:06:16:14.335 28206== by 0x5BBB5D: connections_fetch_record (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x5BBBE7: connections_fetch_entry (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x47FB08: yield_connection (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x4A441F: close_cnum (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x78D4E4: reply_tdis (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.336 28206== 9 errors in context 11 of 100: ==00:06:16:14.336 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.336 28206== at 0x83419F: tdb_store (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x593550: db_tdb_store (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x47FDD6: claim_connection (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x4A69F0: make_connection_snum (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x4A7895: make_connection (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x791FDF: reply_tcon_and_X (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.336 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.337 28206== 9 errors in context 13 of 100: ==00:06:16:14.337 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.337 28206== at 0x836EA8: tdb_allocate (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x834183: tdb_store (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x593550: db_tdb_store (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x47F815: store_pipe_opendb (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x7829D6: nt_open_pipe (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x7833DF: reply_ntcreate_and_X (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.337 28206== 10 errors in context 15 of 100: ==00:06:16:14.337 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.337 28206== at 0x837C14: tdb_oob (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x836A52: tdb_rec_free_read (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x836E84: tdb_allocate (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x834183: tdb_store (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x593550: db_tdb_store (in /usr/sbin/smbd) ==00:06:16:14.337 28206== by 0x47FDD6: claim_connection (in /usr/sbin/smbd) ==00:06:16:14.338 28206== by 0x4A69F0: make_connection_snum (in /usr/sbin/smbd) ==00:06:16:14.338 28206== by 0x4A7895: make_connection (in /usr/sbin/smbd) ==00:06:16:14.338 28206== by 0x791FDF: reply_tcon_and_X (in /usr/sbin/smbd) ==00:06:16:14.338 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.338 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.338 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.364 28206== 45 errors in context 76 of 100: ==00:06:16:14.364 28206== Conditional jump or move depends on uninitialised value(s) ==00:06:16:14.364 28206== at 0x4C23226: memcpy (mc_replace_strmem.c:402) ==00:06:16:14.364 28206== by 0x593801: db_tdb_fetchlock_parse (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x837AB9: tdb_parse_data (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x8339AB: tdb_parse_record (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x593992: db_tdb_fetch_locked (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x5BBB5D: connections_fetch_record (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x47F67E: delete_pipe_opendb (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x53DA05: close_rpc_pipe_hnd (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x82583E: reply_pipe_close (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x78D884: reply_close (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.365 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.366 28206== 46 errors in context 80 of 100: ==00:06:16:14.366 28206== Syscall param write(buf) points to uninitialised byte(s) ==00:06:16:14.366 28206== at 0x6281730: __write_nocancel (in /usr/lib/debug/libpthread-2.7.so) ==00:06:16:14.366 28206== by 0x59E6D2: sys_write (in /usr/sbin/smbd) ==00:06:16:14.366 28206== by 0x5B1DD2: write_data (in /usr/sbin/smbd) ==00:06:16:14.366 28206== by 0x4A3C3D: srv_send_smb (in /usr/sbin/smbd) ==00:06:16:14.366 28206== by 0x784689: send_nt_replies (in /usr/sbin/smbd) ==00:06:16:14.366 28206== by 0x784A68: call_nt_transact_ioctl (in /usr/sbin/smbd) ==00:06:16:14.366 28206== by 0x787A5B: handle_nttrans (in /usr/sbin/smbd) ==00:06:16:14.366 28206== by 0x788F86: reply_nttrans (in /usr/sbin/smbd) ==00:06:16:14.367 28206== by 0x4A264D: switch_message (in /usr/sbin/smbd) ==00:06:16:14.367 28206== by 0x4A40F7: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.367 28206== by 0x471842: main (in /usr/sbin/smbd) ==00:06:16:14.367 28206== Address 0xeac44f3 is 747 bytes inside a block of size 8,288 alloc'd ==00:06:16:14.367 28206== at 0x4C20FEB: malloc (vg_replace_malloc.c:207) ==00:06:16:14.367 28206== by 0x779CE23: talloc_pool (in /usr/lib/libtalloc.so.1.2.0) ==00:06:16:14.367 28206== by 0x58E837: talloc_stackframe_internal (in /usr/sbin/smbd) ==00:06:16:14.367 28206== by 0x58E89B: talloc_stackframe_pool (in /usr/sbin/smbd) ==00:06:16:14.367 28206== by 0x4A3DAF: smbd_process (in /usr/sbin/smbd) ==00:06:16:14.367 28206== by 0x471842: main (in /usr/sbin/smbd) Then the summary: ==00:06:16:14.377 28206== IN SUMMARY: 4146 errors from 100 contexts (suppressed: 32 from 1) ==00:06:16:14.377 28206== ==00:06:16:14.378 28206== malloc/free: in use at exit: 1,442,182 bytes in 11,068 blocks. ==00:06:16:14.378 28206== malloc/free: 5,591,030 allocs, 5,579,962 frees, 11,783,545,858 bytes allocated. ==00:06:16:14.378 28206== ==00:06:16:14.392 28206== searching for pointers to 11,068 not-freed blocks. ==00:06:16:14.448 28206== checked 1,877,984 bytes. ==00:06:16:14.449 28206== ==00:06:16:14.449 28206== LEAK SUMMARY: ==00:06:16:14.449 28206== definitely lost: 86,918 bytes in 1,111 blocks. ==00:06:16:14.449 28206== possibly lost: 301 bytes in 3 blocks. ==00:06:16:14.449 28206== still reachable: 1,354,963 bytes in 9,954 blocks. ==00:06:16:14.449 28206== suppressed: 0 bytes in 0 blocks. ==00:06:16:14.449 28206== Rerun with --leak-check=full to see details of leaked memory. Perhaps at some point I'll rerun with leak-checking to isolate those leaks, but at the moment I'm more interested in solving the random infrequent crashing. Cheers, Kyle Moffett
Finally figured out how to make it backtrace without valgrind. Here's one crash trace: [2008/11/13 22:41:20, 2, pid=23726] smbd/close.c:close_normal_file(586) brad closed file Library/Windows/Application Data/Google/Google Desktop/83b0a4501a89/uinfo.dat (numopen=78) NT_STATUS_OK Then the following repeats several times: ### BEGIN ### [2008/11/13 22:42:16, 2, pid=23726] lib/smbldap.c:smbldap_open_connection(796) smbldap_open_connection: connection opened [2008/11/13 22:42:16, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:16, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 22:42:17, 2, pid=23726] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 22:42:17, 0, pid=23726] libsmb/clientgen.c:cli_receive_smb(165) Receiving SMB: Server stopped responding [2008/11/13 22:42:17, 1, pid=23726] libsmb/cliconnect.c:cli_start_connection(1675) failed negprot [2008/11/13 22:42:17, 2, pid=23726] rpc_server/srv_spoolss_nt.c:spoolss_connect_to_client(2601) spoolss_connect_to_client: connection to [PERSEPHONE] failed! [2008/11/13 22:42:17, 0, pid=23726] libsmb/clientgen.c:cli_receive_smb(165) Receiving SMB: Server stopped responding [2008/11/13 22:42:17, 1, pid=23726] libsmb/cliconnect.c:cli_start_connection(1675) failed negprot [2008/11/13 22:42:17, 2, pid=23726] rpc_server/srv_spoolss_nt.c:spoolss_connect_to_client(2601) spoolss_connect_to_client: connection to [PERSEPHONE] failed! [2008/11/13 22:42:17, 0, pid=23726] libsmb/clientgen.c:cli_receive_smb(165) Receiving SMB: Server stopped responding [2008/11/13 22:42:17, 1, pid=23726] libsmb/cliconnect.c:cli_start_connection(1675) failed negprot [2008/11/13 22:42:17, 2, pid=23726] rpc_server/srv_spoolss_nt.c:spoolss_connect_to_client(2601) spoolss_connect_to_client: connection to [PERSEPHONE] failed! ### END ### Then I get this: [2008/11/13 22:42:26, 0, pid=23726] lib/fault.c:fault_report(40) =============================================================== [2008/11/13 22:42:26, 0, pid=23726] lib/fault.c:fault_report(41) INTERNAL ERROR: Signal 11 in pid 23726 (3.2.3) Please read the Trouble-Shooting section of the Samba3-HOWTO [2008/11/13 22:42:26, 0, pid=23726] lib/fault.c:fault_report(43) From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf [2008/11/13 22:42:26, 0, pid=23726] lib/fault.c:fault_report(44) =============================================================== [2008/11/13 22:42:26, 0, pid=23726] lib/util.c:smb_panic(1663) PANIC (pid 23726): internal error [2008/11/13 22:42:26, 0, pid=23726] lib/util.c:log_stack_trace(1767) BACKTRACE: 24 stack frames: #0 /usr/sbin/smbd(log_stack_trace+0x1c) [0x5ae10d] #1 /usr/sbin/smbd(smb_panic+0x5b) [0x5ae21b] #2 /usr/sbin/smbd [0x59ada4] #3 /lib/libpthread.so.0 [0x7fa0604a0a80] #4 /usr/sbin/smbd(read_smb_length_return_keepalive+0x56) [0x5b21dc] #5 /usr/sbin/smbd(receive_smb_raw+0x42) [0x5b2353] #6 /usr/sbin/smbd(cli_receive_smb+0x51) [0x4b205d] #7 /usr/sbin/smbd(cli_negprot+0x168) [0x4b3a1e] #8 /usr/sbin/smbd(cli_start_connection+0x3c1) [0x4b43b9] #9 /usr/sbin/smbd(cli_full_connection+0x90) [0x4b75fe] #10 /usr/sbin/smbd(_spoolss_rffpcnex+0x5ae) [0x5276e4] #11 /usr/sbin/smbd [0x51873d] #12 /usr/sbin/smbd(api_rpcTNP+0x26d) [0x544f35] #13 /usr/sbin/smbd(api_pipe_request+0x1f5) [0x54545d] #14 /usr/sbin/smbd [0x53f314] #15 /usr/sbin/smbd(write_to_pipe+0x115) [0x53da8e] #16 /usr/sbin/smbd [0x77557f] #17 /usr/sbin/smbd [0x775c2d] #18 /usr/sbin/smbd(reply_trans+0x724) [0x7768bf] #19 /usr/sbin/smbd [0x4a250e] #20 /usr/sbin/smbd(smbd_process+0x419) [0x4a3fb8] #21 /usr/sbin/smbd(main+0x1fee) [0x4716f3] #22 /lib/libc.so.6(__libc_start_main+0xe6) [0x7fa05ea431a6] #23 /usr/sbin/smbd [0x46ecd9]
Actually, I found this in my logs from earlier too. This only has one repetition before it crashes: [2008/11/13 15:43:35, 2, pid=31771] smbd/open.c:open_file(407) brad opened file Library/Windows/Application Data/Google/Google Desktop/83b0a4501a89/uinfo.dat read=No write=No (numopen=75) [2008/11/13 15:43:35, 2, pid=31771] smbd/close.c:close_normal_file(586) brad closed file Library/Windows/Application Data/Google/Google Desktop/83b0a4501a89/uinfo.dat (numopen=74) NT_STATUS_OK [2008/11/13 15:43:35, 2, pid=31771] smbd/close.c:close_normal_file(586) brad closed file Library/Windows/Application Data/Google/Google Desktop/83b0a4501a89/uinfo.dat (numopen=73) NT_STATUS_OK [2008/11/13 15:43:39, 2, pid=31771] lib/smbldap.c:smbldap_open_connection(796) smbldap_open_connection: connection opened [2008/11/13 15:43:39, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:39, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_sam_from_ldap(571) init_sam_from_ldap: Entry found for user: root [2008/11/13 15:43:40, 2, pid=31771] passdb/pdb_ldap.c:init_group_from_ldap(2344) init_group_from_ldap: Entry found for group: 0 [2008/11/13 15:43:40, 0, pid=31771] lib/fault.c:fault_report(40) =============================================================== [2008/11/13 15:43:40, 0, pid=31771] lib/fault.c:fault_report(41) INTERNAL ERROR: Signal 11 in pid 31771 (3.2.3) Please read the Trouble-Shooting section of the Samba3-HOWTO [2008/11/13 15:43:40, 0, pid=31771] lib/fault.c:fault_report(43) From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf [2008/11/13 15:43:40, 0, pid=31771] lib/fault.c:fault_report(44) =============================================================== [2008/11/13 15:43:40, 0, pid=31771] lib/util.c:smb_panic(1663) PANIC (pid 31771): internal error [2008/11/13 15:43:40, 0, pid=31771] lib/util.c:log_stack_trace(1767) BACKTRACE: 24 stack frames: #0 /usr/sbin/smbd(log_stack_trace+0x1c) [0x5ae10d] #1 /usr/sbin/smbd(smb_panic+0x5b) [0x5ae21b] #2 /usr/sbin/smbd [0x59ada4] #3 /lib/libpthread.so.0 [0x7fa0604a0a80] #4 /usr/sbin/smbd(read_smb_length_return_keepalive+0x56) [0x5b21dc] #5 /usr/sbin/smbd(receive_smb_raw+0x42) [0x5b2353] #6 /usr/sbin/smbd(cli_receive_smb+0x51) [0x4b205d] #7 /usr/sbin/smbd(cli_negprot+0x168) [0x4b3a1e] #8 /usr/sbin/smbd(cli_start_connection+0x3c1) [0x4b43b9] #9 /usr/sbin/smbd(cli_full_connection+0x90) [0x4b75fe] #10 /usr/sbin/smbd(_spoolss_rffpcnex+0x5ae) [0x5276e4] #11 /usr/sbin/smbd [0x51873d] #12 /usr/sbin/smbd(api_rpcTNP+0x26d) [0x544f35] #13 /usr/sbin/smbd(api_pipe_request+0x1f5) [0x54545d] #14 /usr/sbin/smbd [0x53f314] #15 /usr/sbin/smbd(write_to_pipe+0x115) [0x53da8e] #16 /usr/sbin/smbd [0x77557f] #17 /usr/sbin/smbd [0x775c2d] #18 /usr/sbin/smbd(reply_trans+0x724) [0x7768bf] #19 /usr/sbin/smbd [0x4a250e] #20 /usr/sbin/smbd(smbd_process+0x419) [0x4a3fb8] #21 /usr/sbin/smbd(main+0x1fee) [0x4716f3] #22 /lib/libc.so.6(__libc_start_main+0xe6) [0x7fa05ea431a6] #23 /usr/sbin/smbd [0x46ecd9] [2008/11/13 15:43:40, 0, pid=31771] lib/util.c:smb_panic(1668) smb_panic(): calling panic action [/usr/share/samba/panic-action 31771] [2008/11/13 15:43:40, 0, pid=31771] lib/util.c:smb_panic(1676) smb_panic(): action returned status 0 [2008/11/13 15:43:40, 0, pid=31771] lib/fault.c:dump_core(201) dumping core in /var/log/samba/cores/smbd Only I never seem to actually get cores in that directory... :-\ Cheers, Kyle Moffett
Created attachment 3929 [details] Additional logfile with backtraces I'm still experiencing this issue, unfortunately, and no closer to figuring out why it occurs. This is the tail of a large logfile ending in several smbd crashes. This bug takes a while to trigger, around a day or so of use, and I'm unfortunately unable to reproduce it (or even the invalid memory access) under valgrind. There are a couple oddities in this log. The first oddity is the big string of inotify errors, the second is the attempted accesses to the "\\ares\bra" service (the user share is "brad", and I've search the registry and found no references to "bra" without the "d"). A bit if wireshark-foo indicates that the client is actually sending a request for share "bra", but I'm not sure if that's due to an improper response to a home-share query earlier. If the user reboots the computer after the samba segfault, he is able to log in and have full functionality again under heavy load for roughly a day, including printing. At some point the error recurs and the system crashes. Print jobs are generated as PostScript on the local computer, forwarded to CUPS on the server for job accounting, and then forwarded on from there to the networked PostScript printer. Please let me know if I can provide any additional information. Cheers, Kyle Moffett
Created attachment 3930 [details] Samba smb.conf file This is the config file I'm using with which I am experiencing the crashes while printing.
To be honest, those crashes are so spread out that If I was you I would test your hardware with something like memcheck or so. Do you have any other programs on that box showing weird behaviour? And, parts of the logs are completely weird: #5 /usr/sbin/smbd(bitmap_copy+0xa8) [0x5a21d7] #6 /usr/sbin/smbd(receive_smb_raw+0x42) [0x5b2353] I don't see a path how receive_smb_raw would call bitmap_copy. This makes it really difficult to diagnose what's going on here. I'm closing this as WORKSFORME, I don't see a way to reproduce any of these crashes here. You should really have someone look at your box physically. Volker