When vfs objects = full_audit is configured in the [global] section rather than per service,
copying files between Windows XP and Samba sporadically stops with the error "The specified network name is no longer available".
10.6.7.2 (10.6.7.2) closed connection to service IPC$
INTERNAL ERROR: Signal 11 in pid 22842 (3.3.4)
Please read the Trouble-Shooting section of the Samba3-HOWTO
#0 0x0000000801ce8e8a in wait4 () from /lib/libc.so.7
#1 0x0000000801caeffe in system () from /lib/libc.so.7
#2 0x000000000069fe98 in smb_panic (why=Variable "why" is not available.) at lib/util.c:1679
#3 0x000000000068a838 in sig_fault (sig=Variable "sig" is not available.) at lib/fault.c:46
#4 <signal handler called>
#5 0x000000080234eb84 in do_log (op=SMB_VFS_OP_DISCONNECT, success=Variable "success" is not available.) at modules/vfs_full_audit.c:712
#6 0x000000080235166b in smb_full_audit_disconnect (handle=0x802029050) at modules/vfs_full_audit.c:915
#7 0x00000000004e6154 in close_cnum (conn=0x80207e050, vuid=101) at smbd/service.c:1326
#8 0x00000000004a46f2 in reply_tdis (req=0x802022130) at smbd/reply.c:4605
#9 0x00000000004e3a30 in switch_message (type=113 'q', req=0x802022130, size=Variable "size" is not available.) at smbd/process.c:1486
#10 0x00000000004e5ea7 in smbd_process () at smbd/process.c:1509
#11 0x00000000008c22f4 in main (argc=-1, argv=0x7fffffffd278) at smbd/server.c:1519
Call chain in vfs_full_audit.c:
smb_full_audit_disconnect() => do_log() => audit_prefix() =>
talloc_sub_advanced(..., conn->server_info->unix_name, ...)
BUT: conn->server_info == NULL in case of IPC$
I think this is the real cause of the SIGSEGV.
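To illustrate the failure mode described above, here is a minimal, self-contained sketch of a prefix builder that tolerates a missing server_info. All names (conn_sketch, server_info_sketch, audit_prefix_sketch) are hypothetical stand-ins, not Samba's real structures or functions, and the "<user>|<service>" format is an assumption made for the example. It shows a defensive NULL check only; it is not the fix that was eventually applied for this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the real Samba structures. */
struct server_info_sketch {
    const char *unix_name;
};

struct conn_sketch {
    const char *service;                     /* e.g. "IPC$" */
    struct server_info_sketch *server_info;  /* may be NULL, as reported for IPC$ */
};

/*
 * Build a log prefix of the (assumed) form "<user>|<service>".
 * When server_info is missing, fall back to an empty user name
 * instead of dereferencing a NULL pointer.
 * Returns the value returned by snprintf().
 */
static int audit_prefix_sketch(const struct conn_sketch *conn,
                               char *buf, size_t buflen)
{
    const char *user = "";

    assert(conn != NULL && buf != NULL && buflen > 0);

    if (conn->server_info != NULL && conn->server_info->unix_name != NULL) {
        user = conn->server_info->unix_name;
    }
    return snprintf(buf, buflen, "%s|%s", user, conn->service);
}
```

Called with service "IPC$" and server_info set to NULL, the sketch writes "|IPC$" into the buffer rather than faulting.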
Ok, I think I understand the problem here. Give me a little while to prepare a fix (and a torture test for this).
Created attachment 4099
Patch for 3.3.4.
Ok, here is the (untested) code I think will fix this bug. I will write a torture test to confirm my understanding of the problem. Please apply and test and let me know.
Created attachment 4100
Better patch for 3.3.4 and above.
Ok, here is the patch I'd like to use in released code. The underlying problem is that once SMBulogoff is called, all server_info contexts associated with the vuid should become invalid, even if that's the context being currently used by the connection struct (tid). When the SMBtdis comes in it doesn't need a valid vuid value, but the code called inside vfs_full_audit always assumes that there is one (and hence a valid conn->server_info pointer) available.
This is actually a bug inside vfs_full_audit and other code inside Samba, which should only indirect conn->server_info on calls which require AS_USER to be set in our process table. I could fix all these issues, but there's no guarantee that someone won't add more code that fails this assumption, as it's a hard assumption to break (it's usually true).
So what I've done is to ensure that on SMBulogoff the previously used conn->server_info struct is kept around to be used for print debugging purposes (it won't be used to change to an invalid user context, as such calls need AS_USER set). This isn't strictly correct, as there's no association with the (now invalid) context being freed and the call that causes conn->server_info to be indirected, but it's good enough for most cases.
The hard part was to ensure that once a valid context is used again (via new sessionsetupX calls, or new calls on a still valid vuid on this tid) that we don't leak memory by simply replacing the stored conn->server_info pointer. We would never actually leak the memory (as all conn->server_info pointers are talloc children of conn), but with the previous patch a malicious client could cause many server_info structs to be talloced by the right combination of SMB calls. This new patch introduces free_conn_server_info_if_unused(), which protects against the above.
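The "free only if unused" idea described above can be sketched as follows. This is an assumption about the general shape of such a check, not the code from the attached patch: the types, the cache size, the invalid-vuid marker and the function name free_server_info_if_unused_sketch are all hypothetical, and plain malloc/free stands in for talloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SIZE_SKETCH   4        /* hypothetical cache size */
#define VUID_INVALID_SKETCH 0xFFFFu  /* hypothetical "slot unused" marker */

struct sinfo_sketch {
    int placeholder;
};

struct cache_entry_sketch {
    unsigned int vuid;
    struct sinfo_sketch *server_info;
};

struct tconn_sketch {
    struct sinfo_sketch *server_info;  /* context currently attached to the tid */
    struct cache_entry_sketch cache[CACHE_SIZE_SKETCH];
};

/*
 * Free conn->server_info only when no still-valid cache entry points
 * at the same struct. Returns 1 if the pointer was released, 0 if it
 * is still in use and was left alone.
 */
static int free_server_info_if_unused_sketch(struct tconn_sketch *conn)
{
    size_t i;

    assert(conn != NULL);

    for (i = 0; i < CACHE_SIZE_SKETCH; i++) {
        const struct cache_entry_sketch *ent = &conn->cache[i];
        if (ent->vuid != VUID_INVALID_SKETCH &&
            ent->server_info == conn->server_info) {
            return 0;  /* a valid vuid still owns this context */
        }
    }

    free(conn->server_info);  /* free(NULL) is a no-op */
    conn->server_info = NULL;
    return 1;
}
```

In this sketch, a caller would run the check just before storing a new server_info pointer in the connection, so a stale context kept only for logging is released at that point instead of piling up across repeated session setups.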
Please test and report back to me. Thanks,
Thank you, Jeremy. I patched uid.c and the problem disappeared.
vfs_full_audit now logs "IPC_|disconnect|ok|IPC$" string instead
of crashing smbd.
Ok, just to confirm - you used the attachment titled "Better patch for 3.3.4 and above" - correct? Thanks very much for testing.
Yes, I used exactly "Better patch for 3.3.4 and above" with the same configuration. There are no more file transfer breaks, and messages like "kernel: pid 22427 (smbd), uid 11290: exited on signal 11" no longer appear.
Please go ahead and put this into 3.3.5.
Patch is upstream. Will be included in 3.3.5.
Closing out bug report.
Thanks for reporting!