Server is a FreeBSD 11.3 jailed AD member, with the following extract of smb.conf: [global] workgroup=XXXXXXXX realm=XXXXXXXA.local interfaces=em0 hosts allow=192.168.xxx. security=ADS map archive=No kerberos method = secrets and keytab idmap config *:backend = tdb idmap config *:range = 100000-999999 idmap config XXXXXXXX:backend=rid idmap config XXXXXXXX:range = 10000-99999 template homedir = /home/%U winbind use default domain = yes winbind enum users = yes winbind enum groups = yes winbind refresh tickets = Yes winbind expand groups = 4 winbind normalize names = Yes domain master = no local master = no map acl inherit = Yes store dos attributes = Yes unix extensions=no vfs objects=audit audit:facility=LOCAL7 audit:priority=INFO rpc_daemon:mdssd = fork rpc_server:mdssvc = external In smbd.log I see: [2019/12/27 15:27:00.063074, 0] ../../lib/util/fault.c:79(fault_report) =============================================================== [2019/12/27 15:27:00.063178, 0] ../../lib/util/fault.c:80(fault_report) INTERNAL ERROR: Signal 11 in pid 9544 (4.10.11) If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting [2019/12/27 15:27:00.063221, 0] ../../lib/util/fault.c:86(fault_report) =============================================================== [2019/12/27 15:27:00.063234, 0] ../../source3/lib/util.c:824(smb_panic_s3) PANIC (pid 9544): internal error [2019/12/27 15:27:00.130454, 0] ../../lib/util/fault.c:265(log_stack_trace) BACKTRACE: 6 stack frames: #0 0x8014741ea <log_stack_trace+0x2a> at /usr/local/lib/samba4/libsamba-util.so.0 #1 0x8030f3cfc <smb_panic_s3+0x9c> at /usr/local/lib/samba4/libsmbconf.so.0 #2 0x801473e17 <smb_panic+0x27> at /usr/local/lib/samba4/libsamba-util.so.0 #3 0x8014745dc <log_stack_trace+0x41c> at /usr/local/lib/samba4/libsamba-util.so.0 #4 0x801473de3 <fault_setup+0x73> at /usr/local/lib/samba4/libsamba-util.so.0 #5 0x812b70cf0 <pthread_sigmask+0x530> at /lib/libthr.so.3 I tracked this to a normal disconnection from a MacOS client (possibly being shut down). Full stack trace: #0 0x0000000813a919ba in thr_kill () from /lib/libc.so.7 #1 0x0000000813a91984 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52 #2 0x0000000813a918f9 in abort () at /usr/src/lib/libc/stdlib/abort.c:65 #3 0x00000008030da28a in dump_core () at ../../source3/lib/dumpcore.c:338 #4 0x00000008030f3edd in smb_panic_s3 (why=0x801499c72 "internal error") at ../../source3/lib/util.c:847 #5 0x0000000801473e17 in smb_panic (why=0x801499c72 "internal error") at ../../lib/util/fault.c:174 #6 0x00000008014745dc in fault_report (sig=11) at ../../lib/util/fault.c:88 #7 0x0000000801473de3 in sig_fault (sig=11) at ../../lib/util/fault.c:99 #8 0x0000000812b70cf0 in handle_signal (actp=0x7fffffffcf00, sig=11, info=0x7fffffffd2f0, ucp=0x7fffffffcf80) at /usr/src/lib/libthr/thread/thr_sig.c:246 #9 0x0000000812b702bf in thr_sighandler (sig=11, info=0x7fffffffd2f0, _ucp=0x7fffffffcf80) at /usr/src/lib/libthr/thread/thr_sig.c:189 #10 <signal handler called> #11 0x0000000808439fe9 in smb_signing_sign_pdu (si=0x32fc5a58, outhdr=0x81615dad4 "\377SMB\240", len=71, seqnum=1) at ../../libcli/smb/smb_signing.c:226 #12 0x0000000801cd93bb in srv_calculate_sign_mac (conn=0x816065d60, outbuf=0x81615dad0 "", seqnum=1) at ../../source3/smbd/signing.c:92 #13 0x0000000801cb72c8 in srv_send_smb (xconn=0x816065d60, buffer=0x81615dad0 "", do_signing=true, seqnum=1, do_encrypt=false, pcd=0x81615da18) at ../../source3/smbd/process.c:231 #14 0x0000000801c21378 in send_nt_replies (conn=0x816057ae0, req=0x81615d9a0, nt_error=..., params=0x0, paramsize=0, pdata=0x0, datasize=0) at ../../source3/smbd/nttrans.c:88 #15 0x0000000801c220d7 in smbd_smb1_notify_reply (req=0x81615d9a0, error_code=..., buf=0x0, len=0) at ../../source3/smbd/nttrans.c:1752 #16 0x0000000801d3bbee in change_notify_reply (req=0x81615d9a0, error_code=..., max_param=0, notify_buf=0x0, reply_fn=0x801c22090 <smbd_smb1_notify_reply>) at ../../source3/smbd/notify.c:228 #17 0x0000000801d3d415 in remove_pending_change_notify_requests_by_fid (fsp=0x816066ee0, status=...) at ../../source3/smbd/notify.c:564 #18 0x0000000801c90df0 in close_directory (req=0x0, fsp=0x816066ee0, close_type=SHUTDOWN_CLOSE) at ../../source3/smbd/close.c:1220 #19 0x0000000801c902f1 in close_file (req=0x0, fsp=0x816066ee0, close_type=SHUTDOWN_CLOSE) at ../../source3/smbd/close.c:1253 #20 0x0000000801be9b59 in file_close_conn (conn=0x816057ae0) at ../../source3/smbd/files.c:174 #21 0x0000000801cc9031 in close_cnum (conn=0x816057ae0, vuid=27866) at ../../source3/smbd/service.c:1119 #22 0x0000000801d219cb in smbXsrv_tcon_disconnect (tcon=0x8160913c0, vuid=27866) at ../../source3/smbd/smbXsrv_tcon.c:922 #23 0x0000000801d249d9 in smbXsrv_tcon_disconnect_all_callback (local_rec=0x7fffffffde50, private_data=0x7fffffffdf80) at ../../source3/smbd/smbXsrv_tcon.c:1065 #24 0x000000080802806d in db_rbt_traverse_internal (db=0x81612da60, f=0x801d248a0 <smbXsrv_tcon_disconnect_all_callback>, private_data=0x7fffffffdf80, count=0x7fffffffdee4, rw=true) at ../../lib/dbwrap/dbwrap_rbt.c:455 #25 0x0000000808026bc9 in db_rbt_traverse (db=0x81612da60, f=0x801d248a0 <smbXsrv_tcon_disconnect_all_callback>, private_data=0x7fffffffdf80) at ../../lib/dbwrap/dbwrap_rbt.c:513 #26 0x0000000808022f7e in dbwrap_traverse (db=0x81612da60, f=0x801d248a0 <smbXsrv_tcon_disconnect_all_callback>, private_data=0x7fffffffdf80, count=0x7fffffffdf74) at ../../lib/dbwrap/dbwrap.c:361 #27 0x0000000801d22cee in smbXsrv_tcon_disconnect_all (table=0x8160c0de0, vuid=0) at ../../source3/smbd/smbXsrv_tcon.c:1014 #28 0x0000000801d22c5d in smb1srv_tcon_disconnect_all (client=0x8160471e0) at ../../source3/smbd/smbXsrv_tcon.c:1132 #29 0x0000000801d2a8d6 in exit_server_common (how=SERVER_EXIT_NORMAL, reason=0x801e06fe0 "failed to receive smb request") at ../../source3/smbd/server_exit.c:154 #30 0x0000000801d2aec9 in smbd_exit_server_cleanly (explanation=0x801e06fe0 "failed to receive smb request") at ../../source3/smbd/server_exit.c:252 #31 0x0000000808678bc7 in exit_server_cleanly (reason=0x801e06fe0 "failed to receive smb request") at ../../source3/lib/smbd_shim.c:131 #32 0x0000000801cc45fd in smbd_server_connection_read_handler (xconn=0x816065d60, fd=38) at ../../source3/smbd/process.c:2593 #33 0x0000000801cbe524 in smbd_server_connection_handler (ev=0x816052060, fde=0x81604a760, flags=1, private_data=0x816065d60) at ../../source3/smbd/process.c:2627 #34 0x0000000812d92c9d in tevent_common_invoke_fd_handler () from /usr/local/lib/libtevent.so.0 #35 0x0000000812d95a94 in ?? () from /usr/local/lib/libtevent.so.0 #36 0x0000000812d91ec1 in _tevent_loop_once () from /usr/local/lib/libtevent.so.0 #37 0x0000000812d9214b in tevent_common_loop_wait () from /usr/local/lib/libtevent.so.0 #38 0x0000000801cbf3bb in smbd_process (ev_ctx=0x816052060, msg_ctx=0x81604a300, sock_fd=38, interactive=false) at ../../source3/smbd/process.c:4120 #39 0x0000000001034a49 in smbd_accept_connection (ev=0x816052060, fde=0x81604b560, flags=1, private_data=0x8160902e0) at ../../source3/smbd/server.c:1067 #40 0x0000000812d92c9d in tevent_common_invoke_fd_handler () from /usr/local/lib/libtevent.so.0 #41 0x0000000812d95a94 in ?? () from /usr/local/lib/libtevent.so.0 #42 0x0000000812d91ec1 in _tevent_loop_once () from /usr/local/lib/libtevent.so.0 #43 0x0000000812d9214b in tevent_common_loop_wait () from /usr/local/lib/libtevent.so.0 #44 0x0000000001031656 in smbd_parent_loop (ev_ctx=0x816052060, parent=0x81604a760) at ../../source3/smbd/server.c:1423 #45 0x000000000102f066 in main (argc=3, argv=0x7fffffffecf0) at ../../source3/smbd/server.c:2200 Frame #11 is: void smb_signing_sign_pdu(struct smb_signing_state *si, uint8_t *outhdr, size_t len, uint32_t seqnum) { uint8_t calc_md5_mac[16]; uint8_t com; uint8_t flags; => if (si->mac_key.length == 0) { if (!si->negotiated) { return; } } Here "si" cannot be dereferenced. I suspect this looks similar to https://bugzilla.samba.org/show_bug.cgi?id=10344. However that was closed a long time ago. I do see other strange behaviours, especially with some Mac clients, but don't know if they are related.
I guess the problem is that change_notify_reply() is getting triggered in this shutdown codepath.
I have similar crash on FreeBSD 11.3 after update to samba 4.10: frame #10: 0x00007ffffffff003 * frame #11: 0x0000000801d20e2b libsmbd-base-samba4.so`srv_calculate_sign_mac(conn=0x0000000816ad7960, outbuf="", seqnum=1) at signing.c:92:2 frame #12: 0x0000000801cfe5b8 libsmbd-base-samba4.so`srv_send_smb(xconn=0x0000000816ad7960, buffer="", do_signing=true, seqnum=1, do_encrypt=false, pcd=0x0000000816bb4218) at process.c:231:3 frame #13: 0x0000000801c5e7a8 libsmbd-base-samba4.so`send_nt_replies(conn=0x0000000816a57ae0, req=0x0000000816bb41a0, nt_error=(v = 0), params=0x0000000000000000, paramsize=0, pdata=0x0000000000000000, datasize=0) at nttrans.c:88:8 frame #14: 0x0000000801c5f507 libsmbd-base-samba4.so`smbd_smb1_notify_reply(req=0x0000000816bb41a0, error_code=(v = 0), buf=0x0000000000000000, len=0) at nttrans.c:1752:2 frame #15: 0x0000000801d861ae libsmbd-base-samba4.so`change_notify_reply(req=0x0000000816bb41a0, error_code=(v = 0), max_param=0, notify_buf=0x0000000000000000, reply_fn=(libsmbd-base-samba4.so`smbd_smb1_notify_reply at nttrans.c:1751)) at notify.c: 228:3 frame #16: 0x0000000801d879d5 libsmbd-base-samba4.so`remove_pending_change_notify_requests_by_fid(fsp=0x0000000816bc79e0, status=(v = 0)) at notify.c:564:3 frame #17: 0x0000000801cd7a80 libsmbd-base-samba4.so`close_directory(req=0x0000000000000000, fsp=0x0000000816bc79e0, close_type=SHUTDOWN_CLOSE) at close.c:1220:2 frame #18: 0x0000000801cd6f81 libsmbd-base-samba4.so`close_file(req=0x0000000000000000, fsp=0x0000000816bc79e0, close_type=SHUTDOWN_CLOSE) at close.c:1253:12 frame #19: 0x0000000801c20db9 libsmbd-base-samba4.so`file_close_conn(conn=0x0000000816a57ae0) at files.c:174:3 frame #20: 0x0000000801d105f1 libsmbd-base-samba4.so`close_cnum(conn=0x0000000816a57ae0, vuid=44527) at service.c:1119:2 frame #21: 0x0000000801d6befb libsmbd-base-samba4.so`smbXsrv_tcon_disconnect(tcon=0x0000000816b694c0, vuid=44527) at smbXsrv_tcon.c:922:3 frame #22: 0x0000000801d6ef09 libsmbd-base-samba4.so`smbXsrv_tcon_disconnect_all_callback(local_rec=0x00007fffffffde40, private_data=0x00007fffffffdf70) at smbXsrv_tcon.c:1065:11 frame #23: 0x0000000804b7706d libdbwrap-samba4.so`db_rbt_traverse_internal(db=0x0000000816aa7a60, f=(libsmbd-base-samba4.so`smbXsrv_tcon_disconnect_all_callback at smbXsrv_tcon.c:1037), private_data=0x00007fffffffdf70, count=0x00007fffffffded4, rw=t rue) at dbwrap_rbt.c:455:9 frame #24: 0x0000000804b75bc9 libdbwrap-samba4.so`db_rbt_traverse(db=0x0000000816aa7a60, f=(libsmbd-base-samba4.so`smbXsrv_tcon_disconnect_all_callback at smbXsrv_tcon.c:1037), private_data=0x00007fffffffdf70) at dbwrap_rbt.c:513:8 frame #25: 0x0000000804b71f7e libdbwrap-samba4.so`dbwrap_traverse(db=0x0000000816aa7a60, f=(libsmbd-base-samba4.so`smbXsrv_tcon_disconnect_all_callback at smbXsrv_tcon.c:1037), private_data=0x00007fffffffdf70, count=0x00007fffffffdf64) at dbwrap.c:3 61:12 frame #26: 0x0000000801d6d21e libsmbd-base-samba4.so`smbXsrv_tcon_disconnect_all(table=0x0000000816aa95e0, vuid=0) at smbXsrv_tcon.c:1014:11 frame #27: 0x0000000801d6d18d libsmbd-base-samba4.so`smb1srv_tcon_disconnect_all(client=0x0000000816a471e0) at smbXsrv_tcon.c:1132:9 frame #28: 0x0000000801d74e3c libsmbd-base-samba4.so`exit_server_common(how=SERVER_EXIT_NORMAL, reason="") at server_exit.c:154:12 frame #29: 0x0000000801d75429 libsmbd-base-samba4.so`smbd_exit_server_cleanly(explanation="") at server_exit.c:252:2 frame #30: 0x00000008084ecbc7 libsmbd-shim-samba4.so`exit_server_cleanly(reason="") at smbd_shim.c:131:3 frame #31: 0x0000000801d0bb3d libsmbd-base-samba4.so`smbd_server_connection_read_handler(xconn=0x0000000816ad7960, fd=41) at process.c:2600:3 frame #32: 0x0000000801d058b4 libsmbd-base-samba4.so`smbd_server_connection_handler(ev=0x0000000816a52060, fde=0x0000000816a4aa00, flags=1, private_data=0x0000000816ad7960) at process.c:2634:3 frame #33: 0x0000000813197cad libtevent.so.0`tevent_common_invoke_fd_handler + 141 frame #34: 0x000000081319aade libtevent.so.0`___lldb_unnamed_symbol61$$libtevent.so.0 + 1934 frame #35: 0x0000000813196ed1 libtevent.so.0`_tevent_loop_once + 225 frame #36: 0x000000081319715b libtevent.so.0`tevent_common_loop_wait + 91 frame #37: 0x0000000801d06795 libsmbd-base-samba4.so`smbd_process(ev_ctx=0x0000000816a52060, msg_ctx=0x0000000816a4a300, sock_fd=41, interactive=false) at process.c:4127:8 frame #38: 0x0000000001031199 smbd`smbd_accept_connection(ev=0x0000000816a52060, fde=0x0000000816a4bb80, flags=1, private_data=0x0000000816b69240) at server.c:1067:3 frame #39: 0x0000000813197cad libtevent.so.0`tevent_common_invoke_fd_handler + 141 frame #40: 0x000000081319aade libtevent.so.0`___lldb_unnamed_symbol61$$libtevent.so.0 + 1934 frame #41: 0x0000000813196ed1 libtevent.so.0`_tevent_loop_once + 225 frame #42: 0x000000081319715b libtevent.so.0`tevent_common_loop_wait + 91 frame #43: 0x000000000102dda6 smbd`smbd_parent_loop(ev_ctx=0x0000000816a52060, parent=0x0000000816a4aa00) at server.c:1423:8 frame #44: 0x000000000102b6f6 smbd`main(argc=3, argv=0x00007fffffffecf0) at server.c:2200:2 frame #45: 0x00000000010288c9 smbd`_start + 153 unfortunately frame #10 is broken but it seems to be the same code path, frame #11: frame #11: 0x0000000801d20e2b libsmbd-base-samba4.so`srv_calculate_sign_mac(conn=0x0000000816ad7960, outbuf="", seqnum=1) at signing.c:92:2 89 len = smb_len(outbuf); 90 outhdr = (uint8_t *)outbuf + NBT_HDR_SIZE; 91 -> 92 smb_signing_sign_pdu(conn->smb1.signing_state, outhdr, len, seqnum); 93 } (lldb) p conn->smb1.signing_state (smb_signing_state *) $1 = 0x0000000022f84928 contents of smb1 probably look odd: smb1 = { echo_handler = { socket_lock_fd = 1007 socket_mutex = 0x0000000000000000 trusted_fd = 1553865647 trusted_fde = 0x0000000022f84928 ref_count = 1584526980 } negprot = { encrypted_passwords = true spnego = true auth_context = 0x000000005e71f683 done = true max_recv = 0 } sessions = (done_sesssetup = true, max_send = 0) signing_state = 0x0000000022f84928 unix_info = { client_major = 0 }
(In reply to Marcin Gryszkalis from comment #2) Is this crash reproducible ? If so, can you run it under valgrind to see what is trashing the conn->smb1.signing_state or conn->smb1 struct ? We can always avoid this by adding a quick hack into the close path that avoids doing the remove_pending_change_notify_requests_by_fid() call if it's invoked with SHUTDOWN_CLOSE. But I recently put a lot of work into making async SHUTDOWN_CLOSE more reliable in many code paths, so I'd like to avoid adding more "hack" solutions if possible. If we can catch this in the act, we can find the memory corruption I think.
(In reply to Jeremy Allison from comment #3) > Is this crash reproducible ? I got 3-5 crashes a day on not so busy server (5-20 users connected). So I guess it's not for every disconnect - and I can't see anything common for sessions. > If so, can you run it under valgrind to see what is trashing the > conn->smb1.signing_state or conn->smb1 struct ? Never did that, is it possible to run smb under valgrind on prod? > We can always avoid this by adding a quick hack into the close path that > avoids doing the remove_pending_change_notify_requests_by_fid() call if > it's invoked with SHUTDOWN_CLOSE. > But I recently put a lot of work into making async SHUTDOWN_CLOSE more > reliable in many code paths, so I'd like to avoid adding more "hack" > solutions if possible. > If we can catch this in the act, we can find the memory corruption I think. sure
Yes, valgrind runs on any binary (it's a cpu-emulator). It would give better information if the symbols for the binary are loaded/not stripped however. The crash is informative - something wrote over the struct smb_signing_state *si in conn->smb1.signing_state. But whether that is due to accessing a freed object out of order (after the free) or because something else overwrote it on teardown isn't yet clear. That's what I'm trying to figure out.
The code that tears down existing file handles is inside smbXsrv_tcon_disconnect(), and looks like: if (tcon->compat) { bool ok; ok = chdir_current_service(tcon->compat); if (!ok) { status = NT_STATUS_INTERNAL_ERROR; DEBUG(0, ("smbXsrv_tcon_disconnect(0x%08x, '%s'): " "chdir_current_service() failed: %s\n", tcon->global->tcon_global_id, tcon->global->share_name, nt_errstr(status))); tcon->compat = NULL; return status; } close_cnum(tcon->compat, vuid); So everything should still be present (including tcon->compat->smb1.si) at this point, which is why the crash is confusing.
I'll try with the valgrind and get back with info. btw, did you notice that value at smb1.signing_state is the same as trusted_fde = 0x0000000022f84928?
No I didn't. That's strange, as struct smbXsrv_connection { has (removing comments).. struct { struct { int socket_lock_fd; pthread_mutex_t *socket_mutex; int trusted_fd; struct tevent_fd *trusted_fde; int ref_count; } echo_handler; .. struct smb_signing_state *signing_state; .. } smb1; but it's a set of structs, not a union, so signing_state should not be overlaying smb1.echo_handler.trusted_fde.
I need to reboot the box because valgrind on FreeBSD 11+ requires COMPAT_FREEBSD10 kernel option. Reboot scheduled for saturday.
I was running on memcheck for a day and I got 1.5GB log but I'm not sure if it's worth anything :) There are several reports regarding disconnect (I'll split the comment and add more info below) ==70392== 2 bytes in 1 blocks are still reachable in loss record 2 of 555 ==70392== at 0x4C265B4: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so) ==70392== by 0x173495C0: strdup (in /lib/libc.so.7) ==70392== by 0x55A4249: vfs_ChDir (vfs.c:857) ==70392== by 0x55D182B: close_cnum (service.c:1134) ==70392== by 0x562CEFA: smbXsrv_tcon_disconnect (smbXsrv_tcon.c:922) ==70392== by 0x553A650: reply_tdis (reply.c:5752) ==70392== by 0x55C2E5C: switch_message (process.c:1716) ==70392== by 0x55C905D: construct_reply (process.c:1752) ==70392== by 0x55C8BFB: process_smb (process.c:2007) ==70392== by 0x55CCB9D: smbd_server_connection_read_handler (process.c:2607) ==70392== by 0x55C68B3: smbd_server_connection_handler (process.c:2634) ==70392== by 0x16451CAC: tevent_common_invoke_fd_handler (in /usr/local/lib/libtevent.so.0) ==79371== 2 bytes in 1 blocks are still reachable in loss record 2 of 563 ==79371== at 0x4C265B4: malloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so) ==79371== by 0x173495C0: strdup (in /lib/libc.so.7) ==79371== by 0x55A4249: vfs_ChDir (vfs.c:857) ==79371== by 0x55D182B: close_cnum (service.c:1134) ==79371== by 0x562CEFA: smbXsrv_tcon_disconnect (smbXsrv_tcon.c:922) ==79371== by 0x562FF08: smbXsrv_tcon_disconnect_all_callback (smbXsrv_tcon.c:1065) ==79371== by 0x843806C: db_rbt_traverse_internal (dbwrap_rbt.c:455) ==79371== by 0x8436BC8: db_rbt_traverse (dbwrap_rbt.c:513) ==79371== by 0x8432F7D: dbwrap_traverse (dbwrap.c:361) ==79371== by 0x562E21D: smbXsrv_tcon_disconnect_all (smbXsrv_tcon.c:1014) ==79371== by 0x562E18C: smb1srv_tcon_disconnect_all (smbXsrv_tcon.c:1132) ==79371== by 0x5635E3B: exit_server_common (server_exit.c:154) ==64403== 2 errors in context 2 of 362: ==64403== Invalid read of size 4 ==64403== at 0x55BF56F: srv_send_smb (process.c:220) ==64403== by 0x551F7A7: send_nt_replies (nttrans.c:88) ==64403== by 0x5520506: smbd_smb1_notify_reply (nttrans.c:1752) ==64403== by 0x56471AD: change_notify_reply (notify.c:228) ==64403== by 0x56489D4: remove_pending_change_notify_requests_by_fid (notify.c:564) ==64403== by 0x5598A7F: close_directory (close.c:1220) ==64403== by 0x5597F80: close_file (close.c:1253) ==64403== by 0x54E1DB8: file_close_conn (files.c:174) ==64403== by 0x55D15F0: close_cnum (service.c:1119) ==64403== by 0x562CEFA: smbXsrv_tcon_disconnect (smbXsrv_tcon.c:922) ==64403== by 0x562FF08: smbXsrv_tcon_disconnect_all_callback (smbXsrv_tcon.c:1065) ==64403== by 0x843806C: db_rbt_traverse_internal (dbwrap_rbt.c:455) ==64403== Address 0x1f617b08 is 152 bytes inside a block of size 560 free'd ==64403== at 0x4C27A2C: free (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so) ==64403== by 0x16664D1D: ??? (in /usr/local/lib/libtalloc.so.2) ==64403== by 0x5635D75: exit_server_common (server_exit.c:132) ==64403== by 0x5636428: smbd_exit_server_cleanly (server_exit.c:252) ==64403== by 0xBBB2BC6: exit_server_cleanly (smbd_shim.c:131) ==64403== by 0x55CCB3C: smbd_server_connection_read_handler (process.c:2600) ==64403== by 0x55C68B3: smbd_server_connection_handler (process.c:2634) ==64403== by 0x16451CAC: tevent_common_invoke_fd_handler (in /usr/local/lib/libtevent.so.0) ==64403== by 0x16454ADD: ??? (in /usr/local/lib/libtevent.so.0) ==64403== by 0x16450ED0: _tevent_loop_once (in /usr/local/lib/libtevent.so.0) ==64403== by 0x1645115A: tevent_common_loop_wait (in /usr/local/lib/libtevent.so.0) ==64403== by 0x55C7794: smbd_process (process.c:4127) ==51283== Invalid read of size 4 ==51283== at 0x551F67F: send_nt_replies (nttrans.c:73) ==51283== by 0x5520506: smbd_smb1_notify_reply (nttrans.c:1752) ==51283== by 0x56471AD: change_notify_reply (notify.c:228) ==51283== by 0x56489D4: remove_pending_change_notify_requests_by_fid (notify.c:564) ==51283== by 0x5598A7F: close_directory (close.c:1220) ==51283== by 0x5597F80: close_file (close.c:1253) ==51283== by 0x54E1DB8: file_close_conn (files.c:174) ==51283== by 0x55D15F0: close_cnum (service.c:1119) ==51283== by 0x562CEFA: smbXsrv_tcon_disconnect (smbXsrv_tcon.c:922) ==51283== by 0x562FF08: smbXsrv_tcon_disconnect_all_callback (smbXsrv_tcon.c:1065) ==51283== by 0x843806C: db_rbt_traverse_internal (dbwrap_rbt.c:455) ==51283== by 0x8436BC8: db_rbt_traverse (dbwrap_rbt.c:513) ==51283== Address 0x1f6332a4 is 244 bytes inside a block of size 560 free'd ==51283== at 0x4C27A2C: free (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so) ==51283== by 0x16664D1D: ??? (in /usr/local/lib/libtalloc.so.2) ==51283== by 0x5635D75: exit_server_common (server_exit.c:132) ==51283== by 0x5636428: smbd_exit_server_cleanly (server_exit.c:252) ==51283== by 0xBBB2BC6: exit_server_cleanly (smbd_shim.c:131) ==51283== by 0x55CCB3C: smbd_server_connection_read_handler (process.c:2600) ==51283== by 0x55C68B3: smbd_server_connection_handler (process.c:2634) ==51283== by 0x16451CAC: tevent_common_invoke_fd_handler (in /usr/local/lib/libtevent.so.0) ==51283== by 0x16454ADD: ??? (in /usr/local/lib/libtevent.so.0) ==51283== by 0x16450ED0: _tevent_loop_once (in /usr/local/lib/libtevent.so.0) ==51283== by 0x1645115A: tevent_common_loop_wait (in /usr/local/lib/libtevent.so.0) ==51283== by 0x55C7794: smbd_process (process.c:4127)
One - maybe interesting memcpy overlap ==75554== 2 errors in context 3 of 361: ==75554== Source and destination overlap in memcpy(0x1f8fb330, 0x1f8fb330, 96) ==75554== at 0x4C2B384: memcpy (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so) ==75554== by 0x547F608: del_share_mode (locking.c:894) ==75554== by 0x559898C: close_directory (close.c:1212) ==75554== by 0x5597F80: close_file (close.c:1253) ==75554== by 0x5538D2F: reply_close (reply.c:5450) ==75554== by 0x55C2E5C: switch_message (process.c:1716) ==75554== by 0x55C905D: construct_reply (process.c:1752) ==75554== by 0x55C8BFB: process_smb (process.c:2007) ==75554== by 0x55CCB9D: smbd_server_connection_read_handler (process.c:2607) ==75554== by 0x55C68B3: smbd_server_connection_handler (process.c:2634) ==75554== by 0x16451CAC: tevent_common_invoke_fd_handler (in /usr/local/lib/libtevent.so.0) ==75554== by 0x16454ADD: ??? (in /usr/local/lib/libtevent.so.0)
Most important - it never crashed with valgrind attached like it did before. I don't know if it's important but I also get many unhandled syscalls: 27 WARNING: unhandled syscall: 530 265 WARNING: unhandled syscall: 547 489 WARNING: unhandled syscall: 356 31840 WARNING: unhandled syscall: 437 256385 WARNING: unhandled syscall: 438 836500 WARNING: unhandled syscall: 357 I can put compressed log for download if you need it or I can run valgrind with different options. I used it like this: /usr/local/bin/valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose --log-file=/tmp/valgrind-out-1585339433.txt /usr/local/sbin/smbd --configfile=/usr/local/etc/smb4.conf
(In reply to Marcin Gryszkalis from comment #11) The memcpy overwrite is a false positive here. The code in del_share_mode() looks like: bool del_share_mode(struct share_mode_lock *lck, files_struct *fsp) { struct share_mode_entry *e; e = find_share_mode_entry(lck, fsp); if (e == NULL) { return False; } remove_share_mode_lease(lck->data, e); *e = lck->data->share_modes[lck->data->num_share_modes-1]; if find_share_mode_entry(lck, fsp) returns a pointer to &lck->data->share_modes[lck->data->num_share_modes-1] (which it can if the share mode to be removed is the last entry in the array) then it will be the equivalent of: memcpy(&p, &p, sizeof(struct share_mode_entry)); which should have no effect.
Actually the key is in this backtrace: ==64403== 2 errors in context 2 of 362: ==64403== Invalid read of size 4 ==64403== at 0x55BF56F: srv_send_smb (process.c:220) ==64403== by 0x551F7A7: send_nt_replies (nttrans.c:88) ==64403== by 0x5520506: smbd_smb1_notify_reply (nttrans.c:1752) ==64403== by 0x56471AD: change_notify_reply (notify.c:228) ==64403== by 0x56489D4: remove_pending_change_notify_requests_by_fid (notify.c:564) ==64403== by 0x5598A7F: close_directory (close.c:1220) ==64403== by 0x5597F80: close_file (close.c:1253) ==64403== by 0x54E1DB8: file_close_conn (files.c:174) ==64403== by 0x55D15F0: close_cnum (service.c:1119) ==64403== by 0x562CEFA: smbXsrv_tcon_disconnect (smbXsrv_tcon.c:922) ==64403== by 0x562FF08: smbXsrv_tcon_disconnect_all_callback (smbXsrv_tcon.c:1065) ==64403== by 0x843806C: db_rbt_traverse_internal (dbwrap_rbt.c:455) ==64403== Address 0x1f617b08 is 152 bytes inside a block of size 560 free'd ==64403== at 0x4C27A2C: free (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so) ==64403== by 0x16664D1D: ??? (in /usr/local/lib/libtalloc.so.2) ==64403== by 0x5635D75: exit_server_common (server_exit.c:132) ==64403== by 0x5636428: smbd_exit_server_cleanly (server_exit.c:252) ==64403== by 0xBBB2BC6: exit_server_cleanly (smbd_shim.c:131) ==64403== by 0x55CCB3C: smbd_server_connection_read_handler (process.c:2600) ==64403== by 0x55C68B3: smbd_server_connection_handler (process.c:2634)
Created attachment 15880 [details] Test patch for 4.10. Test (raw) patch for 4.10. Shutdown xconn in two passes. Firstly sets the xconn->transport.status = NT_STATUS_INTERNAL_ERROR or NT_STATUS_LOCAL_DISCONNECT, and leaves the connection in place. *After* we call smb1srv_tcon_disconnect_all() (which may send the SMB1 changenofity pending replies which need the signing state) *then* we can walk the xconn list and call TALLOC_FREE(xconn). Can you test this on 4.10 ? I'm pretty sure this will fix the bug ! I'll look to see if this is still an issue in 4.11 and above (I'm pretty sure it is). Jeremy.
Created attachment 15881 [details] Test patch that might work :-). Version #2 - actually goes through the xconn list at the right point in server_exit. Please test !
Aha ! This is actually a duplicate of bug: https://bugzilla.samba.org/show_bug.cgi?id=14064 This already got fixed by Volker for 4.11, but it didn't get back-ported to 4.10. I'm closing this one as my patch essentially duplicates the work Volker already did for #14064 (only his patch was better :-). The patch for #14064 will apply to 4.10. Let me know if not and I'll add a back-ported version to #14064. *** This bug has been marked as a duplicate of bug 14064 ***
both patches applied cleanly, now building samba 4.10 with Volker's patch.
I just gut two crashed on patched samba but on SIGABRT, don't know if the are related... (lldb) target create "smbd" --core "0-smbd-77253.core" Core file '/var/tmp/0-smbd-77253.core' (x86_64) was loaded. (lldb) bt * thread #1, name = 'smbd', stop reason = signal SIGABRT * frame #0: 0x0000000813e9696a libc.so.7`__sys_thr_kill + 10 frame #1: 0x0000000813e96934 libc.so.7`raise + 52 frame #2: 0x0000000813e968a9 libc.so.7`abort + 73 frame #3: 0x000000080337e9aa libsmbconf.so.0`dump_core at dumpcore.c:338:2 frame #4: 0x0000000803354f6d libsmbconf.so.0`smb_panic_s3(why="") at util.c:847:2 frame #5: 0x000000080148df97 libsamba-util.so.0`smb_panic(why="") at fault.c:174:3 frame #6: 0x000000080148e75c libsamba-util.so.0`fault_report(sig=11) at fault.c:88:2 frame #7: 0x000000080148df63 libsamba-util.so.0`sig_fault(sig=11) at fault.c:99:2 frame #8: 0x000000081317dcf0 libthr.so.3`___lldb_unnamed_symbol101$$libthr.so.3 + 208 frame #9: 0x000000081317d2bf libthr.so.3`___lldb_unnamed_symbol82$$libthr.so.3 + 319 frame #10: 0x00007ffffffff003 frame #11: 0x0000000801a7213d libsmbd-base-samba4.so`call_trans2qfilepathinfo(conn=0x0000000816857ae0, req=0x00000008169a0280, tran_call=5, pparams=0x000000081c828520, total_params=236, ppdata=0x000000081c828530, total_data=0, max_data_bytes=528) at trans2.c:6154:11 frame #12: 0x0000000801a67a1b libsmbd-base-samba4.so`handle_trans2(conn=0x0000000816857ae0, req=0x00000008169a0280, state=0x000000081c8284c0) at trans2.c:9487:3 frame #13: 0x0000000801a66b0c libsmbd-base-samba4.so`reply_trans2(req=0x00000008169a0280) at trans2.c:9728:3 frame #14: 0x0000000801abedcd libsmbd-base-samba4.so`switch_message(type='2', req=0x00000008169a0280) at process.c:1716:2 frame #15: 0x0000000801ac4fce libsmbd-base-samba4.so`construct_reply(xconn=0x00000008168d7960, inbuf=0x0000000000000000, size=308, unread_bytes=0, seqnum=0, encrypted=false, deferred_pcd=0x0000000000000000) at process.c:1752:14 frame #16: 0x0000000801ac4b6c libsmbd-base-samba4.so`process_smb(xconn=0x00000008168d7960, inbuf="", nread=308, unread_bytes=0, seqnum=0, encrypted=false, deferred_pcd=0x0000000000000000) at process.c:2007:3 frame #17: 0x0000000801ac8b0e libsmbd-base-samba4.so`smbd_server_connection_read_handler(xconn=0x00000008168d7960, fd=43) at process.c:2607:2 frame #18: 0x0000000801ac2824 libsmbd-base-samba4.so`smbd_server_connection_handler(ev=0x0000000816852060, fde=0x000000081684aa00, flags=1, private_data=0x00000008168d7960) at process.c:2634:3 frame #19: 0x00000008139b6cad libtevent.so.0`tevent_common_invoke_fd_handler + 141 frame #20: 0x00000008139b9ade libtevent.so.0`___lldb_unnamed_symbol61$$libtevent.so.0 + 1934 frame #21: 0x00000008139b5ed1 libtevent.so.0`_tevent_loop_once + 225 frame #22: 0x00000008139b615b libtevent.so.0`tevent_common_loop_wait + 91 frame #23: 0x0000000801ac3705 libsmbd-base-samba4.so`smbd_process(ev_ctx=0x0000000816852060, msg_ctx=0x000000081684a300, sock_fd=43, interactive=false) at process.c:4127:8 frame #24: 0x00000000010306a9 smbd`smbd_accept_connection(ev=0x0000000816852060, fde=0x000000081684baa0, flags=1, private_data=0x00000008169691a0) at server.c:1067:3 frame #25: 0x00000008139b6cad libtevent.so.0`tevent_common_invoke_fd_handler + 141 frame #26: 0x00000008139b9ade libtevent.so.0`___lldb_unnamed_symbol61$$libtevent.so.0 + 1934 frame #27: 0x00000008139b5ed1 libtevent.so.0`_tevent_loop_once + 225 frame #28: 0x00000008139b615b libtevent.so.0`tevent_common_loop_wait + 91 frame #29: 0x000000000102d2b6 smbd`smbd_parent_loop(ev_ctx=0x0000000816852060, parent=0x000000081684aa00) at server.c:1423:8 frame #30: 0x000000000102ac06 smbd`main(argc=3, argv=0x00007fffffffecf8) at server.c:2200:2 frame #31: 0x00000000010288c9 smbd`_start + 153 -> 6154 status = smbd_do_qfilepathinfo(conn, req, info_level, 6155 fsp, smb_fname, 6156 delete_pending, write_time_ts, 6157 ea_list, (lldb) p *conn (connection_struct) $0 = { next = 0x0000000000000000 prev = 0x0000000816857f60 sconn = 0x0000000816857660 tcon = 0x000000081696a640 cnum = 38149 params = 0x000000081687a610 force_user = false vuid_cache = 0x000000081686c8e0 printer = false ipc = false read_only = false share_access = 2032127 ts_res = TIMESTAMP_SET_NT_OR_BETTER connectpath = 0x0000000816902ae0 ""***anon***" origpath = 0x00000008168f1f60 "***anon***" cwd_fname = 0x00000008169a6400 tcon_done = true vfs_handles = 0x000000081696adc0 session_info = 0x0000000816849d60 force_group_gid = 1007 vuid = 32730 lastused = 1585653046 lastused_count = 1585654406 num_files_open = 3 num_smb_operations = 1361 encrypt_level = -1 encrypted_tid = false case_sensitive = false case_preserve = true short_case_preserve = true fs_capabilities = 3 base_share_dev = 2449487824 hide_list = 0x0000000000000000 veto_list = 0x0000000000000000 veto_oplock_list = 0x0000000000000000 aio_write_behind_list = 0x0000000000000000 pending_trans = 0x0000000000000000 spoolss_pipe = 0x0000000000000000 (lldb) p *req (smb_request) $2 = { cmd = '2' flags2 = 51271 smbpid = 4824 mid = 23872 seqnum = 0 vuid = 32730 tid = 38149 wct = '\x0f' vwv = 0x00000008169ea105 buflen = 233 buf = 0x00000008169ea125 <no value available> inbuf = 0x00000008169ea0e0 <no value available> outbuf = 0x0000000000000000 <no value available> unread_bytes = 0 encrypted = false conn = 0x0000000816857de0 sconn = 0x0000000816857660 xconn = 0x00000008168d7960 pcd = { handlers = 0x0000000000000000 context = 0x0000000000000000 } chain_fsp = 0x0000000000000000 async_priv = 0x0000000000000141 smb2req = 0x0000000000000000 priv_paths = 0x0000000000000000 chain = 0x0000000000000000 request_time = (tv_sec = 1585653082, tv_usec = 129278) posix_pathnames = false
That's hitting an smb_panic. Did you fully clean the tree before re-making ?
After full rebuild no crashes so far, I'll let you know if anything changes. Thanks for support.
I applied this patch and, after a few days, I can say it introduced no problem. I can't say it solved anything, however, since I was not getting this kind of crash anymore. I even tried to reproduce them in the lab, but I was not able to. Since a couple of months, it seems the previous crashes were replaced by the following kind: BACKTRACE: 28 stack frames: #0 0x80147b647 <log_stack_trace+0x37> at /usr/local/lib/samba4/libsamba-util.so.0 #1 0x804269afd <smb_panic_s3+0x4d> at /usr/local/lib/samba4/libsmbconf.so.0 #2 0x80147b437 <smb_panic+0x17> at /usr/local/lib/samba4/libsamba-util.so.0 #3 0x801994a83 <remove_pending_change_notify_requests_by_fid+0x243> at /usr/local/lib/samba4/private/libsmbd-base-samba4.so #4 0x80197d437 <smbd_smb2_request_process_notify+0x6e7> at /usr/local/lib/samba4/private/libsmbd-base-samba4.so #5 0x812dc8c0f <talloc_set_memlimit+0x8df> at /usr/local/lib/libtalloc.so.2 #6 0x812bb4ddd <tevent_req_received+0x8d> at /usr/local/lib/libtevent.so.0 #7 0x812bb4a39 <_tevent_req_create+0x279> at /usr/local/lib/libtevent.so.0 #8 0x812dc8c0f <talloc_set_memlimit+0x8df> at /usr/local/lib/libtalloc.so.2 #9 0x812dc8acf <talloc_set_memlimit+0x79f> at /usr/local/lib/libtalloc.so.2 #10 0x812dc8acf <talloc_set_memlimit+0x79f> at /usr/local/lib/libtalloc.so.2 #11 0x80198a56c <smbd_exit_server+0x17c> at /usr/local/lib/samba4/private/libsmbd-base-samba4.so #12 0x80198a87e <smbd_exit_server_cleanly+0xe> at /usr/local/lib/samba4/private/libsmbd-base-samba4.so #13 0x809d34a69 <exit_server_cleanly+0x19> at /usr/local/lib/samba4/private/libsmbd-shim-samba4.so #14 0x8019656de <smbd_server_connection_terminate_ex+0x11e> at /usr/local/lib/samba4/private/libsmbd-base-samba4.so #15 0x80196ad1f <smbd_smb2_process_negprot+0x1a0f> at /usr/local/lib/samba4/private/libsmbd-base-samba4.so #16 0x812bb3cad <tevent_common_invoke_fd_handler+0x8d> at /usr/local/lib/libtevent.so.0 #17 0x812bb6ade <tevent_context_same_loop+0xd4e> at /usr/local/lib/libtevent.so.0 #18 0x812bb2ed1 <_tevent_loop_once+0xe1> at /usr/local/lib/libtevent.so.0 #19 0x812bb315b <tevent_common_loop_wait+0x5b> at /usr/local/lib/libtevent.so.0 #20 0x801957497 <smbd_process+0x877> at /usr/local/lib/samba4/private/libsmbd-base-samba4.so #21 0x102ec2f <main+0x446f> at /usr/local/sbin/smbd #22 0x812bb3cad <tevent_common_invoke_fd_handler+0x8d> at /usr/local/lib/libtevent.so.0 #23 0x812bb6ade <tevent_context_same_loop+0xd4e> at /usr/local/lib/libtevent.so.0 #24 0x812bb2ed1 <_tevent_loop_once+0xe1> at /usr/local/lib/libtevent.so.0 #25 0x812bb315b <tevent_common_loop_wait+0x5b> at /usr/local/lib/libtevent.so.0 #26 0x102d08f <main+0x28cf> at /usr/local/sbin/smbd #27 0x102c5cf <main+0x1e0f> at /usr/local/sbin/smbd I'm not sure this is related, however: it speaks about notifications, but I'm not that exper. If anyone has any insight, perhaps [s]he could suggest whether I should open a different report. Thanks.
Unfortunately that back trace only hints at the problem. There is stack corruption here as there are no callers of talloc_set_memlimit within Samba (only in test code). Is there any way you can catch this in the act with a panic action script and get a good backtrace ?