Just found a couple of core dumps from smbd yesterday on one of my servers that seems to have occured at the same time. Unknown if they are related but since they happend at the same time... Core 1: (gdb) bt #0 0x0000000804cb4c2a in thr_kill () from /lib/libc.so.7 #1 0x0000000804cb3084 in raise () from /lib/libc.so.7 #2 0x0000000804c29279 in abort () from /lib/libc.so.7 #3 0x0000000802c64bde in dump_core () at ../../source3/lib/dumpcore.c:338 #4 0x0000000802c733bd in smb_panic_s3 (why=<optimized out>) at ../../source3/lib/util.c:704 #5 0x0000000801b4f590 in smb_panic (why=why@entry=0x7fffffffd510 "Signal 11: Segmentation fault") at ../../lib/util/fault.c:197 #6 0x0000000801b4f611 in fault_report (sig=11) at ../../lib/util/fault.c:81 #7 sig_fault (sig=11) at ../../lib/util/fault.c:92 #8 0x000000080869cb70 in ?? () from /lib/libthr.so.3 #9 0x000000080869c13f in ?? () from /lib/libthr.so.3 #10 <signal handler called> #11 openat_pathref_fsp (dirfsp=dirfsp@entry=0x0, smb_fname=smb_fname@entry=0x8171e6f80) at ../../source3/smbd/files.c:442 #12 0x00000008016bc974 in synthetic_pathref (mem_ctx=<optimized out>, dirfsp=dirfsp@entry=0x0, base_name=base_name@entry=0x817853058 "$RECYCLE.BIN", stream_name=stream_name@entry=0x0, psbuf=psbuf@entry=0x817407b98, twrp=twrp@entry=0, flags=0, _smb_fname=0x7fffffffe130) at ../../source3/smbd/files.c:671 #13 0x000000080171ef0c in rmdir_internals (fsp=0x817177a60, ctx=0x8178f8880) at ../../source3/smbd/close.c:1112 #14 close_directory (close_type=NORMAL_CLOSE, fsp=0x817177a60, req=0x81766dc60) at ../../source3/smbd/close.c:1383 #15 close_file (req=req@entry=0x81766dc60, fsp=fsp@entry=0x817177a60, close_type=close_type@entry=NORMAL_CLOSE) at ../../source3/smbd/close.c:1507 #16 0x0000000801758ffb in smbd_smb2_close (req=req@entry=0x8171e7260, fsp=0x817177a60, in_flags=<optimized out>, out_flags=0x81007ae52, out_creation_ts=0x81007ae58, out_last_access_ts=0x81007ae68, out_last_write_ts=0x81007ae78, out_change_ts=0x81007ae88, out_allocation_size=0x81007ae98, out_end_of_file=0x81007aea0, out_file_attributes=0x81007aea8) at ../../source3/smbd/smb2_close.c:277 #17 0x000000080175942c in smbd_smb2_close_send (in_flags=<optimized out>, in_fsp=0x817177a60, smb2req=0x8171e7260, ev=<optimized out>, mem_ctx=0x8171e7260) at ../../source3/smbd/smb2_close.c:393 #18 smbd_smb2_request_process_close (req=req@entry=0x8171e7260) at ../../source3/smbd/smb2_close.c:72 #19 0x000000080174d314 in smbd_smb2_request_dispatch (req=req@entry=0x8171e7260) at ../../source3/smbd/smb2_server.c:3400 #20 0x000000080174dfdf in smbd_smb2_io_handler (fde_flags=<optimized out>, xconn=0x8100e74e0) at ../../source3/smbd/smb2_server.c:5003 #21 smbd_smb2_connection_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../../source3/smbd/smb2_server.c:5041 #22 0x000000080260c82d in tevent_common_invoke_fd_handler (fde=fde@entry=0x8100b2ac0, flags=<optimized out>, removed=removed@entry=0x0) at ../../lib/tevent/tevent_fd.c:142 #23 0x000000080260f00f in poll_event_loop_poll (tvalp=0x7fffffffe580, ev=0x8100b5060) at ../../lib/tevent/tevent_poll.c:569 #24 poll_event_loop_once (ev=0x8100b5060, location=<optimized out>) at ../../lib/tevent/tevent_poll.c:626 #25 0x000000080260bde4 in _tevent_loop_once (ev=ev@entry=0x8100b5060, location=location@entry=0x8018854b8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:790 #26 0x000000080260bfd7 in tevent_common_loop_wait (ev=0x8100b5060, location=0x8018854b8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:913 #27 0x000000080260c039 in _tevent_loop_wait (ev=ev@entry=0x8100b5060, location=location@entry=0x8018854b8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:932 #28 0x000000080173b054 in smbd_process (ev_ctx=ev_ctx@entry=0x8100b5060, msg_ctx=msg_ctx@entry=0x8100ae140, dce_ctx=dce_ctx@entry=0x81009eda0, sock_fd=sock_fd@entry=50, interactive=interactive@entry=false) at ../../source3/smbd/process.c:4246 #29 0x000000000102e7c2 in smbd_accept_connection (ev=0x8100b5060, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../../source3/smbd/server.c:1021 #30 0x000000080260c82d in tevent_common_invoke_fd_handler (fde=fde@entry=0x8100b24a0, flags=<optimized out>, removed=removed@entry=0x0) at ../../lib/tevent/tevent_fd.c:142 #31 0x000000080260f00f in poll_event_loop_poll (tvalp=0x7fffffffe820, ev=0x8100b5060) at ../../lib/tevent/tevent_poll.c:569 #32 poll_event_loop_once (ev=0x8100b5060, location=<optimized out>) at ../../lib/tevent/tevent_poll.c:626 #33 0x000000080260bde4 in _tevent_loop_once (ev=ev@entry=0x8100b5060, location=location@entry=0x1036e68 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent.c:790 #34 0x000000080260bfd7 in tevent_common_loop_wait (ev=0x8100b5060, location=0x1036e68 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent.c:913 (gdb) frame 12 #12 0x00000008016bc974 in synthetic_pathref (mem_ctx=<optimized out>, dirfsp=dirfsp@entry=0x0, base_name=base_name@entry=0x817853058 "$RECYCLE.BIN", stream_name=stream_name@entry=0x0, psbuf=psbuf@entry=0x817407b98, twrp=twrp@entry=0, flags=0, _smb_fname=0x7fffffffe130) at ../../source3/smbd/files.c:671 671 status = openat_pathref_fsp(dirfsp, smb_fname); (gdb) print dirfsp $1 = (struct files_struct *) 0x0 (gdb) print smb_fname $2 = (struct smb_filename *) 0x8171e6f80 (gdb) print *smb_fname $3 = {base_name = 0x8171e70c0 "$RECYCLE.BIN", stream_name = 0x0, flags = 0, st = {st_ex_dev = 7658512858151762472, st_ex_ino = 1798, st_ex_file_id = 1798, st_ex_mode = 16832, st_ex_nlink = 2, st_ex_uid = 1000094230, st_ex_gid = 100001003, st_ex_rdev = 18446744073709551615, st_ex_size = 0, st_ex_atime = { tv_sec = 1635769586, tv_nsec = 417923000}, st_ex_mtime = {tv_sec = 1635769586, tv_nsec = 427618000}, st_ex_ctime = {tv_sec = 1635769586, tv_nsec = 427618000}, st_ex_btime = {tv_sec = 1635769586, tv_nsec = 417923000}, st_ex_itime = {tv_sec = 1635769586, tv_nsec = 417923000}, st_ex_blksize = 131072, st_ex_blocks = 1, st_ex_flags = 2048, st_ex_iflags = 6}, twrp = 0, fsp = 0x0, fsp_link = 0x0} Core 2: (gdb) bt #0 0x0000000804cb4c2a in thr_kill () from /lib/libc.so.7 #1 0x0000000804cb3084 in raise () from /lib/libc.so.7 #2 0x0000000804c29279 in abort () from /lib/libc.so.7 #3 0x0000000802c64bde in dump_core () at ../../source3/lib/dumpcore.c:338 #4 0x0000000802c733bd in smb_panic_s3 (why=<optimized out>) at ../../source3/lib/util.c:704 #5 0x0000000801b4f590 in smb_panic (why=why@entry=0x805385408 "assert failed: rec->value_valid") at ../../lib/util/fault.c:197 #6 0x00000008053813b8 in dbwrap_record_get_value (rec=<optimized out>) at ../../lib/dbwrap/dbwrap.c:82 #7 0x00000008017698c8 in smbXsrv_client_global_store (global=0x8100ab360) at ../../source3/smbd/smbXsrv_client.c:367 #8 smb2srv_client_mc_negprot_next (req=req@entry=0x8100ef500) at ../../source3/smbd/smbXsrv_client.c:497 #9 0x000000080176b35d in smb2srv_client_mc_negprot_send (mem_ctx=mem_ctx@entry=0x810112680, ev=0x8100b5060, smb2req=smb2req@entry=0x8100d1250) at ../../source3/smbd/smbXsrv_client.c:451 #10 0x00000008017509fe in smbd_smb2_request_process_negprot (req=req@entry=0x8100d1250) at ../../source3/smbd/smb2_negprot.c:801 #11 0x000000080174d2c0 in smbd_smb2_request_dispatch (req=req@entry=0x8100d1250) at ../../source3/smbd/smb2_server.c:3360 #12 0x000000080174f463 in smbd_smb2_process_negprot (xconn=xconn@entry=0x8100e74e0, expected_seq_low=expected_seq_low@entry=0, inpdu=inpdu@entry=0x8100d10e4 "\376SMB@", size=size@entry=260) at ../../source3/smbd/smb2_server.c:4580 #13 0x000000080173808d in process_smb (xconn=xconn@entry=0x8100e74e0, inbuf=0x8100d10e0 "", nread=264, unread_bytes=0, seqnum=0, encrypted=<optimized out>, deferred_pcd=0x0) at ../../source3/smbd/process.c:1973 #14 0x0000000801739334 in smbd_server_connection_read_handler (xconn=xconn@entry=0x8100e74e0, fd=<optimized out>) at ../../source3/smbd/process.c:2616 #15 0x0000000801739541 in smbd_server_connection_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../../source3/smbd/process.c:2643 #16 0x000000080260c82d in tevent_common_invoke_fd_handler (fde=fde@entry=0x8100b2580, flags=<optimized out>, removed=removed@entry=0x0) at ../../lib/tevent/tevent_fd.c:142 #17 0x000000080260f00f in poll_event_loop_poll (tvalp=0x7fffffffe580, ev=0x8100b5060) at ../../lib/tevent/tevent_poll.c:569 #18 poll_event_loop_once (ev=0x8100b5060, location=<optimized out>) at ../../lib/tevent/tevent_poll.c:626 #19 0x000000080260bde4 in _tevent_loop_once (ev=ev@entry=0x8100b5060, location=location@entry=0x8018854b8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:790 #20 0x000000080260bfd7 in tevent_common_loop_wait (ev=0x8100b5060, location=0x8018854b8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:913 #21 0x000000080260c039 in _tevent_loop_wait (ev=ev@entry=0x8100b5060, location=location@entry=0x8018854b8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:932 #22 0x000000080173b054 in smbd_process (ev_ctx=ev_ctx@entry=0x8100b5060, msg_ctx=msg_ctx@entry=0x8100ae140, dce_ctx=dce_ctx@entry=0x81009eda0, sock_fd=sock_fd@entry=50, interactive=interactive@entry=false) at ../../source3/smbd/process.c:4246 #23 0x000000000102e7c2 in smbd_accept_connection (ev=0x8100b5060, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../../source3/smbd/server.c:1021 #24 0x000000080260c82d in tevent_common_invoke_fd_handler (fde=fde@entry=0x8100b24a0, flags=<optimized out>, removed=removed@entry=0x0) at ../../lib/tevent/tevent_fd.c:142
Oh, I see the problem. dirfsp is being left as NULL and never initialized inside rmdir_internals().
Created attachment 16921 [details] raw patch for 4.15.next This is different to what will go into master, as that needs the patches from: https://bugzilla.samba.org/show_bug.cgi?id=14878 https://bugzilla.samba.org/show_bug.cgi?id=14879 first, but this should get you going.
FYI, this is a fix for core #1. Core #2 is actually a different bug we already know about, being tracked here: https://bugzilla.samba.org/show_bug.cgi?id=14882
Thanks. It doesn't seem to happen too often but it'll be good to have a fix for it until it get's fixed in a released version. Building a patched version now... Now back to the never-ending fight with getting the ACL handling right...
I think the reasons it's intermittent, is it is a race condition between one client setting the "delete on close" bit on a directory, for which it scans the directory looking for files that cannot be deleted, and another client creating a file within the target directory *after* that scan has been done.
MR is here: https://gitlab.com/samba-team/samba/-/merge_requests/2237
This bug was referenced in samba master: bbdcd66c048fee39629aeff450b50d049806e2f7
Created attachment 16941 [details] git-am fix for 4.15.next. Cherry-picked from master.
Comment on attachment 16941 [details] git-am fix for 4.15.next. This patch has a bug. I wrote a test that reveals it. Thank goodness :-). I'll push to CI first
Additional MR needed for this bug (adds regression test, then fix): https://gitlab.com/samba-team/samba/-/merge_requests/2239 From the MR: ------------------------------------------------------------------------------- I realized I could exploit a Samba difference from Windows to test the rmdir_internal() race condition of bug 14892 so I wrote a test just to ensure we don't regress. The test does: 1). Create "DEL_ON_CLOSE_DIR" - returns handle fnum 2). Set DELETE-ON-CLOSE bit on handle fnum 3). Create file "DEL_ON_CLOSE_DIR\testfile" (this would fail on Windows with DELETE_PENDING, but works against stock smbd). 4). Close fnum - we should get NT_STATUS_DIRECTORY_NOT_EMPTY. With bbdcd66c reverted (dirfsp is being used uninitialized inside rmdir_internals()) this crashes as you'd expect (runs into the uninitialized dirfsp). But with bbdcd66c applied the test still failed by returning NT_STATUS_OK in step (4) above. Turns out the STAT/LSTAT calls inside the "scan directory for non-veto files"/"delete veto file loops" are overwriting the 'int ret' variable used to return error in rmdir_internal(). This patch (including the test, naturally) ensures we don't overwrite 'ret' in the directory scanning loops (use 'int retval' for the STAT/LSTAT calls, which is already being used in one existing place but this subtlety was missed when the STAT/LSTAT calls were cleaned up to use 'ret = STAT()/LSTAT()' as a helper variable (older versions of this code just did 'if (LSTAT(..)!=0' so avoided this problem). I'm intending to clean up rmdir_internal() to use NT_STATUS internally instead of relying on ret=-1,errno for errors, but that's a later patch I'll need to look at more carefully. ------------------------------------------------------------------------------- Once this is in master I'll re-submit a new cherry-pick patch for 4.15.next that includes bbdcd66c followed by the test and fix in MR: 2239.
This bug was referenced in samba master: adfad6390962022277cc6aacaa388af86e46b71c 141f3f5f9a5ef556cc7864b2afbf8ad48b7ebe77
Created attachment 16951 [details] git-am fix for 4.15.next Cherry-picked from master. Note to Jule: Please note the patches for these bugs *MUST* be applied in the following order: https://bugzilla.samba.org/show_bug.cgi?id=14878 https://bugzilla.samba.org/show_bug.cgi?id=14879 https://bugzilla.samba.org/show_bug.cgi?id=14892 in order to apply correctly. Thanks ! Jeremy.
Reassigning to Jule for inclusion in 4.15.
Pushed to autobuild-v4-15-test.
This bug was referenced in samba v4-15-test: 89903ed1e3210ebbade1bf325ba16187a3c2e753 728c9b83564cef7e721488855cf352e32a5ca123 ad6af1bb8316783a9b31934184863d78fa3eeb27
Closing out bug report. Thanks!
This bug was referenced in samba v4-15-stable (Release samba-4.15.3): 89903ed1e3210ebbade1bf325ba16187a3c2e753 728c9b83564cef7e721488855cf352e32a5ca123 ad6af1bb8316783a9b31934184863d78fa3eeb27