Created attachment 16687 [details] Related files

We've migrated a simple fileshare (smb.conf => security = user) from Redhat 6.10 (samba 3.x) to CentOS 8 and have had trouble with samba since then. It seems too many parallel reads crash smbd.

There is only one share and one user (nclanonymous). This user connects to the share (read-only) from 500 Windows clients and executes some batch files lying on the share. The main script (28kb, 600 lines) is executed once an hour, and the start time differs between the clients. Since CentOS updated samba to 4.13.x, smbd crashes/cores each hour. I've done some investigation and all versions > 4.12 are affected, starting with the first 4.13rc releases.

The error is simple to reproduce. I wrote a powershell script which "fakes" the read behaviour of a batch file (open file, read 8191 bytes, close file, position to the next line, open file, read 8191 bytes, and so on...).

Setup:

tar -xvzf samba-4.14.6.tar.gz
./configure --prefix=/opt/TESTsamba --disable-python --without-winbind --without-ads --without-ldap --without-quotas --without-acl-support --without-dnsupdate --without-automount --disable-glusterfs --disable-cephfs --without-ad-dc --with-shared-modules='!vfs_snapper'
make
make install

The shared folder:

mkdir -p /storage/sdc1/test/scripts/

and the file to open:

/storage/sdc1/test/scripts/main.cmd

For this test, main.cmd is simply a text copy (DOS format \r\n) of https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html. The original file was a Windows cmd file. For the test I used a main.cmd with ~5800 lines:

wc -l /storage/sdc1/test/scripts/main.cmd
5859 /storage/sdc1/test/scripts/main.cmd

Start test:

Version check:
/opt/TESTsamba/sbin/smbd -V
Version 4.14.6

Systemd start command line:
[Service]
Type=forking
PIDFile=/opt/TESTsamba/var/run/smbd.pid
EnvironmentFile=-/opt/TESTsamba/etc/samba
ExecStart=/opt/TESTsamba/sbin/smbd -D --configfile=/opt/TESTsamba/etc/smb.conf

TCP dump command line:
tcpdump -i eth0 port 445 -w /tmp/trace.cap

On two Windows 10 machines I started my script (Simulate.ps1) at the same time, which simulates reading a batch file (96 parallel jobs on each):

net use \\myTestSambaServer\test$ /user:nclanonymous
.\Simulate.ps1 -file \\myTestSambaServer\test$\scripts\main.cmd -startNumberOfJobs 96
.\Simulate.ps1 -file \\myTestSambaServer\test$\scripts\main.cmd -startNumberOfJobs 96

Attention: this opens 96 powershell windows.
Make sure you have enough RAM available :-) After ~ 1 min this results in Jul 16 13:37:59 myTestSambaServer smbd[1919]: [2021/07/16 13:37:59.169424, 0, pid=1919, effective(0, 0), real(0, 0)] ../../lib/util/become_daemon.c:136(daemon_ready) Jul 16 13:37:59 myTestSambaServer smbd[1919]: daemon_ready: daemon 'smbd' finished starting up and ready to serve connections Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.141247, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1925]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.190340, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.316617, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1925]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.346205, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.442810, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1925]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.471659, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.619849, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:309(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1925]: close_remove_share_mode: Could not get share mode lock for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.690228, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.730566, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1925]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.799636, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.851500, 0, 
pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1925]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.924460, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:26 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.035959, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:27 myTestSambaServer smbd[1925]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051619, 0, pid=1925, effective(11009, 65534), real(11009, 0), class=locking] ../../source3/locking/share_mode_lock.c:961(share_mode_lock_destructor) Jul 16 13:38:27 myTestSambaServer smbd[1925]: share_mode_lock_destructor: share_mode_data_store failed: NT_STATUS_NOT_LOCKED Jul 16 13:38:27 myTestSambaServer audit[1925]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=1925 comm="smbd" exe="/opt/TESTsamba/sbin/smbd" sig=6 res=1 Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051697, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:172(smb_panic_log) Jul 16 13:38:27 myTestSambaServer smbd[1925]: =============================================================== Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051740, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:176(smb_panic_log) Jul 16 13:38:27 myTestSambaServer smbd[1925]: INTERNAL ERROR: Could not store share mode data Jul 16 13:38:27 myTestSambaServer smbd[1925]: in pid 1925 (4.14.6) Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051797, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:181(smb_panic_log) Jul 16 13:38:27 myTestSambaServer smbd[1925]: If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051842, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:182(smb_panic_log) Jul 16 13:38:27 myTestSambaServer smbd[1925]: =============================================================== Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051883, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:184(smb_panic_log) Jul 16 13:38:27 myTestSambaServer smbd[1925]: PANIC (pid 1925): Could not store share mode data Jul 16 13:38:27 myTestSambaServer smbd[1925]: in 4.14.6 Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.054571, 0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:288(log_stack_trace) Jul 16 13:38:27 myTestSambaServer smbd[1925]: BACKTRACE: 39 stack frames: Jul 16 13:38:27 myTestSambaServer smbd[1925]: #0 /opt/TESTsamba/lib/libsamba-util.so.0(log_stack_trace+0x2e) [0x7ff1afa1e861] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #1 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic_log+0x1b1) [0x7ff1afa1e7e6] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #2 
/opt/TESTsamba/lib/libsamba-util.so.0(smb_panic+0x18) [0x7ff1afa1e801] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #3 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x8c0b1) [0x7ff1af4c10b1] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #4 /opt/TESTsamba/lib/private/libtalloc.so.2(+0x397c) [0x7ff1ae60197c] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #5 /opt/TESTsamba/lib/private/libtalloc.so.2(+0x3cf8) [0x7ff1ae601cf8] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #6 /opt/TESTsamba/lib/private/libtalloc.so.2(_talloc_free+0x105) [0x7ff1ae60305e] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #7 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x21c141) [0x7ff1af651141] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #8 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x220c0b) [0x7ff1af655c0b] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #9 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(create_file_default+0x4f1) [0x7ff1af656805] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #10 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x1909a8) [0x7ff1af5c59a8] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #11 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smb_vfs_call_create_file+0x9b) [0x7ff1af661f2a] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #12 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x274b1b) [0x7ff1af6a9b1b] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #13 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_smb2_request_process_create+0x76e) [0x7ff1af6a75ad] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #14 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_smb2_request_dispatch+0x146b) [0x7ff1af699929] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #15 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x26984e) [0x7ff1af69e84e] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #16 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x26997d) [0x7ff1af69e97d] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #17 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_invoke_fd_handler+0x101) [0x7ff1aec1c095] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #18 /opt/TESTsamba/lib/private/libtevent.so.0(+0x11afc) [0x7ff1aec26afc] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #19 /opt/TESTsamba/lib/private/libtevent.so.0(+0x121df) [0x7ff1aec271df] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #20 /opt/TESTsamba/lib/private/libtevent.so.0(+0xe941) [0x7ff1aec23941] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #21 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_once+0x11e) [0x7ff1aec1b2b4] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #22 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7ff1aec1b606] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #23 /opt/TESTsamba/lib/private/libtevent.so.0(+0xe9e3) [0x7ff1aec239e3] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #24 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7ff1aec1b6a9] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #25 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_process+0xcd0) [0x7ff1af6801a6] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #26 /opt/TESTsamba/sbin/smbd(+0xd185) [0x55dadfb38185] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #27 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_invoke_fd_handler+0x101) [0x7ff1aec1c095] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #28 /opt/TESTsamba/lib/private/libtevent.so.0(+0x11afc) [0x7ff1aec26afc] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #29 
/opt/TESTsamba/lib/private/libtevent.so.0(+0x121df) [0x7ff1aec271df] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #30 /opt/TESTsamba/lib/private/libtevent.so.0(+0xe941) [0x7ff1aec23941] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #31 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_once+0x11e) [0x7ff1aec1b2b4] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #32 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7ff1aec1b606] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #33 /opt/TESTsamba/lib/private/libtevent.so.0(+0xe9e3) [0x7ff1aec239e3] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #34 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7ff1aec1b6a9] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #35 /opt/TESTsamba/sbin/smbd(+0xdea7) [0x55dadfb38ea7] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #36 /opt/TESTsamba/sbin/smbd(main+0x1a71) [0x55dadfb3b221] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #37 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7ff1abebabf7] Jul 16 13:38:27 myTestSambaServer smbd[1925]: #38 /opt/TESTsamba/sbin/smbd(_start+0x2a) [0x55dadfb3114a] Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.055490, 0, pid=1925, effective(0, 0), real(0, 0)] ../../source3/lib/dumpcore.c:318(dump_core) Jul 16 13:38:27 myTestSambaServer smbd[1925]: coredump is handled by helper binary specified at /proc/sys/kernel/core_pattern Jul 16 13:38:27 myTestSambaServer smbd[1925]: Jul 16 13:38:28 myTestSambaServer smbd[1924]: [2021/07/16 13:38:28.517959, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:28 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:28 myTestSambaServer smbd[1931]: [2021/07/16 13:38:28.630492, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:309(close_remove_share_mode) Jul 16 13:38:28 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not get share mode lock for file scripts/main.cmd Jul 16 13:38:28 myTestSambaServer smbd[1924]: [2021/07/16 13:38:28.722931, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:28 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:28 myTestSambaServer smbd[1931]: [2021/07/16 13:38:28.787486, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:28 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.029613, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer audit: ANOM_PROMISCUOUS dev=eth0 prom=0 old_prom=256 auid=0 uid=0 gid=0 ses=2 Jul 16 13:38:29 myTestSambaServer kernel: device eth0 left promiscuous mode Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.065437, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 
myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.121192, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:309(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not get share mode lock for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.159015, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.279852, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.346339, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.402292, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.456223, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.535445, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.628028, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.669333, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.721911, 0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.817451, 0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1931]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.847012, 0, pid=1924, effective(11009, 65534), real(11009, 0)] 
../../source3/smbd/close.c:519(close_remove_share_mode) Jul 16 13:38:29 myTestSambaServer smbd[1924]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd Jul 16 13:38:30 myTestSambaServer smbd[1931]: [2021/07/16 13:38:30.014143, 0, pid=1931, effective(11009, 65534), real(11009, 0)]

I attached a 7z with the related files:

samba_var.tar.gz -> archive of /opt/TESTsamba/var
smb.conf -> /opt/TESTsamba/etc/smb.conf
_opt_TESTsamba_sbin_smbd.0.crash -> coredump written by apport.service
trace.cap -> network trace
simulate.ps1 -> Powershell script to simulate reading a batch file
main.cmd -> file read by simulate.ps1

Another hint: on the original machine (CentOS 8), after the first coredumps have occurred, the time to fully read the whole main.cmd file over samba ranges from 30sec up to multiple minutes (for 28kb!!), while I can copy other files, for example a 4GB install.wim, at 200MB/sec from the same share.
Can you add "panic action = /bin/sleep 999999" to the [global] section of your smb.conf and rebuild with symbols? When you get the crash you should be able to attach to the parent process of the sleep with gdb and get a full backtrace with symbols. That would really help track this down, thanks !
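Something along these lines should work (a sketch; the paths assume the /opt/TESTsamba test install, <smbd-pid> is a placeholder, and --enable-debug is just one way to make sure the binaries carry symbols):

# add to the [global] section of smb.conf:
panic action = /bin/sleep 999999

# rebuild with symbols, e.g. ./configure --enable-debug ... && make && make install

# after the panic, the crashed smbd is the parent of the sleep process:
ps -ef | grep 'sleep 999999'      # the PPID column is the panicking smbd
gdb -p <smbd-pid>
(gdb) bt full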
Here is a backtrace: (gdb) backtrace #0 0x00007fed51271492 in __GI___waitpid (pid=26451, stat_loc=stat_loc@entry=0x7ffd840c66b8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30 #1 0x00007fed511dc177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149 #2 0x00007fed536b6d51 in smb_panic_s3 (why=0x7fed504ca115 "lock order violation") at ../../source3/lib/util.c:840 #3 0x00007fed54d1281d in smb_panic (why=0x7fed504ca115 "lock order violation") at ../../lib/util/fault.c:197 #4 0x00007fed504c468a in dbwrap_lock_order_lock (db_name=0x55c293cb07f0 "/opt/TESTsamba/var/lock/smbXsrv_open_global.tdb", lock_order=DBWRAP_LOCK_ORDER_1) at ../../lib/dbwrap/dbwrap.c:166 #5 0x00007fed504c4a2c in dbwrap_check_lock_order (db=0x55c293cb1a70, mem_ctx=0x55c293cd1700) at ../../lib/dbwrap/dbwrap.c:231 #6 0x00007fed504c4a95 in dbwrap_fetch_locked_internal (db=0x55c293cb1a70, mem_ctx=0x55c293cd1700, key=..., db_fn=0x7fed504c8ce2 <db_tdb_fetch_locked>) at ../../lib/dbwrap/dbwrap.c:246 #7 0x00007fed504c4b64 in dbwrap_fetch_locked (db=0x55c293cb1a70, mem_ctx=0x55c293cd1700, key=...) at ../../lib/dbwrap/dbwrap.c:265 #8 0x00007fed549c2b0d in smbXsrv_open_global_fetch_locked (db=0x55c293cb1a70, id=3084737156, mem_ctx=0x55c293cd1700) at ../../source3/smbd/smbXsrv_open.c:167 #9 0x00007fed549c4ed1 in smbXsrv_open_close (op=0x55c293ce40d0, now=0) at ../../source3/smbd/smbXsrv_open.c:1076 #10 0x00007fed549c4233 in smbXsrv_open_destructor (op=0x55c293ce40d0) at ../../source3/smbd/smbXsrv_open.c:803 #11 0x00007fed538f597c in _tc_free_internal (tc=0x55c293ce4070, location=0x7fed54a646e8 "../../source3/smbd/files.c:1076") at ../../lib/talloc/talloc.c:1158 #12 0x00007fed538f5cf8 in _talloc_free_internal (ptr=0x55c293ce40d0, location=0x7fed54a646e8 "../../source3/smbd/files.c:1076") at ../../lib/talloc/talloc.c:1248 #13 0x00007fed538f705e in _talloc_free (ptr=0x55c293ce40d0, location=0x7fed54a646e8 "../../source3/smbd/files.c:1076") at ../../lib/talloc/talloc.c:1792 #14 0x00007fed548c3ab6 in file_free (req=0x55c293ce1860, fsp=0x55c293cdd7d0) at ../../source3/smbd/files.c:1076 #15 0x00007fed5494e6b7 in close_file (req=0x55c293ce1860, fsp=0x55c293cdd7d0, close_type=ERROR_CLOSE) at ../../source3/smbd/close.c:1341 #16 0x00007fed5494a2b9 in create_file_unixpath (conn=0x55c293cbb050, req=0x55c293ce1860, smb_fname=0x55c293cb1390, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x7ffd840c6e20, pinfo=0x7ffd840c6e04) at ../../source3/smbd/open.c:6094 #17 0x00007fed5494a805 in create_file_default (conn=0x55c293cbb050, req=0x55c293ce1860, smb_fname=0x55c293cb1390, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x55c293ce47e8, pinfo=0x55c293ce47fc, in_context_blobs=0x7ffd840c71a8, out_context_blobs=0x55c293cdbee0) at ../../source3/smbd/open.c:6241 #18 0x00007fed548b99a8 in vfswrap_create_file (handle=0x55c293cdb4d0, req=0x55c293ce1860, smb_fname=0x55c293cb1390, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x55c293ce47e8, pinfo=0x55c293ce47fc, in_context_blobs=0x7ffd840c71a8, out_context_blobs=0x55c293cdbee0) at ../../source3/modules/vfs_default.c:759 #19 0x00007fed54955f2a 
in smb_vfs_call_create_file (handle=0x55c293cdb4d0, req=0x55c293ce1860, smb_fname=0x55c293cb1390, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x55c293ce47e8, pinfo=0x55c293ce47fc, in_context_blobs=0x7ffd840c71a8, out_context_blobs=0x55c293cdbee0) at ../../source3/smbd/vfs.c:1858 #20 0x00007fed5499db1b in smbd_smb2_create_send (mem_ctx=0x55c293ce0f70, ev=0x55c293c89100, smb2req=0x55c293ce0f70, in_oplock_level=9 '\t', in_impersonation_level=0, in_desired_access=1179785, in_file_attributes=0, in_share_access=3, in_create_disposition=1, in_create_options=4194368, in_name=0x55c293ce13e0 "scripts\\main.cmd", in_context_blobs=...) at ../../source3/smbd/smb2_create.c:991 #21 0x00007fed5499b5ad in smbd_smb2_request_process_create (smb2req=0x55c293ce0f70) at ../../source3/smbd/smb2_create.c:268 #22 0x00007fed5498d929 in smbd_smb2_request_dispatch (req=0x55c293ce0f70) at ../../source3/smbd/smb2_server.c:3303 #23 0x00007fed5499284e in smbd_smb2_io_handler (xconn=0x55c293c86570, fde_flags=1) at ../../source3/smbd/smb2_server.c:4913 #24 0x00007fed5499297d in smbd_smb2_connection_handler (ev=0x55c293c89100, fde=0x55c293cb72f0, flags=1, private_data=0x55c293c86570) at ../../source3/smbd/smb2_server.c:4951 #25 0x00007fed53f10095 in tevent_common_invoke_fd_handler (fde=0x55c293cb72f0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138 #26 0x00007fed53f1aafc in epoll_event_loop (epoll_ev=0x55c293c9cb50, tvalp=0x7ffd840c7520) at ../../lib/tevent/tevent_epoll.c:736 #27 0x00007fed53f1b1df in epoll_event_loop_once (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_epoll.c:937 #28 0x00007fed53f17941 in std_event_loop_once (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:110 #29 0x00007fed53f0f2b4 in _tevent_loop_once (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:772 #30 0x00007fed53f0f606 in tevent_common_loop_wait (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:895 #31 0x00007fed53f179e3 in std_event_loop_wait (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:141 #32 0x00007fed53f0f6a9 in _tevent_loop_wait (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:914 #33 0x00007fed549741a6 in smbd_process (ev_ctx=0x55c293c89100, msg_ctx=0x55c293c84780, dce_ctx=0x55c293c84650, sock_fd=48, interactive=false) at ../../source3/smbd/process.c:4232 #34 0x000055c293467185 in smbd_accept_connection (ev=0x55c293c89100, fde=0x55c293cc3ab0, flags=1, private_data=0x55c293cbb430) at ../../source3/smbd/server.c:1020 #35 0x00007fed53f10095 in tevent_common_invoke_fd_handler (fde=0x55c293cc3ab0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138 #36 0x00007fed53f1aafc in epoll_event_loop (epoll_ev=0x55c293c9b210, tvalp=0x7ffd840c78d0) at ../../lib/tevent/tevent_epoll.c:736 #37 0x00007fed53f1b1df in epoll_event_loop_once (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_epoll.c:937 #38 0x00007fed53f17941 in std_event_loop_once (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at 
../../lib/tevent/tevent_standard.c:110 #39 0x00007fed53f0f2b4 in _tevent_loop_once (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:772 #40 0x00007fed53f0f606 in tevent_common_loop_wait (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:895 #41 0x00007fed53f179e3 in std_event_loop_wait (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:141 #42 0x00007fed53f0f6a9 in _tevent_loop_wait (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:914 #43 0x000055c293467ea7 in smbd_parent_loop (ev_ctx=0x55c293c89100, parent=0x55c293c9f040) at ../../source3/smbd/server.c:1367 #44 0x000055c29346a221 in main (argc=3, argv=0x7ffd840c7ef8) at ../../source3/smbd/server.c:2220
Created attachment 16688 [details] Simulate.ps1 reupload In the 1st version I removed too much (debug) code.
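For reference, the read pattern the script simulates is roughly the following (a simplified sketch with assumed variable names; the attached Simulate.ps1 is the authoritative version and additionally spawns the parallel jobs):

param([string]$File = '\\myTestSambaServer\test$\scripts\main.cmd')

$pos = 0
while ($true) {
    # open the file, read up to 8191 bytes from the current line offset, close it
    $fs = [System.IO.File]::Open($File, 'Open', 'Read', 'ReadWrite')
    $null = $fs.Seek($pos, 'Begin')
    $buf = New-Object byte[] 8191
    $read = $fs.Read($buf, 0, $buf.Length)
    $fs.Close()
    if ($read -le 0) { break }

    # advance to the start of the next line (\r\n) before the next open/read/close
    $text = [System.Text.Encoding]::ASCII.GetString($buf, 0, $read)
    $nl = $text.IndexOf("`r`n")
    if ($nl -lt 0) { break }
    $pos += $nl + 2
}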
Created attachment 16690 [details] git-am fix for 4.14.next. Can you test this patch ? I think it's in the right area and exposes a place where a destructor can have been set without the correct cleanup path being flagged.
Patch applied, smbd still crashes but with a different error. Here is the backtrace: #0 0x00007f91572b2492 in __GI___waitpid (pid=26850, stat_loc=stat_loc@entry=0x7ffd1500b028, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30 #1 0x00007f915721d177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149 #2 0x00007f91596f7d51 in smb_panic_s3 (why=0x7f915aa43954 "Could not unlock share mode\n") at ../../source3/lib/util.c:840 #3 0x00007f915ad5381d in smb_panic (why=0x7f915aa43954 "Could not unlock share mode\n") at ../../lib/util/fault.c:197 #4 0x00007f915a7f6202 in share_mode_lock_destructor (lck=0x55c490102b30) at ../../source3/locking/share_mode_lock.c:973 #5 0x00007f915993697c in _tc_free_internal (tc=0x55c490102ad0, location=0x7f915aac6c08 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1158 #6 0x00007f9159936cf8 in _talloc_free_internal (ptr=0x55c490102b30, location=0x7f915aac6c08 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1248 #7 0x00007f915993805e in _talloc_free (ptr=0x55c490102b30, location=0x7f915aac6c08 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1792 #8 0x00007f915a98d21d in close_remove_share_mode (fsp=0x55c4900a1060, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:522 #9 0x00007f915a98db7e in close_normal_file (req=0x55c4900fab50, fsp=0x55c4900a1060, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:791 #10 0x00007f915a98f700 in close_file (req=0x55c4900fab50, fsp=0x55c4900a1060, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:1346 #11 0x00007f915a9e2505 in smbd_smb2_close (req=0x55c4900fa3e0, fsp=0x55c4900a1060, in_flags=1, out_flags=0x55c4900fa9a2, out_creation_ts=0x55c4900fa9a8, out_last_access_ts=0x55c4900fa9b8, out_last_write_ts=0x55c4900fa9c8, out_change_ts=0x55c4900fa9d8, out_allocation_size=0x55c4900fa9e8, out_end_of_file=0x55c4900fa9f0, out_file_attributes=0x55c4900fa9f8) at ../../source3/smbd/smb2_close.c:277 #12 0x00007f915a9e28d7 in smbd_smb2_close_send (mem_ctx=0x55c4900fa3e0, ev=0x55c4900a3100, smb2req=0x55c4900fa3e0, in_fsp=0x55c4900a1060, in_flags=1) at ../../source3/smbd/smb2_close.c:393 #13 0x00007f915a9e1abc in smbd_smb2_request_process_close (req=0x55c4900fa3e0) at ../../source3/smbd/smb2_close.c:72 #14 0x00007f915a9ce943 in smbd_smb2_request_dispatch (req=0x55c4900fa3e0) at ../../source3/smbd/smb2_server.c:3309 #15 0x00007f915a9d384e in smbd_smb2_io_handler (xconn=0x55c4900a0570, fde_flags=1) at ../../source3/smbd/smb2_server.c:4913 #16 0x00007f915a9d397d in smbd_smb2_connection_handler (ev=0x55c4900a3100, fde=0x55c4900d12f0, flags=1, private_data=0x55c4900a0570) at ../../source3/smbd/smb2_server.c:4951 #17 0x00007f9159f51095 in tevent_common_invoke_fd_handler (fde=0x55c4900d12f0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138 #18 0x00007f9159f5bafc in epoll_event_loop (epoll_ev=0x55c4900b6b50, tvalp=0x7ffd1500b980) at ../../lib/tevent/tevent_epoll.c:736 #19 0x00007f9159f5c1df in epoll_event_loop_once (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_epoll.c:937 #20 0x00007f9159f58941 in std_event_loop_once (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:110 #21 0x00007f9159f502b4 in _tevent_loop_once (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:772 #22 0x00007f9159f50606 in tevent_common_loop_wait (ev=0x55c4900a3100, location=0x7f915aad3aa8 
"../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:895 #23 0x00007f9159f589e3 in std_event_loop_wait (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:141 #24 0x00007f9159f506a9 in _tevent_loop_wait (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:914 #25 0x00007f915a9b51a6 in smbd_process (ev_ctx=0x55c4900a3100, msg_ctx=0x55c49009e780, dce_ctx=0x55c49009e650, sock_fd=48, interactive=false) at ../../source3/smbd/process.c:4232 #26 0x000055c48f7b8185 in smbd_accept_connection (ev=0x55c4900a3100, fde=0x55c4900ddab0, flags=1, private_data=0x55c4900d5430) at ../../source3/smbd/server.c:1020 #27 0x00007f9159f51095 in tevent_common_invoke_fd_handler (fde=0x55c4900ddab0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138 #28 0x00007f9159f5bafc in epoll_event_loop (epoll_ev=0x55c4900b5210, tvalp=0x7ffd1500bd30) at ../../lib/tevent/tevent_epoll.c:736 #29 0x00007f9159f5c1df in epoll_event_loop_once (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_epoll.c:937 #30 0x00007f9159f58941 in std_event_loop_once (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:110 #31 0x00007f9159f502b4 in _tevent_loop_once (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:772 #32 0x00007f9159f50606 in tevent_common_loop_wait (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:895 #33 0x00007f9159f589e3 in std_event_loop_wait (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:141 #34 0x00007f9159f506a9 in _tevent_loop_wait (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:914 #35 0x000055c48f7b8ea7 in smbd_parent_loop (ev_ctx=0x55c4900a3100, parent=0x55c4900b9040) at ../../source3/smbd/server.c:1367 #36 0x000055c48f7bb221 in main (argc=3, argv=0x7ffd1500c358) at ../../source3/smbd/server.c:2220
syslog says smbd[26843]: [2021/07/20 13:41:05.568285, 0] ../../source3/smbd/close.c:519(close_remove_share_mode) smbd[26843]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd smbd[26845]: [2021/07/20 13:41:05.581593, 0] ../../source3/smbd/close.c:519(close_remove_share_mode) smbd[26845]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd smbd[26843]: [2021/07/20 13:41:05.677774, 0] ../../source3/smbd/close.c:519(close_remove_share_mode) smbd[26843]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd smbd[26845]: [2021/07/20 13:41:05.689613, 0] ../../source3/smbd/close.c:519(close_remove_share_mode) smbd[26845]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd smbd[26843]: [2021/07/20 13:41:05.786966, 0] ../../source3/smbd/close.c:519(close_remove_share_mode) smbd[26843]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
OK, well this is progress. The first patch certainly fixes a bug in the "failure to open" case (calling close_file(...,ERROR_CLOSE)). This problem is different. It's an issue on calling close_file(...,NORMAL_CLOSE). I'll look closer.
Is there a possibility you can give me a new wireshark trace with my patch in: https://attachments.samba.org/attachment.cgi?id=16690 applied ? It's deliberately calling smb_panic on an assertion fail, because it can't find a share mode entry for what should be an existing open file. I'd love to see the SMB2create (open) call that corresponds to this close to allow me to visually walk through the open code path to see if there's a case where we successfully return an open handle but haven't set a share mode entry (this should never happen of course, but then again we shouldn't crash either :-).
Oh, can you also turn off encryption on the share before getting the wireshark trace please ? The last trace wasn't useful to examine as all I get is: "Encrypted SMB3", which doesn't help if I want to look inside the packets :-).
Set

server smb encrypt = off
smb encrypt = off

Trace2.cap attached. First occurrence when smbd cores is:

Jul 20 21:58:48 \myTestSambaServer smbd[27185]: [2021/07/20 21:58:48.236866, 0] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 20 21:58:48 \myTestSambaServer smbd[27185]: close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Created attachment 16693 [details] unencrypted trace
I don't see a TCP RST packet in that trace. Normally when smbd cores the server will send a TCP RST packet to the client to tell it the connection has gone away.
I also noticed that in your smb.conf you have:

oplocks = No
level2 oplocks = No
smb2 leases = No

This means that the client must do an open/read/close every time it's opening the same file (scripts\main.cmd). Is there any reason you need this ? It's going to make Samba really slow. Have you tried removing these lines and letting the client cache the file via a lease ?
(In reply to Jeremy Allison from comment #12) I think this happens because I still have the panic action = /bin/sleep 999999 option active and the smbd process has started the sleep process as a child and does not exit.
(In reply to Jeremy Allison from comment #13) I assumed that this error is lock related, therefore I played around with those options. I removed

oplocks = No
level2 oplocks = No
smb2 leases = No

but the error is the same.
(In reply to Michael Albert from comment #14) Yes, you are correct - I forgot that. Can you allow it to crash so I can see the wireshark trace of the exact close request that causes the crash ?
(In reply to Michael Albert from comment #14) Hmmm. Thinking some more, I'm not sure this is the case. Your wireshark trace ends on an SMB2Read request, but the crash occurs on SMB2close. I don't see an SMB2close ending the trace. There's something not matching here. If this is easily reproducible for you, can you get a debug level 10 log (it might be large) ?
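One way to capture that (a sketch, assuming the test install from comment #1; adjust paths as needed):

# in the [global] section of /opt/TESTsamba/etc/smb.conf:
log level = 10
max log size = 0

# or raise the level on the already-running daemons without a restart:
/opt/TESTsamba/bin/smbcontrol all debug 10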
Created attachment 16695 [details] Extra debug for 4.14.x

Can you try reproducing with this patch and send me the wireshark trace + debug log? It adds lots more debugging around the share modes (it makes the normal debugging log at level zero, actually). There have been changes in this area, so I'd be interested if you can reproduce this crash with the recently released 4.15.rc1. Thanks !
Created attachment 16696 [details] Trace and log for Debug patch

[PATCH 1/2] s3: smbd: Ensure we set is_fsa inside fsp_bind_smb() as soon as we have called smbXsrv_open_create().
[PATCH 2/2] JRATEST: Add debug to try and track down

Added trace3.cap and log.smbd.
Added trace3.cap and log.smbd with the applied debug patch. The panic occurred at 10:13:34:

Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.688967, 0] ../../lib/util/fault.c:172(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.689022, 0] ../../lib/util/fault.c:176(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.689116, 0] ../../lib/util/fault.c:181(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.689245, 0] ../../lib/util/fault.c:182(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.689266, 0] ../../lib/util/fault.c:184(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: #1 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic_log+0x1b1) [0x7f7e321197e6]
Jul 22 10:13:34 myTestSambaServer smbd[14494]: #2 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic+0x18) [0x7f7e32119801]
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.694433, 0] ../../source3/lib/util.c:839(smb_panic_s3)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: smb_panic(): calling panic action [/bin/sleep 999999]
~# /opt/TESTsamba/sbin/smbd -V Version 4.15.0rc1 has the same problem. syslog: Jul 22 10:44:44 ad613419pc smbd[31447]: [2021/07/22 10:44:44.790120, 0] ../../lib/util/fault.c:288(log_stack_trace) Jul 22 10:44:44 ad613419pc smbd[31447]: BACKTRACE: 39 stack frames: Jul 22 10:44:44 ad613419pc smbd[31447]: #0 /opt/TESTsamba/lib/libsamba-util.so.0(log_stack_trace+0x2e) [0x7fd1c3effa71] Jul 22 10:44:44 ad613419pc smbd[31447]: #1 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic_log+0x1b1) [0x7fd1c3eff9f6] Jul 22 10:44:44 ad613419pc smbd[31447]: #2 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic+0x18) [0x7fd1c3effa11] Jul 22 10:44:44 ad613419pc smbd[31447]: #3 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x8c899) [0x7fd1c41fe899] Jul 22 10:44:44 ad613419pc smbd[31447]: #4 /opt/TESTsamba/lib/private/libtalloc.so.2(+0x39dc) [0x7fd1c3ad19dc] Jul 22 10:44:44 ad613419pc smbd[31447]: #5 /opt/TESTsamba/lib/private/libtalloc.so.2(+0x3d58) [0x7fd1c3ad1d58] Jul 22 10:44:44 ad613419pc smbd[31447]: #6 /opt/TESTsamba/lib/private/libtalloc.so.2(_talloc_free+0x105) [0x7fd1c3ad30be] Jul 22 10:44:44 ad613419pc smbd[31447]: #7 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x21d1c8) [0x7fd1c438f1c8] Jul 22 10:44:44 ad613419pc smbd[31447]: #8 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x221be8) [0x7fd1c4393be8] Jul 22 10:44:44 ad613419pc smbd[31447]: #9 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(create_file_default+0x4f1) [0x7fd1c43948aa] Jul 22 10:44:44 ad613419pc smbd[31447]: #10 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x190446) [0x7fd1c4302446] Jul 22 10:44:44 ad613419pc smbd[31447]: #11 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smb_vfs_call_create_file+0x9b) [0x7fd1c43a0aae] Jul 22 10:44:44 ad613419pc smbd[31447]: #12 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x276464) [0x7fd1c43e8464] Jul 22 10:44:44 ad613419pc smbd[31447]: #13 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_smb2_request_process_create+0x76e) [0x7fd1c43e5e10] Jul 22 10:44:44 ad613419pc smbd[31447]: #14 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_smb2_request_dispatch+0x15a2) [0x7fd1c43d822e] Jul 22 10:44:44 ad613419pc smbd[31447]: #15 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x26b180) [0x7fd1c43dd180] Jul 22 10:44:44 ad613419pc smbd[31447]: #16 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x26b2af) [0x7fd1c43dd2af] Jul 22 10:44:44 ad613419pc smbd[31447]: #17 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_invoke_fd_handler+0x11d) [0x7fd1c347fb1c] Jul 22 10:44:44 ad613419pc smbd[31447]: #18 /opt/TESTsamba/lib/private/libtevent.so.0(+0x1281d) [0x7fd1c348a81d] Jul 22 10:44:44 ad613419pc smbd[31447]: #19 /opt/TESTsamba/lib/private/libtevent.so.0(+0x12f00) [0x7fd1c348af00] Jul 22 10:44:44 ad613419pc smbd[31447]: #20 /opt/TESTsamba/lib/private/libtevent.so.0(+0xf690) [0x7fd1c3487690] Jul 22 10:44:44 ad613419pc smbd[31447]: #21 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_once+0x11e) [0x7fd1c347e947] Jul 22 10:44:44 ad613419pc smbd[31447]: #22 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7fd1c347ec99] Jul 22 10:44:44 ad613419pc smbd[31447]: #23 /opt/TESTsamba/lib/private/libtevent.so.0(+0xf732) [0x7fd1c3487732] Jul 22 10:44:44 ad613419pc smbd[31447]: #24 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7fd1c347ed3c] Jul 22 10:44:44 ad613419pc smbd[31447]: #25 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_process+0xcd0) [0x7fd1c43be46a] Jul 22 10:44:44 ad613419pc smbd[31447]: #26 
/opt/TESTsamba/sbin/smbd(+0xd90d) [0x559646eeb90d] Jul 22 10:44:44 ad613419pc smbd[31447]: #27 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_invoke_fd_handler+0x11d) [0x7fd1c347fb1c] Jul 22 10:44:44 ad613419pc smbd[31447]: #28 /opt/TESTsamba/lib/private/libtevent.so.0(+0x1281d) [0x7fd1c348a81d] Jul 22 10:44:44 ad613419pc smbd[31447]: #29 /opt/TESTsamba/lib/private/libtevent.so.0(+0x12f00) [0x7fd1c348af00] Jul 22 10:44:44 ad613419pc smbd[31447]: #30 /opt/TESTsamba/lib/private/libtevent.so.0(+0xf690) [0x7fd1c3487690] Jul 22 10:44:44 ad613419pc smbd[31447]: #31 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_once+0x11e) [0x7fd1c347e947] Jul 22 10:44:44 ad613419pc smbd[31447]: #32 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7fd1c347ec99] Jul 22 10:44:44 ad613419pc smbd[31447]: #33 /opt/TESTsamba/lib/private/libtevent.so.0(+0xf732) [0x7fd1c3487732] Jul 22 10:44:44 ad613419pc smbd[31447]: #34 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7fd1c347ed3c] Jul 22 10:44:44 ad613419pc smbd[31447]: #35 /opt/TESTsamba/sbin/smbd(+0xe634) [0x559646eec634] Jul 22 10:44:44 ad613419pc smbd[31447]: #36 /opt/TESTsamba/sbin/smbd(main+0x1c3f) [0x559646eeeb69] Jul 22 10:44:44 ad613419pc smbd[31447]: #37 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fd1c0761bf7] Jul 22 10:44:44 ad613419pc smbd[31447]: #38 /opt/TESTsamba/sbin/smbd(_start+0x2a) [0x559646ee46da] Jul 22 10:44:44 ad613419pc smbd[31447]: [2021/07/22 10:44:44.791649, 0] ../../source3/lib/util.c:693(smb_panic_s3) Jul 22 10:44:44 ad613419pc smbd[31447]: smb_panic(): calling panic action [/bin/sleep 999999] (gdb) bt #0 0x00007fd1c0824492 in __GI___waitpid (pid=31451, stat_loc=stat_loc@entry=0x7ffed1e1d968, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30 #1 0x00007fd1c078f177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149 #2 0x00007fd1c2e370f5 in smb_panic_s3 (why=0x7fd1c444f948 "Could not store share mode data\n") at ../../source3/lib/util.c:694 #3 0x00007fd1c3effa2d in smb_panic (why=0x7fd1c444f948 "Could not store share mode data\n") at ../../lib/util/fault.c:197 #4 0x00007fd1c41fe899 in share_mode_lock_destructor (lck=0x559648fb5750) at ../../source3/locking/share_mode_lock.c:979 #5 0x00007fd1c3ad19dc in _tc_free_internal (tc=0x559648fb56f0, location=0x7fd1c44d0548 "../../source3/smbd/open.c:4010") at ../../lib/talloc/talloc.c:1158 #6 0x00007fd1c3ad1d58 in _talloc_free_internal (ptr=0x559648fb5750, location=0x7fd1c44d0548 "../../source3/smbd/open.c:4010") at ../../lib/talloc/talloc.c:1248 #7 0x00007fd1c3ad30be in _talloc_free (ptr=0x559648fb5750, location=0x7fd1c44d0548 "../../source3/smbd/open.c:4010") at ../../lib/talloc/talloc.c:1792 #8 0x00007fd1c438f1c8 in open_file_ntcreate (conn=0x559648fb3320, req=0x559648fb4c50, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, new_dos_attributes=0, oplock_request=0, lease=0x559648fca1c0, private_flags=0, parent_dir_fname=0x559648fc92c0, smb_fname_atname=0x559648fc9550, pinfo=0x7ffed1e1deb4, fsp=0x559648fa16c0) at ../../source3/smbd/open.c:4010 #9 0x00007fd1c4393be8 in create_file_unixpath (conn=0x559648fb3320, req=0x559648fb4c50, smb_fname=0x559648f8f470, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=256, lease=0x559648fca1c0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x7ffed1e1e040, pinfo=0x7ffed1e1e024) at ../../source3/smbd/open.c:5898 #10 0x00007fd1c43948aa in 
create_file_default (conn=0x559648fb3320, req=0x559648fb4c50, smb_fname=0x559648f8f470, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=256, lease=0x559648fca1c0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x559648fca138, pinfo=0x559648fca14c, in_context_blobs=0x7ffed1e1e3c8, out_context_blobs=0x559648fc7060) at ../../source3/smbd/open.c:6214 #11 0x00007fd1c4302446 in vfswrap_create_file (handle=0x559648fcc670, req=0x559648fb4c50, smb_fname=0x559648f8f470, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=256, lease=0x559648fca1c0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x559648fca138, pinfo=0x559648fca14c, in_context_blobs=0x7ffed1e1e3c8, out_context_blobs=0x559648fc7060) at ../../source3/modules/vfs_default.c:755 #12 0x00007fd1c43a0aae in smb_vfs_call_create_file (handle=0x559648fcc670, req=0x559648fb4c50, smb_fname=0x559648f8f470, access_mask=1179785, share_access=3, create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=256, lease=0x559648fca1c0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x559648fca138, pinfo=0x559648fca14c, in_context_blobs=0x7ffed1e1e3c8, out_context_blobs=0x559648fc7060) at ../../source3/smbd/vfs.c:1873 #13 0x00007fd1c43e8464 in smbd_smb2_create_send (mem_ctx=0x559648fd0360, ev=0x559648f793f0, smb2req=0x559648fd0360, in_oplock_level=255 '\377', in_impersonation_level=0, in_desired_access=1179785, in_file_attributes=0, in_share_access=3, in_create_disposition=1, in_create_options=4194368, in_name=0x559648fd07f0 "scripts\\main.cmd", in_context_blobs=...) at ../../source3/smbd/smb2_create.c:1013 #14 0x00007fd1c43e5e10 in smbd_smb2_request_process_create (smb2req=0x559648fd0360) at ../../source3/smbd/smb2_create.c:268 #15 0x00007fd1c43d822e in smbd_smb2_request_dispatch (req=0x559648fd0360) at ../../source3/smbd/smb2_server.c:3394 #16 0x00007fd1c43dd180 in smbd_smb2_io_handler (xconn=0x559648fb1750, fde_flags=1) at ../../source3/smbd/smb2_server.c:5003 #17 0x00007fd1c43dd2af in smbd_smb2_connection_handler (ev=0x559648f793f0, fde=0x559648fb1070, flags=1, private_data=0x559648fb1750) at ../../source3/smbd/smb2_server.c:5041 #18 0x00007fd1c347fb1c in tevent_common_invoke_fd_handler (fde=0x559648fb1070, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:142 #19 0x00007fd1c348a81d in epoll_event_loop (epoll_ev=0x559648f8c060, tvalp=0x7ffed1e1e750) at ../../lib/tevent/tevent_epoll.c:736 #20 0x00007fd1c348af00 in epoll_event_loop_once (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent_epoll.c:937 #21 0x00007fd1c3487690 in std_event_loop_once (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent_standard.c:110 #22 0x00007fd1c347e947 in _tevent_loop_once (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:790 #23 0x00007fd1c347ec99 in tevent_common_loop_wait (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:913 #24 0x00007fd1c3487732 in std_event_loop_wait (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent_standard.c:141 #25 0x00007fd1c347ed3c in _tevent_loop_wait (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246") at 
../../lib/tevent/tevent.c:932 #26 0x00007fd1c43be46a in smbd_process (ev_ctx=0x559648f793f0, msg_ctx=0x559648f72940, dce_ctx=0x559648f72f30, sock_fd=48, interactive=false) at ../../source3/smbd/process.c:4246 #27 0x0000559646eeb90d in smbd_accept_connection (ev=0x559648f793f0, fde=0x559648fb7580, flags=1, private_data=0x559648facc40) at ../../source3/smbd/server.c:1021 #28 0x00007fd1c347fb1c in tevent_common_invoke_fd_handler (fde=0x559648fb7580, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:142 #29 0x00007fd1c348a81d in epoll_event_loop (epoll_ev=0x559648f8a000, tvalp=0x7ffed1e1eb00) at ../../lib/tevent/tevent_epoll.c:736 #30 0x00007fd1c348af00 in epoll_event_loop_once (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent_epoll.c:937 #31 0x00007fd1c3487690 in std_event_loop_once (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent_standard.c:110 #32 0x00007fd1c347e947 in _tevent_loop_once (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent.c:790 #33 0x00007fd1c347ec99 in tevent_common_loop_wait (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent.c:913 #34 0x00007fd1c3487732 in std_event_loop_wait (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent_standard.c:141 #35 0x00007fd1c347ed3c in _tevent_loop_wait (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent.c:932 #36 0x0000559646eec634 in smbd_parent_loop (ev_ctx=0x559648f793f0, parent=0x559648f8e4e0) at ../../source3/smbd/server.c:1365 #37 0x0000559646eeeb69 in main (argc=3, argv=0x7ffed1e1f128) at ../../source3/smbd/server.c:2234
FYI, I'll try and take a look at this today, but I'm out on vacation from tomorrow (July 23rd) - August 2nd. I'm adding Ralph and Metze to this bug in case they have time to follow-up whilst I'm out. Thanks for your patience in working on this with us.
Oh that log.smbd is very interesting. Specifically: 1195724 set_file_oplock: granted oplock on file scripts/main.cmd, 2049:1179655:0/7963, tv_sec = 60f928ae, tv_usec = 9b0cd 1195725 [2021/07/22 10:13:34.688219, 5] ../../lib/dbwrap/dbwrap.c:148(dbwrap_lock_order_lock) 1195726 dbwrap_lock_order_lock: check lock order 1 for /opt/TESTsamba/var/lock/locking.tdb 1195727 [2021/07/22 10:13:34.688216, 1] ../../source3/lib/g_lock.c:1000(g_lock_writev_data) 1195728 g_lock_writev_data: g_lock_writev_data_fn failed: NT_STATUS_NOT_LOCKED 1195729 [2021/07/22 10:13:34.688545, 0] ../../source3/locking/share_mode_lock.c:2166(share_mode_entry_do) 1195730 [2021/07/22 10:13:34.688554, 0] ../../source3/locking/share_mode_lock.c:2166(share_mode_entry_do) 1195731 share_mode_entry_do: JRATEST: num_share_modes=0 1195732 [2021/07/22 10:13:34.688607, 0] ../../source3/locking/share_mode_lock.c:2179(share_mode_entry_do) 1195733 share_mode_entry_do: JRATEST: num_share_modes=0 1195734 share_mode_entry_do: JRATEST: Did not find share mode entry for 7963 1195735 [2021/07/22 10:13:34.688632, 0] ../../source3/smbd/close.c:519(close_remove_share_mode) 1195736 [2021/07/22 10:13:34.688633, 0] ../../source3/locking/share_mode_lock.c:2179(share_mode_entry_do) 1195737 close_remove_share_mode: Could not delete share entry for file scripts/main.cmd 1195738 share_mode_entry_do: JRATEST: Did not find share mode entry for 306 1195739 [2021/07/22 10:13:34.688771, 2] ../../source3/smbd/close.c:824(close_normal_file) 1195740 [2021/07/22 10:13:34.688774, 0] ../../source3/smbd/close.c:519(close_remove_share_mode) 1195741 nclanonymous closed file scripts/main.cmd (numopen=3) NT_STATUS_OK 1195742 close_remove_share_mode: Could not delete share entry for file scripts/main.cmd 1195743 [2021/07/22 10:13:34.688883, 5] ../../lib/dbwrap/dbwrap.c:148(dbwrap_lock_order_lock) 1195744 dbwrap_lock_order_lock: check lock order 1 for /opt/TESTsamba/var/lock/smbXsrv_open_global.tdb 1195745 [2021/07/22 10:13:34.688913, 0] ../../lib/dbwrap/dbwrap.c:164(dbwrap_lock_order_lock) 1195746 dbwrap_lock_order_lock: Lock order violation: Trying /opt/TESTsamba/var/lock/smbXsrv_open_global.tdb at 1 while /opt/TESTsamba/var/lock/lock 1195747 [2021/07/22 10:13:34.688940, 0] ../../lib/dbwrap/dbwrap.c:131(debug_lock_order) 1195748 [2021/07/22 10:13:34.688942, 5] ../../lib/dbwrap/dbwrap.c:180(dbwrap_lock_order_unlock) 1195749 lock order: 1:/opt/TESTsamba/var/lock/locking.tdb 2:<none> 3:<none> 4:<none> 1195750 dbwrap_lock_order_unlock: release lock order 1 for /opt/TESTsamba/var/lock/locking.tdb 1195751 [2021/07/22 10:13:34.688967, 0] ../../lib/util/fault.c:172(smb_panic_log) 1195752 =============================================================== 1195753 [2021/07/22 10:13:34.689022, 0] ../../lib/util/fault.c:176(smb_panic_log) 1195754 [2021/07/22 10:13:34.689015, 2] ../../source3/smbd/close.c:824(close_normal_file) 1195755 nclanonymous closed file scripts/main.cmd (numopen=0) NT_STATUS_OK 1195756 INTERNAL ERROR: lock order violation in pid 14494 (4.14.6)
Ok, enjoy your holidays :-)
OK, I've gotten so desperate that I've tried to reproduce this using your Simulate.ps1 script on a Windows 10 laptop. I don't have much RAM, so I'm using -startNumberOfJobs 20. I've left this running for 20+ minutes or so against current master (with my patch from https://bugzilla.samba.org/attachment.cgi?id=16690 applied, as I think that's needed) and I haven't had any crashes at all. Can you check if it reproduces for you with -startNumberOfJobs 20?
Created attachment 16697 [details] git-am fix for 4.14.next. Also applies to master. Prevents a DBG_ERR message from being logged when we're closing a file out early in the error path without having created a share mode (file not found etc.).
What Linux kernel are you running on your box? I'm starting to suspect an error in your kernel in the pthread_mutex_lock code we now use by default inside lib/tdb/mutex.c. My kernel here is Ubuntu 20.04 LTS, Linux 5.8.0-59-generic. If you can build yourself, can you try building with --disable-tdb-mutex-locking? That will fall back to the old-style fcntl tdb locking instead.
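For reference, the rebuild is just the usual build with one extra configure flag - a sketch, keep whatever configure options you already use:

  ./configure <your existing configure options> --disable-tdb-mutex-locking
  make
  make install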
(In reply to Jeremy Allison from comment #27) 'dbwrap_tdb_mutexes:*=no' in the global section should also do the trick without recompiling, at least for everything going via dbwrap_open()
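For clarity, in smb.conf that would look something like this (a sketch, untested on this particular setup):

  [global]
      dbwrap_tdb_mutexes:* = no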
(In reply to Stefan Metzmacher from comment #28) Thanks Metze, I thought there must be a user-config way to do this but didn't have time to find it, as I'm trying to wind down my TO-DOs before vacation. Michael, Metze's config file change is obviously quicker to use than rebuilding everything to check if it's a tdb mutex issue :-). Thanks once again for your help, Metze!
Created attachment 16698 [details] git-am fix for master, 4.14.next and 4.15.rc2. Prevents a DBG_ERR message from being logged when we're closing a file or directory out early in the error path without having created a share mode (file/directory not found etc.).
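For anyone following along, a git-am formatted patch like this one also applies to a plain release tarball with something like the following (the patch filename is just an example):

  cd samba-4.14.6
  patch -p1 < /path/to/attachment-16698.patch

Inside a git checkout, 'git am /path/to/attachment-16698.patch' works as well.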
(In reply to Jeremy Allison from comment #25) You have to start this script on at least 2 Windows boxes. With only one, I cannot reproduce the issue either.
(In reply to Stefan Metzmacher from comment #28)
I'm on Ubuntu 16.04 LTS, kernel 4.15.0-150-generic #155-Ubuntu SMP Sat Jul 3 13:37:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux.

I tried both: 'dbwrap_tdb_mutexes:*=no' and recompiling with the --disable-tdb-mutex-locking option. smbd crashes again. The behaviour is different this time: the panic isn't logged to syslog. The panic happens at 09:00:53:

:~# ps -eo pid,lstart,cmd -q 21623
PID STARTED CMD
21623 Fri Jul 23 09:00:53 2021 sh -c /bin/sleep 999999

I attached the log.smbd.

(gdb) bt
#0 0x00007f4da8f61492 in __GI___waitpid (pid=21623, stat_loc=stat_loc@entry=0x7fff5fcfae48, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1 0x00007f4da8ecc177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2 0x00007f4dab3a6d4e in smb_panic_s3 (why=0x7f4dac6f2914 "Could not unlock share mode\n") at ../../source3/lib/util.c:840
#3 0x00007f4daca0281d in smb_panic (why=0x7f4dac6f2914 "Could not unlock share mode\n") at ../../lib/util/fault.c:197
#4 0x00007f4dac4a5202 in share_mode_lock_destructor (lck=0x55ec5a6ed0f0) at ../../source3/locking/share_mode_lock.c:973
#5 0x00007f4dab5e597c in _tc_free_internal (tc=0x55ec5a6ed090, location=0x7f4dac775ce8 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1158
#6 0x00007f4dab5e5cf8 in _talloc_free_internal (ptr=0x55ec5a6ed0f0, location=0x7f4dac775ce8 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1248
#7 0x00007f4dab5e705e in _talloc_free (ptr=0x55ec5a6ed0f0, location=0x7f4dac775ce8 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1792
#8 0x00007f4dac63c1f0 in close_remove_share_mode (fsp=0x55ec5a697cb0, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:522
#9 0x00007f4dac63cb51 in close_normal_file (req=0x55ec5a6ea550, fsp=0x55ec5a697cb0, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:791
#10 0x00007f4dac63e6d3 in close_file (req=0x55ec5a6ea550, fsp=0x55ec5a697cb0, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:1346
#11 0x00007f4dac6914d8 in smbd_smb2_close (req=0x55ec5a6e9e10, fsp=0x55ec5a697cb0, in_flags=1, out_flags=0x55ec5a6ea3a2, out_creation_ts=0x55ec5a6ea3a8, out_last_access_ts=0x55ec5a6ea3b8, out_last_write_ts=0x55ec5a6ea3c8, out_change_ts=0x55ec5a6ea3d8, out_allocation_size=0x55ec5a6ea3e8, out_end_of_file=0x55ec5a6ea3f0, out_file_attributes=0x55ec5a6ea3f8) at ../../source3/smbd/smb2_close.c:277
#12 0x00007f4dac6918aa in smbd_smb2_close_send (mem_ctx=0x55ec5a6e9e10, ev=0x55ec5a69b100, smb2req=0x55ec5a6e9e10, in_fsp=0x55ec5a697cb0, in_flags=1) at ../../source3/smbd/smb2_close.c:393
#13 0x00007f4dac690a8f in smbd_smb2_request_process_close (req=0x55ec5a6e9e10) at ../../source3/smbd/smb2_close.c:72
#14 0x00007f4dac67d916 in smbd_smb2_request_dispatch (req=0x55ec5a6e9e10) at ../../source3/smbd/smb2_server.c:3309
#15 0x00007f4dac682821 in smbd_smb2_io_handler (xconn=0x55ec5a698570, fde_flags=1) at ../../source3/smbd/smb2_server.c:4913
#16 0x00007f4dac682950 in smbd_smb2_connection_handler (ev=0x55ec5a69b100, fde=0x55ec5a6aef40, flags=1, private_data=0x55ec5a698570) at ../../source3/smbd/smb2_server.c:4951
#17 0x00007f4dabc00095 in tevent_common_invoke_fd_handler (fde=0x55ec5a6aef40, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#18 0x00007f4dabc0aafc in epoll_event_loop (epoll_ev=0x55ec5a6aedd0, tvalp=0x7fff5fcfb7a0) at ../../lib/tevent/tevent_epoll.c:736
#19 0x00007f4dabc0b1df in epoll_event_loop_once (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_epoll.c:937
#20 0x00007f4dabc07941 in std_event_loop_once (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:110
#21 0x00007f4dabbff2b4 in _tevent_loop_once (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:772
#22 0x00007f4dabbff606 in tevent_common_loop_wait (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:895
#23 0x00007f4dabc079e3 in std_event_loop_wait (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:141
#24 0x00007f4dabbff6a9 in _tevent_loop_wait (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:914
#25 0x00007f4dac664179 in smbd_process (ev_ctx=0x55ec5a69b100, msg_ctx=0x55ec5a696780, dce_ctx=0x55ec5a696650, sock_fd=47, interactive=false) at ../../source3/smbd/process.c:4232
#26 0x000055ec59837185 in smbd_accept_connection (ev=0x55ec5a69b100, fde=0x55ec5a6d8200, flags=1, private_data=0x55ec5a6d2340) at ../../source3/smbd/server.c:1020
#27 0x00007f4dabc00095 in tevent_common_invoke_fd_handler (fde=0x55ec5a6d8200, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#28 0x00007f4dabc0aafc in epoll_event_loop (epoll_ev=0x55ec5a6ad210, tvalp=0x7fff5fcfbb50) at ../../lib/tevent/tevent_epoll.c:736
#29 0x00007f4dabc0b1df in epoll_event_loop_once (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_epoll.c:937
#30 0x00007f4dabc07941 in std_event_loop_once (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:110
#31 0x00007f4dabbff2b4 in _tevent_loop_once (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:772
#32 0x00007f4dabbff606 in tevent_common_loop_wait (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:895
#33 0x00007f4dabc079e3 in std_event_loop_wait (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:141
#34 0x00007f4dabbff6a9 in _tevent_loop_wait (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:914
#35 0x000055ec59837ea7 in smbd_parent_loop (ev_ctx=0x55ec5a69b100, parent=0x55ec5a6b12c0) at ../../source3/smbd/server.c:1367
#36 0x000055ec5983a221 in main (argc=3, argv=0x7fff5fcfc178) at ../../source3/smbd/server.c:2220

I will also try it on Ubuntu 20.04 LTS (hopefully today).
Created attachment 16700 [details] log for dbwrap_tdb_mutexes:*=no and recompiling with --disable-tdb-mutex-locking
I tested 4.14.6 with the "Extra debug for 4.14.x" patch (5.65 KB) applied, on Ubuntu:

# uname -a
Linux mySambaTestServer 5.8.0-63-generic #71~20.04.1-Ubuntu SMP Thu Jul 15 17:46:08 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I couldn't crash smbd on this system.
OK, just got back from vacation and trying to catch up :-). Can I confirm that with this patch: https://bugzilla.samba.org/attachment.cgi?id=16698 applied to 4.14.6, running on Ubuntu 20.04 with this kernel:

Linux mySambaTestServer 5.8.0-63-generic #71~20.04.1-Ubuntu SMP Thu Jul 15 17:46:08 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

and with multiple Windows clients, you do *NOT* see any crashes? Is this build using tdb mutexes or not? I'm trying to determine if there is a kernel on which this bug does not *ever* manifest. If so, then it isn't a Samba bug, but a glibc/kernel one. FYI, I don't have multiple Windows clients, so it's hard for me to reproduce this locally.
(In reply to Jeremy Allison from comment #35) Ok, a short summary :-)

Not affected: all Samba versions <= 4.12.x, regardless of kernel/glibc/distribution.

Affected: all Samba versions >= 4.13.x on Ubuntu 16.04 and CentOS 8, regardless of whether "dbwrap_tdb_mutexes:*=no" is set or the build was done with --disable-tdb-mutex-locking.

Not affected: Ubuntu 20.04.
(In reply to Michael Albert from comment #36) Yes, the patch https://bugzilla.samba.org/attachment.cgi?id=16698 was applied in all tests.
So this looks to me like a kernel/glibc issue on Ubuntu 16.04 and CentOS 8. If the problem just "goes away" on Ubuntu 20.04 then it can't be a logic error in the Samba code - it would manifest in all circumstances then. Ralph, Metze - I'm going to submit the patch in https://bugzilla.samba.org/attachment.cgi?id=16698 to CI and try to get it into the next releases. After that, I think we need to point Red Hat kernel engineers at this to try and figure out what is going wrong futex-wise.
Oh, unfortunately the fix in https://bugzilla.samba.org/attachment.cgi?id=16698 that sets is_fsa is incorrect :-(. It breaks the streams file close tests. I'll need to re-work this. Watch this space :-).
The problem is that I was only setting fsp->fsp_flags.is_fsa inside fsp_bind_smb() once smbXsrv_open_create() had succeeded. When opening the base_fsp for a stream open, we pass in req==NULL (meaning an internal open for the base_fsp), which skips setting fsp->fsp_flags.is_fsa. I'm starting to think 9c277b27dec568677c0b407497da6eb95ae7cb8d is incorrect...
No, 9c277b27dec568677c0b407497da6eb95ae7cb8d is good - it's my code that's wrong :-). Still working on it :-).
Michael, can you do me a favor? Can you try running your tests against stock 4.14.6 on Ubuntu 20.04, *WITHOUT* the patch from https://bugzilla.samba.org/attachment.cgi?id=16698 applied, and see if they pass? I'm trying to figure out if I'm chasing a ghost here. The original problem that the is_fsa modification was trying to fix was a lock order violation when create_file_unixpath() errors out after calling fsp_bind_smb() or file_new(), but before creating a share mode locking entry. I've been trying to reproduce this locally by forcing a 'goto fail;' after the fsp_bind_smb() or file_new() call for a specific canary filename, then running an existing test that uses that filename to see if I get the same smb_panic lock order violation that you did. I don't, which leads me to believe the original lock order violation may have been an artifact of the earlier locking failures the process was hitting on CentOS 8 and Ubuntu 16.04.
(In reply to Jeremy Allison from comment #42) I also could NOT crash smbd with stock 4.14.6 (without patches) on Ubuntu 20.04.
Ah, as I suspected - my patch was a band-aid on top of the broken locking in CentOS 8 and Ubuntu 16.04, and so not really needed. The existing code is already good. Not sure where to go with this - I think it's worth leaving open at least for a short while in the hope that I can get Red Hat engineers to take a look at it. But if it's already fixed in their latest releases, there's not much hope that it will be a priority, sorry.
(In reply to Jeremy Allison from comment #44) Jeremy, I tried it on Red Hat 8 (4.18.0-305.10.2.el8_4.x86_64 #1 SMP Mon Jul 12 04:43:18 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux), built stock Samba 4.14.6 without any patch, and couldn't crash it. Maybe they backported some fixes to their default kernel?
That kind of confirms it. I'm going to close this one out as I don't think it's a Samba issue, and Red Hat 8 also doesn't suffer from this problem.
(In reply to Jeremy Allison from comment #46) Do we at least know what the problem is?
Share mode locking is failing on these systems with 4.14 and above. Unfortunately I can't afford to spend more time on it when updated systems work :-(.