Bug 14763 - smbd cores on many parallel reads
Summary: smbd cores on many parallel reads
Status: RESOLVED INVALID
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.14.6
Hardware: All
OS: All
Importance: P5 normal
Target Milestone: ---
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-07-16 16:51 UTC by Michael Albert
Modified: 2021-08-06 20:36 UTC
CC List: 2 users

See Also:


Attachments
Related files (3.76 MB, application/x-7z-compressed)
2021-07-16 16:51 UTC, Michael Albert
Simulate.ps1 reupload (1.92 KB, text/x-matlab)
2021-07-19 09:04 UTC, Michael Albert
git-am fix for 4.14.next. (1.43 KB, patch)
2021-07-19 16:36 UTC, Jeremy Allison
unencrypted trace (514.46 KB, application/x-7z-compressed)
2021-07-20 20:06 UTC, Michael Albert
Extra debug for 4.14.x (5.65 KB, patch)
2021-07-21 21:19 UTC, Jeremy Allison
Trace and log for Debug patch (4.90 MB, application/x-7z-compressed)
2021-07-22 08:19 UTC, Michael Albert
git-am fix for 4.14.next. (2.63 KB, patch)
2021-07-22 23:38 UTC, Jeremy Allison
git-am fix for master, 4.14.next, 4.15.rc2 (3.86 KB, patch)
2021-07-23 02:27 UTC, Jeremy Allison
log for dbwrap_tdb_mutexes:*=no and recompiling with --disable-tdb-mutex-locking (7.45 MB, application/x-7z-compressed)
2021-07-23 07:11 UTC, Michael Albert

Description Michael Albert 2021-07-16 16:51:48 UTC
Created attachment 16687 [details]
Related files

We've migrated a simple file share (smb.conf => security = user) from Red Hat 6.10 (Samba 3.x) to CentOS 8 and have had trouble with Samba since then.

It seems that too many parallel reads crash smbd.

There is only one share and one user (nclanonymous). This user connects to the share (read-only) from 500 Windows clients and executes some batch files stored on the share. The main script (28 KB, 600 lines) is executed once an hour, and the start time differs between the clients.

Since CentOS updated Samba to 4.13.x, smbd has crashed/cored every hour. I've done some investigation, and all versions > 4.12 are affected, starting with the first 4.13rc releases.

The error is simple to reproduce. I wrote a PowerShell script which "fakes" the way a batch file is read (open file, read 8191 bytes, close file, position to next line, open file, read 8191 bytes, and so on...).
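
In outline, each simulated job does roughly the following (this is only a sketch; the attached Simulate.ps1 is authoritative, and the path, sharing mode and loop details here are illustrative):

$path = '\\myTestSambaServer\test$\scripts\main.cmd'
$pos = 0
while ($true) {
    # open the file for reading (the sharing mode is an assumption here)
    $fs = [System.IO.File]::Open($path, 'Open', 'Read', 'ReadWrite')
    $null = $fs.Seek($pos, 'Begin')
    $buf = [byte[]]::new(8191)
    $read = $fs.Read($buf, 0, 8191)
    $fs.Close()
    if ($read -le 0) { break }
    # advance to the start of the next line, then repeat the open/read/close
    $text = [System.Text.Encoding]::ASCII.GetString($buf, 0, $read)
    $nl = $text.IndexOf("`r`n")
    if ($nl -lt 0) { break }
    $pos += $nl + 2
}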

Setup:


tar -xvzf samba-4.14.6.tar.gz
./configure --prefix=/opt/TESTsamba --disable-python --without-winbind --without-ads --without-ldap --without-quotas --without-acl-support --without-dnsupdate --without-automount --disable-glusterfs --disable-cephfs --without-ad-dc --with-shared-modules='!vfs_snapper'
make
make install


The shared folder:

mkdir -p /storage/sdc1/test/scripts/	

and the file to open:

/storage/sdc1/test/scripts/main.cmd 

In this test, main.cmd is simply a text copy (DOS line endings, \r\n) of https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html. The original file was a Windows cmd file.
For the test I used a main.cmd file with ~5800 lines:
wc -l /storage/sdc1/test/scripts/main.cmd
5859 /storage/sdc1/test/scripts/main.cmd

Start test:

Version check:
/opt/TESTsamba/sbin/smbd -V
Version 4.14.6


Systemd start command line:
[Service]
Type=forking
PIDFile=/opt/TESTsamba/var/run/smbd.pid
EnvironmentFile=-/opt/TESTsamba/etc/samba
ExecStart=/opt/TESTsamba/sbin/smbd -D --configfile=/opt/TESTsamba/etc/smb.conf

TCP Dump command line:
tcpdump -i eth0 port 445 -w /tmp/trace.cap

On two Windows 10 machines I started my script (Simulate.ps1) at the same time, simulating batch file reads (96 parallel jobs on each):

net use \\myTestSambaServer\test$ /user:nclanonymous

.\Simulate.ps1 -file \\myTestSambaServer\test$\scripts\main.cmd -startNumberOfJobs 96

.\Simulate.ps1 -file \\myTestSambaServer\test$\scripts\main.cmd -startNumberOfJobs 96

Attention: this opens 96 PowerShell windows. Make sure you have enough RAM available :-)

After ~1 minute this results in:

Jul 16 13:37:59 myTestSambaServer smbd[1919]: [2021/07/16 13:37:59.169424,  0, pid=1919, effective(0, 0), real(0, 0)] ../../lib/util/become_daemon.c:136(daemon_ready)
Jul 16 13:37:59 myTestSambaServer smbd[1919]:   daemon_ready: daemon 'smbd' finished starting up and ready to serve connections
Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.141247,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1925]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.190340,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.316617,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1925]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.346205,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.442810,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1925]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.471659,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.619849,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:309(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1925]:   close_remove_share_mode: Could not get share mode lock for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.690228,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.730566,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1925]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.799636,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1925]: [2021/07/16 13:38:26.851500,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1925]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:26 myTestSambaServer smbd[1924]: [2021/07/16 13:38:26.924460,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:26 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.035959,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051619,  0, pid=1925, effective(11009, 65534), real(11009, 0), class=locking] ../../source3/locking/share_mode_lock.c:961(share_mode_lock_destructor)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   share_mode_lock_destructor: share_mode_data_store failed: NT_STATUS_NOT_LOCKED
Jul 16 13:38:27 myTestSambaServer audit[1925]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=1925 comm="smbd" exe="/opt/TESTsamba/sbin/smbd" sig=6 res=1
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051697,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:172(smb_panic_log)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   ===============================================================
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051740,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:176(smb_panic_log)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   INTERNAL ERROR: Could not store share mode data
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    in pid 1925 (4.14.6)
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051797,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:181(smb_panic_log)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051842,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:182(smb_panic_log)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   ===============================================================
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.051883,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:184(smb_panic_log)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   PANIC (pid 1925): Could not store share mode data
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    in 4.14.6
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.054571,  0, pid=1925, effective(11009, 65534), real(11009, 0)] ../../lib/util/fault.c:288(log_stack_trace)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   BACKTRACE: 39 stack frames:
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #0 /opt/TESTsamba/lib/libsamba-util.so.0(log_stack_trace+0x2e) [0x7ff1afa1e861]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #1 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic_log+0x1b1) [0x7ff1afa1e7e6]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #2 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic+0x18) [0x7ff1afa1e801]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #3 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x8c0b1) [0x7ff1af4c10b1]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #4 /opt/TESTsamba/lib/private/libtalloc.so.2(+0x397c) [0x7ff1ae60197c]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #5 /opt/TESTsamba/lib/private/libtalloc.so.2(+0x3cf8) [0x7ff1ae601cf8]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #6 /opt/TESTsamba/lib/private/libtalloc.so.2(_talloc_free+0x105) [0x7ff1ae60305e]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #7 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x21c141) [0x7ff1af651141]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #8 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x220c0b) [0x7ff1af655c0b]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #9 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(create_file_default+0x4f1) [0x7ff1af656805]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #10 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x1909a8) [0x7ff1af5c59a8]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #11 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smb_vfs_call_create_file+0x9b) [0x7ff1af661f2a]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #12 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x274b1b) [0x7ff1af6a9b1b]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #13 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_smb2_request_process_create+0x76e) [0x7ff1af6a75ad]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #14 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_smb2_request_dispatch+0x146b) [0x7ff1af699929]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #15 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x26984e) [0x7ff1af69e84e]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #16 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x26997d) [0x7ff1af69e97d]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #17 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_invoke_fd_handler+0x101) [0x7ff1aec1c095]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #18 /opt/TESTsamba/lib/private/libtevent.so.0(+0x11afc) [0x7ff1aec26afc]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #19 /opt/TESTsamba/lib/private/libtevent.so.0(+0x121df) [0x7ff1aec271df]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #20 /opt/TESTsamba/lib/private/libtevent.so.0(+0xe941) [0x7ff1aec23941]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #21 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_once+0x11e) [0x7ff1aec1b2b4]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #22 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7ff1aec1b606]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #23 /opt/TESTsamba/lib/private/libtevent.so.0(+0xe9e3) [0x7ff1aec239e3]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #24 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7ff1aec1b6a9]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #25 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_process+0xcd0) [0x7ff1af6801a6]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #26 /opt/TESTsamba/sbin/smbd(+0xd185) [0x55dadfb38185]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #27 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_invoke_fd_handler+0x101) [0x7ff1aec1c095]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #28 /opt/TESTsamba/lib/private/libtevent.so.0(+0x11afc) [0x7ff1aec26afc]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #29 /opt/TESTsamba/lib/private/libtevent.so.0(+0x121df) [0x7ff1aec271df]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #30 /opt/TESTsamba/lib/private/libtevent.so.0(+0xe941) [0x7ff1aec23941]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #31 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_once+0x11e) [0x7ff1aec1b2b4]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #32 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7ff1aec1b606]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #33 /opt/TESTsamba/lib/private/libtevent.so.0(+0xe9e3) [0x7ff1aec239e3]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #34 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7ff1aec1b6a9]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #35 /opt/TESTsamba/sbin/smbd(+0xdea7) [0x55dadfb38ea7]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #36 /opt/TESTsamba/sbin/smbd(main+0x1a71) [0x55dadfb3b221]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #37 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7ff1abebabf7]
Jul 16 13:38:27 myTestSambaServer smbd[1925]:    #38 /opt/TESTsamba/sbin/smbd(_start+0x2a) [0x55dadfb3114a]
Jul 16 13:38:27 myTestSambaServer smbd[1925]: [2021/07/16 13:38:27.055490,  0, pid=1925, effective(0, 0), real(0, 0)] ../../source3/lib/dumpcore.c:318(dump_core)
Jul 16 13:38:27 myTestSambaServer smbd[1925]:   coredump is handled by helper binary specified at /proc/sys/kernel/core_pattern
Jul 16 13:38:27 myTestSambaServer smbd[1925]:
Jul 16 13:38:28 myTestSambaServer smbd[1924]: [2021/07/16 13:38:28.517959,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:28 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:28 myTestSambaServer smbd[1931]: [2021/07/16 13:38:28.630492,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:309(close_remove_share_mode)
Jul 16 13:38:28 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not get share mode lock for file scripts/main.cmd
Jul 16 13:38:28 myTestSambaServer smbd[1924]: [2021/07/16 13:38:28.722931,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:28 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:28 myTestSambaServer smbd[1931]: [2021/07/16 13:38:28.787486,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:28 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.029613,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer audit: ANOM_PROMISCUOUS dev=eth0 prom=0 old_prom=256 auid=0 uid=0 gid=0 ses=2
Jul 16 13:38:29 myTestSambaServer kernel: device eth0 left promiscuous mode
Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.065437,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.121192,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:309(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not get share mode lock for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.159015,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.279852,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.346339,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.402292,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.456223,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.535445,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.628028,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.669333,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.721911,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1931]: [2021/07/16 13:38:29.817451,  0, pid=1931, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1931]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:29 myTestSambaServer smbd[1924]: [2021/07/16 13:38:29.847012,  0, pid=1924, effective(11009, 65534), real(11009, 0)] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 16 13:38:29 myTestSambaServer smbd[1924]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Jul 16 13:38:30 myTestSambaServer smbd[1931]: [2021/07/16 13:38:30.014143,  0, pid=1931, effective(11009, 65534), real(11009, 0)]


I attached a 7z with the related files:

samba_var.tar.gz -> archive of /opt/TESTsamba/var
smb.conf -> /opt/TESTsamba/etc/smb.conf
_opt_TESTsamba_sbin_smbd.0.crash -> core dump written by apport.service
trace.cap -> network trace
Simulate.ps1 -> PowerShell script to simulate reading a batch file
main.cmd -> file read by Simulate.ps1

Another hint: on the original machine (CentOS 8), after the first core dumps have occurred, the time to fully read the whole main.cmd file over Samba ranges from 30 seconds up to multiple minutes (for 28 KB!!), while I can copy other files, for example a 4 GB install.wim, at 200 MB/sec from the same share.
Comment 1 Jeremy Allison 2021-07-16 19:47:08 UTC
Can you add "panic action = /bin/sleep 999999" to the [global] section of your smb.conf and rebuild with symbols? When you get the crash, you should be able to attach to the parent process of the sleep with gdb and get a full backtrace with symbols.

That would really help track this down, thanks !
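
Roughly, the steps would be (the gdb target PID is illustrative; attach to the smbd that is the parent of the sleep):

# smb.conf [global]
panic action = /bin/sleep 999999

# after the panic, find the sleeping helper and note its parent smbd PID
ps -ef | grep 'sleep 999999'

# attach gdb to that parent smbd and dump the backtrace
gdb -p <smbd-pid>
(gdb) bt full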
Comment 2 Michael Albert 2021-07-19 08:25:37 UTC
Here is a backtrace:
(gdb) backtrace
#0  0x00007fed51271492 in __GI___waitpid (pid=26451, stat_loc=stat_loc@entry=0x7ffd840c66b8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007fed511dc177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2  0x00007fed536b6d51 in smb_panic_s3 (why=0x7fed504ca115 "lock order violation") at ../../source3/lib/util.c:840
#3  0x00007fed54d1281d in smb_panic (why=0x7fed504ca115 "lock order violation") at ../../lib/util/fault.c:197
#4  0x00007fed504c468a in dbwrap_lock_order_lock (db_name=0x55c293cb07f0 "/opt/TESTsamba/var/lock/smbXsrv_open_global.tdb", lock_order=DBWRAP_LOCK_ORDER_1)
    at ../../lib/dbwrap/dbwrap.c:166
#5  0x00007fed504c4a2c in dbwrap_check_lock_order (db=0x55c293cb1a70, mem_ctx=0x55c293cd1700) at ../../lib/dbwrap/dbwrap.c:231
#6  0x00007fed504c4a95 in dbwrap_fetch_locked_internal (db=0x55c293cb1a70, mem_ctx=0x55c293cd1700, key=..., db_fn=0x7fed504c8ce2 <db_tdb_fetch_locked>)
    at ../../lib/dbwrap/dbwrap.c:246
#7  0x00007fed504c4b64 in dbwrap_fetch_locked (db=0x55c293cb1a70, mem_ctx=0x55c293cd1700, key=...) at ../../lib/dbwrap/dbwrap.c:265
#8  0x00007fed549c2b0d in smbXsrv_open_global_fetch_locked (db=0x55c293cb1a70, id=3084737156, mem_ctx=0x55c293cd1700) at ../../source3/smbd/smbXsrv_open.c:167
#9  0x00007fed549c4ed1 in smbXsrv_open_close (op=0x55c293ce40d0, now=0) at ../../source3/smbd/smbXsrv_open.c:1076
#10 0x00007fed549c4233 in smbXsrv_open_destructor (op=0x55c293ce40d0) at ../../source3/smbd/smbXsrv_open.c:803
#11 0x00007fed538f597c in _tc_free_internal (tc=0x55c293ce4070, location=0x7fed54a646e8 "../../source3/smbd/files.c:1076") at ../../lib/talloc/talloc.c:1158
#12 0x00007fed538f5cf8 in _talloc_free_internal (ptr=0x55c293ce40d0, location=0x7fed54a646e8 "../../source3/smbd/files.c:1076") at ../../lib/talloc/talloc.c:1248
#13 0x00007fed538f705e in _talloc_free (ptr=0x55c293ce40d0, location=0x7fed54a646e8 "../../source3/smbd/files.c:1076") at ../../lib/talloc/talloc.c:1792
#14 0x00007fed548c3ab6 in file_free (req=0x55c293ce1860, fsp=0x55c293cdd7d0) at ../../source3/smbd/files.c:1076
#15 0x00007fed5494e6b7 in close_file (req=0x55c293ce1860, fsp=0x55c293cdd7d0, close_type=ERROR_CLOSE) at ../../source3/smbd/close.c:1341
#16 0x00007fed5494a2b9 in create_file_unixpath (conn=0x55c293cbb050, req=0x55c293ce1860, smb_fname=0x55c293cb1390, access_mask=1179785, share_access=3,
    create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0,
    result=0x7ffd840c6e20, pinfo=0x7ffd840c6e04) at ../../source3/smbd/open.c:6094
#17 0x00007fed5494a805 in create_file_default (conn=0x55c293cbb050, req=0x55c293ce1860, smb_fname=0x55c293cb1390, access_mask=1179785, share_access=3, create_disposition=1,
    create_options=4194368, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x55c293ce47e8,
    pinfo=0x55c293ce47fc, in_context_blobs=0x7ffd840c71a8, out_context_blobs=0x55c293cdbee0) at ../../source3/smbd/open.c:6241
#18 0x00007fed548b99a8 in vfswrap_create_file (handle=0x55c293cdb4d0, req=0x55c293ce1860, smb_fname=0x55c293cb1390, access_mask=1179785, share_access=3,
    create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0,
    result=0x55c293ce47e8, pinfo=0x55c293ce47fc, in_context_blobs=0x7ffd840c71a8, out_context_blobs=0x55c293cdbee0) at ../../source3/modules/vfs_default.c:759
#19 0x00007fed54955f2a in smb_vfs_call_create_file (handle=0x55c293cdb4d0, req=0x55c293ce1860, smb_fname=0x55c293cb1390, access_mask=1179785, share_access=3,
    create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0,
    result=0x55c293ce47e8, pinfo=0x55c293ce47fc, in_context_blobs=0x7ffd840c71a8, out_context_blobs=0x55c293cdbee0) at ../../source3/smbd/vfs.c:1858
#20 0x00007fed5499db1b in smbd_smb2_create_send (mem_ctx=0x55c293ce0f70, ev=0x55c293c89100, smb2req=0x55c293ce0f70, in_oplock_level=9 '\t', in_impersonation_level=0,
    in_desired_access=1179785, in_file_attributes=0, in_share_access=3, in_create_disposition=1, in_create_options=4194368, in_name=0x55c293ce13e0 "scripts\\main.cmd",
    in_context_blobs=...) at ../../source3/smbd/smb2_create.c:991
#21 0x00007fed5499b5ad in smbd_smb2_request_process_create (smb2req=0x55c293ce0f70) at ../../source3/smbd/smb2_create.c:268
#22 0x00007fed5498d929 in smbd_smb2_request_dispatch (req=0x55c293ce0f70) at ../../source3/smbd/smb2_server.c:3303
#23 0x00007fed5499284e in smbd_smb2_io_handler (xconn=0x55c293c86570, fde_flags=1) at ../../source3/smbd/smb2_server.c:4913
#24 0x00007fed5499297d in smbd_smb2_connection_handler (ev=0x55c293c89100, fde=0x55c293cb72f0, flags=1, private_data=0x55c293c86570) at ../../source3/smbd/smb2_server.c:4951
#25 0x00007fed53f10095 in tevent_common_invoke_fd_handler (fde=0x55c293cb72f0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#26 0x00007fed53f1aafc in epoll_event_loop (epoll_ev=0x55c293c9cb50, tvalp=0x7ffd840c7520) at ../../lib/tevent/tevent_epoll.c:736
#27 0x00007fed53f1b1df in epoll_event_loop_once (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_epoll.c:937
#28 0x00007fed53f17941 in std_event_loop_once (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:110
#29 0x00007fed53f0f2b4 in _tevent_loop_once (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:772
#30 0x00007fed53f0f606 in tevent_common_loop_wait (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:895
#31 0x00007fed53f179e3 in std_event_loop_wait (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:141
#32 0x00007fed53f0f6a9 in _tevent_loop_wait (ev=0x55c293c89100, location=0x7fed54a92aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:914
#33 0x00007fed549741a6 in smbd_process (ev_ctx=0x55c293c89100, msg_ctx=0x55c293c84780, dce_ctx=0x55c293c84650, sock_fd=48, interactive=false)
    at ../../source3/smbd/process.c:4232
#34 0x000055c293467185 in smbd_accept_connection (ev=0x55c293c89100, fde=0x55c293cc3ab0, flags=1, private_data=0x55c293cbb430) at ../../source3/smbd/server.c:1020
#35 0x00007fed53f10095 in tevent_common_invoke_fd_handler (fde=0x55c293cc3ab0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#36 0x00007fed53f1aafc in epoll_event_loop (epoll_ev=0x55c293c9b210, tvalp=0x7ffd840c78d0) at ../../lib/tevent/tevent_epoll.c:736
#37 0x00007fed53f1b1df in epoll_event_loop_once (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_epoll.c:937
#38 0x00007fed53f17941 in std_event_loop_once (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:110
#39 0x00007fed53f0f2b4 in _tevent_loop_once (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:772
#40 0x00007fed53f0f606 in tevent_common_loop_wait (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:895
#41 0x00007fed53f179e3 in std_event_loop_wait (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:141
#42 0x00007fed53f0f6a9 in _tevent_loop_wait (ev=0x55c293c89100, location=0x55c29346f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:914
#43 0x000055c293467ea7 in smbd_parent_loop (ev_ctx=0x55c293c89100, parent=0x55c293c9f040) at ../../source3/smbd/server.c:1367
#44 0x000055c29346a221 in main (argc=3, argv=0x7ffd840c7ef8) at ../../source3/smbd/server.c:2220
Comment 3 Michael Albert 2021-07-19 09:04:57 UTC
Created attachment 16688 [details]
Simulate.ps1 reupload

In the 1st version I removed too much (debug) code.
Comment 4 Jeremy Allison 2021-07-19 16:36:09 UTC
Created attachment 16690 [details]
git-am fix for 4.14.next.

Can you test this patch? I think it's in the right area: it exposes a place where a destructor can have been set without the correct cleanup path being flagged.
Comment 5 Michael Albert 2021-07-20 11:51:44 UTC
Patch applied, smbd still crashes but with a different error. Here is the backtrace:
#0  0x00007f91572b2492 in __GI___waitpid (pid=26850, stat_loc=stat_loc@entry=0x7ffd1500b028, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007f915721d177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2  0x00007f91596f7d51 in smb_panic_s3 (why=0x7f915aa43954 "Could not unlock share mode\n") at ../../source3/lib/util.c:840
#3  0x00007f915ad5381d in smb_panic (why=0x7f915aa43954 "Could not unlock share mode\n") at ../../lib/util/fault.c:197
#4  0x00007f915a7f6202 in share_mode_lock_destructor (lck=0x55c490102b30) at ../../source3/locking/share_mode_lock.c:973
#5  0x00007f915993697c in _tc_free_internal (tc=0x55c490102ad0, location=0x7f915aac6c08 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1158
#6  0x00007f9159936cf8 in _talloc_free_internal (ptr=0x55c490102b30, location=0x7f915aac6c08 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1248
#7  0x00007f915993805e in _talloc_free (ptr=0x55c490102b30, location=0x7f915aac6c08 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1792
#8  0x00007f915a98d21d in close_remove_share_mode (fsp=0x55c4900a1060, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:522
#9  0x00007f915a98db7e in close_normal_file (req=0x55c4900fab50, fsp=0x55c4900a1060, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:791
#10 0x00007f915a98f700 in close_file (req=0x55c4900fab50, fsp=0x55c4900a1060, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:1346
#11 0x00007f915a9e2505 in smbd_smb2_close (req=0x55c4900fa3e0, fsp=0x55c4900a1060, in_flags=1, out_flags=0x55c4900fa9a2, out_creation_ts=0x55c4900fa9a8,
    out_last_access_ts=0x55c4900fa9b8, out_last_write_ts=0x55c4900fa9c8, out_change_ts=0x55c4900fa9d8, out_allocation_size=0x55c4900fa9e8, out_end_of_file=0x55c4900fa9f0,
    out_file_attributes=0x55c4900fa9f8) at ../../source3/smbd/smb2_close.c:277
#12 0x00007f915a9e28d7 in smbd_smb2_close_send (mem_ctx=0x55c4900fa3e0, ev=0x55c4900a3100, smb2req=0x55c4900fa3e0, in_fsp=0x55c4900a1060, in_flags=1)
    at ../../source3/smbd/smb2_close.c:393
#13 0x00007f915a9e1abc in smbd_smb2_request_process_close (req=0x55c4900fa3e0) at ../../source3/smbd/smb2_close.c:72
#14 0x00007f915a9ce943 in smbd_smb2_request_dispatch (req=0x55c4900fa3e0) at ../../source3/smbd/smb2_server.c:3309
#15 0x00007f915a9d384e in smbd_smb2_io_handler (xconn=0x55c4900a0570, fde_flags=1) at ../../source3/smbd/smb2_server.c:4913
#16 0x00007f915a9d397d in smbd_smb2_connection_handler (ev=0x55c4900a3100, fde=0x55c4900d12f0, flags=1, private_data=0x55c4900a0570) at ../../source3/smbd/smb2_server.c:4951
#17 0x00007f9159f51095 in tevent_common_invoke_fd_handler (fde=0x55c4900d12f0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#18 0x00007f9159f5bafc in epoll_event_loop (epoll_ev=0x55c4900b6b50, tvalp=0x7ffd1500b980) at ../../lib/tevent/tevent_epoll.c:736
#19 0x00007f9159f5c1df in epoll_event_loop_once (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_epoll.c:937
#20 0x00007f9159f58941 in std_event_loop_once (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:110
#21 0x00007f9159f502b4 in _tevent_loop_once (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:772
#22 0x00007f9159f50606 in tevent_common_loop_wait (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:895
#23 0x00007f9159f589e3 in std_event_loop_wait (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:141
#24 0x00007f9159f506a9 in _tevent_loop_wait (ev=0x55c4900a3100, location=0x7f915aad3aa8 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:914
#25 0x00007f915a9b51a6 in smbd_process (ev_ctx=0x55c4900a3100, msg_ctx=0x55c49009e780, dce_ctx=0x55c49009e650, sock_fd=48, interactive=false)
    at ../../source3/smbd/process.c:4232
#26 0x000055c48f7b8185 in smbd_accept_connection (ev=0x55c4900a3100, fde=0x55c4900ddab0, flags=1, private_data=0x55c4900d5430) at ../../source3/smbd/server.c:1020
#27 0x00007f9159f51095 in tevent_common_invoke_fd_handler (fde=0x55c4900ddab0, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#28 0x00007f9159f5bafc in epoll_event_loop (epoll_ev=0x55c4900b5210, tvalp=0x7ffd1500bd30) at ../../lib/tevent/tevent_epoll.c:736
#29 0x00007f9159f5c1df in epoll_event_loop_once (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_epoll.c:937
#30 0x00007f9159f58941 in std_event_loop_once (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:110
#31 0x00007f9159f502b4 in _tevent_loop_once (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:772
#32 0x00007f9159f50606 in tevent_common_loop_wait (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:895
#33 0x00007f9159f589e3 in std_event_loop_wait (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:141
#34 0x00007f9159f506a9 in _tevent_loop_wait (ev=0x55c4900a3100, location=0x55c48f7c01a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:914
#35 0x000055c48f7b8ea7 in smbd_parent_loop (ev_ctx=0x55c4900a3100, parent=0x55c4900b9040) at ../../source3/smbd/server.c:1367
#36 0x000055c48f7bb221 in main (argc=3, argv=0x7ffd1500c358) at ../../source3/smbd/server.c:2220
Comment 6 Michael Albert 2021-07-20 12:00:32 UTC
syslog says

smbd[26843]: [2021/07/20 13:41:05.568285,  0] ../../source3/smbd/close.c:519(close_remove_share_mode)
smbd[26843]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
smbd[26845]: [2021/07/20 13:41:05.581593,  0] ../../source3/smbd/close.c:519(close_remove_share_mode)
smbd[26845]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
smbd[26843]: [2021/07/20 13:41:05.677774,  0] ../../source3/smbd/close.c:519(close_remove_share_mode)
smbd[26843]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
smbd[26845]: [2021/07/20 13:41:05.689613,  0] ../../source3/smbd/close.c:519(close_remove_share_mode)
smbd[26845]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
smbd[26843]: [2021/07/20 13:41:05.786966,  0] ../../source3/smbd/close.c:519(close_remove_share_mode)
smbd[26843]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Comment 7 Jeremy Allison 2021-07-20 16:09:24 UTC
OK, well this is progress. The first patch certainly fixes a bug in the "failure to open" case (calling close_file(...,ERROR_CLOSE)).

This problem is different. It's an issue when calling close_file(..., NORMAL_CLOSE). I'll look closer.
Comment 8 Jeremy Allison 2021-07-20 16:44:18 UTC
Is there a possibility you can give me a new wireshark trace with my patch in:

https://attachments.samba.org/attachment.cgi?id=16690

applied? It's deliberately calling smb_panic on an assertion failure, because it can't find a share mode entry for what should be an existing open file.

I'd love to see the SMB2create (open) call that corresponds to this close to allow me to visually walk through the open code path to see if there's a case where we successfully return an open handle but haven't set a share mode entry (this should never happen of course, but then again we shouldn't crash either :-).
Comment 9 Jeremy Allison 2021-07-20 16:57:21 UTC
Oh, can you also turn off encryption on the share before getting the wireshark trace, please?

The last trace wasn't useful to examine as all I get is: "Encrypted SMB3", which doesn't help if I want to look inside the packets :-).
Comment 10 Michael Albert 2021-07-20 20:05:30 UTC
Set

    server smb encrypt = off
    smb encrypt = off

Trace2.cap attached. The first occurrence when smbd cores is:
Jul 20 21:58:48 \myTestSambaServer smbd[27185]: [2021/07/20 21:58:48.236866,  0] ../../source3/smbd/close.c:519(close_remove_share_mode)
Jul 20 21:58:48 \myTestSambaServer smbd[27185]:   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
Comment 11 Michael Albert 2021-07-20 20:06:02 UTC
Created attachment 16693 [details]
unencrypted trace
Comment 12 Jeremy Allison 2021-07-20 20:38:59 UTC
I don't see a TCP RST packet in that trace. Normally when smbd cores the server will send a TCP RST packet to the client to tell it the connection has gone away.
Comment 13 Jeremy Allison 2021-07-20 20:46:48 UTC
I also noticed that in your smb.conf you have:

    oplocks = No
    level2 oplocks = No
    smb2 leases = No

This means that the client must do an open/read/close every time it opens the same file (scripts\main.cmd). Is there any reason you need this? It's going to make Samba really slow.

Have you tried removing these lines and letting the client cache the file via a lease ?
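
With those three lines removed the built-in defaults apply again, which (just for illustration, using the standard smb.conf option names) is equivalent to:

    oplocks = yes
    level2 oplocks = yes
    smb2 leases = yes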
Comment 14 Michael Albert 2021-07-21 05:38:51 UTC
(In reply to Jeremy Allison from comment #12)
I think this happens because I still have the 

panic action = /bin/sleep 999999

option active, and the crashed smbd process has started the sleep process as a child and does not exit.
Comment 15 Michael Albert 2021-07-21 06:11:00 UTC
(In reply to Jeremy Allison from comment #13)
I assumed that this error is lock-related, so I played around with those options. I removed

    oplocks = No
    level2 oplocks = No
    smb2 leases = No

but the error is the same.
Comment 16 Jeremy Allison 2021-07-21 16:36:54 UTC
(In reply to Michael Albert from comment #14)

Yes, you are correct - I forgot that. Can you let it crash so I can see the wireshark trace of the exact close request that causes the crash?
Comment 17 Jeremy Allison 2021-07-21 17:47:28 UTC
(In reply to Michael Albert from comment #14)

Hmmm. Thinking some more, I'm not sure this is the case. Your wireshark trace ends on an SMB2Read request, but the crash occurs on an SMB2close. I don't see an SMB2close ending the trace. Something doesn't match here.

If this is easily reproducible for you, can you get a debug level 10 log (it might be large)?
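
For example, via smb.conf (these are standard options; the log file path below is only an example):

    log level = 10
    max log size = 0
    log file = /opt/TESTsamba/var/log.smbd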
Comment 18 Jeremy Allison 2021-07-21 21:19:52 UTC
Created attachment 16695 [details]
Extra debug for 4.14.x

Can you try reproducing with this patch and send me the wireshark trace + debug log? It adds lots more debugging around the share modes (the extra messages log at level zero, so they show up even with normal debugging).

There have been changes in this area, so I'd be interested to know if you can reproduce this crash with the recently released 4.15rc1.

Thanks !
Comment 19 Michael Albert 2021-07-22 08:19:12 UTC
Created attachment 16696 [details]
Trace and log for Debug patch

[PATCH 1/2] s3: smbd: Ensure we set is_fsa inside fsp_bind_smb() as
 soon as we have called smbXsrv_open_create().
[PATCH 2/2] JRATEST: Add debug to try and track down

Added trace3.cap and log.smbd.
Comment 20 Michael Albert 2021-07-22 08:20:59 UTC
Added trace3.cap and log.smbd with the debug patch applied.

The panic occurred at 10:13:34:

Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.688967,  0] ../../lib/util/fault.c:172(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.689022,  0] ../../lib/util/fault.c:176(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.689116,  0] ../../lib/util/fault.c:181(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.689245,  0] ../../lib/util/fault.c:182(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.689266,  0] ../../lib/util/fault.c:184(smb_panic_log)
Jul 22 10:13:34 myTestSambaServer smbd[14494]:    #1 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic_log+0x1b1) [0x7f7e321197e6]
Jul 22 10:13:34 myTestSambaServer smbd[14494]:    #2 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic+0x18) [0x7f7e32119801]
Jul 22 10:13:34 myTestSambaServer smbd[14494]: [2021/07/22 10:13:34.694433,  0] ../../source3/lib/util.c:839(smb_panic_s3)
Jul 22 10:13:34 myTestSambaServer smbd[14494]:   smb_panic(): calling panic action [/bin/sleep 999999]
Comment 21 Michael Albert 2021-07-22 08:47:29 UTC
~# /opt/TESTsamba/sbin/smbd -V
Version 4.15.0rc1

has the same problem. 

syslog:
Jul 22 10:44:44 ad613419pc smbd[31447]: [2021/07/22 10:44:44.790120,  0] ../../lib/util/fault.c:288(log_stack_trace)
Jul 22 10:44:44 ad613419pc smbd[31447]:   BACKTRACE: 39 stack frames:
Jul 22 10:44:44 ad613419pc smbd[31447]:    #0 /opt/TESTsamba/lib/libsamba-util.so.0(log_stack_trace+0x2e) [0x7fd1c3effa71]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #1 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic_log+0x1b1) [0x7fd1c3eff9f6]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #2 /opt/TESTsamba/lib/libsamba-util.so.0(smb_panic+0x18) [0x7fd1c3effa11]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #3 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x8c899) [0x7fd1c41fe899]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #4 /opt/TESTsamba/lib/private/libtalloc.so.2(+0x39dc) [0x7fd1c3ad19dc]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #5 /opt/TESTsamba/lib/private/libtalloc.so.2(+0x3d58) [0x7fd1c3ad1d58]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #6 /opt/TESTsamba/lib/private/libtalloc.so.2(_talloc_free+0x105) [0x7fd1c3ad30be]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #7 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x21d1c8) [0x7fd1c438f1c8]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #8 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x221be8) [0x7fd1c4393be8]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #9 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(create_file_default+0x4f1) [0x7fd1c43948aa]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #10 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x190446) [0x7fd1c4302446]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #11 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smb_vfs_call_create_file+0x9b) [0x7fd1c43a0aae]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #12 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x276464) [0x7fd1c43e8464]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #13 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_smb2_request_process_create+0x76e) [0x7fd1c43e5e10]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #14 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_smb2_request_dispatch+0x15a2) [0x7fd1c43d822e]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #15 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x26b180) [0x7fd1c43dd180]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #16 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(+0x26b2af) [0x7fd1c43dd2af]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #17 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_invoke_fd_handler+0x11d) [0x7fd1c347fb1c]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #18 /opt/TESTsamba/lib/private/libtevent.so.0(+0x1281d) [0x7fd1c348a81d]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #19 /opt/TESTsamba/lib/private/libtevent.so.0(+0x12f00) [0x7fd1c348af00]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #20 /opt/TESTsamba/lib/private/libtevent.so.0(+0xf690) [0x7fd1c3487690]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #21 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_once+0x11e) [0x7fd1c347e947]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #22 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7fd1c347ec99]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #23 /opt/TESTsamba/lib/private/libtevent.so.0(+0xf732) [0x7fd1c3487732]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #24 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7fd1c347ed3c]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #25 /opt/TESTsamba/lib/private/libsmbd-base-samba4.so(smbd_process+0xcd0) [0x7fd1c43be46a]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #26 /opt/TESTsamba/sbin/smbd(+0xd90d) [0x559646eeb90d]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #27 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_invoke_fd_handler+0x11d) [0x7fd1c347fb1c]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #28 /opt/TESTsamba/lib/private/libtevent.so.0(+0x1281d) [0x7fd1c348a81d]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #29 /opt/TESTsamba/lib/private/libtevent.so.0(+0x12f00) [0x7fd1c348af00]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #30 /opt/TESTsamba/lib/private/libtevent.so.0(+0xf690) [0x7fd1c3487690]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #31 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_once+0x11e) [0x7fd1c347e947]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #32 /opt/TESTsamba/lib/private/libtevent.so.0(tevent_common_loop_wait+0x25) [0x7fd1c347ec99]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #33 /opt/TESTsamba/lib/private/libtevent.so.0(+0xf732) [0x7fd1c3487732]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #34 /opt/TESTsamba/lib/private/libtevent.so.0(_tevent_loop_wait+0x2b) [0x7fd1c347ed3c]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #35 /opt/TESTsamba/sbin/smbd(+0xe634) [0x559646eec634]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #36 /opt/TESTsamba/sbin/smbd(main+0x1c3f) [0x559646eeeb69]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #37 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fd1c0761bf7]
Jul 22 10:44:44 ad613419pc smbd[31447]:    #38 /opt/TESTsamba/sbin/smbd(_start+0x2a) [0x559646ee46da]
Jul 22 10:44:44 ad613419pc smbd[31447]: [2021/07/22 10:44:44.791649,  0] ../../source3/lib/util.c:693(smb_panic_s3)
Jul 22 10:44:44 ad613419pc smbd[31447]:   smb_panic(): calling panic action [/bin/sleep 999999]





(gdb) bt
#0  0x00007fd1c0824492 in __GI___waitpid (pid=31451, stat_loc=stat_loc@entry=0x7ffed1e1d968, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007fd1c078f177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2  0x00007fd1c2e370f5 in smb_panic_s3 (why=0x7fd1c444f948 "Could not store share mode data\n") at ../../source3/lib/util.c:694
#3  0x00007fd1c3effa2d in smb_panic (why=0x7fd1c444f948 "Could not store share mode data\n") at ../../lib/util/fault.c:197
#4  0x00007fd1c41fe899 in share_mode_lock_destructor (lck=0x559648fb5750) at ../../source3/locking/share_mode_lock.c:979
#5  0x00007fd1c3ad19dc in _tc_free_internal (tc=0x559648fb56f0, location=0x7fd1c44d0548 "../../source3/smbd/open.c:4010") at ../../lib/talloc/talloc.c:1158
#6  0x00007fd1c3ad1d58 in _talloc_free_internal (ptr=0x559648fb5750, location=0x7fd1c44d0548 "../../source3/smbd/open.c:4010") at ../../lib/talloc/talloc.c:1248
#7  0x00007fd1c3ad30be in _talloc_free (ptr=0x559648fb5750, location=0x7fd1c44d0548 "../../source3/smbd/open.c:4010") at ../../lib/talloc/talloc.c:1792
#8  0x00007fd1c438f1c8 in open_file_ntcreate (conn=0x559648fb3320, req=0x559648fb4c50, access_mask=1179785, share_access=3, create_disposition=1,
    create_options=4194368, new_dos_attributes=0, oplock_request=0, lease=0x559648fca1c0, private_flags=0, parent_dir_fname=0x559648fc92c0,
    smb_fname_atname=0x559648fc9550, pinfo=0x7ffed1e1deb4, fsp=0x559648fa16c0) at ../../source3/smbd/open.c:4010
#9  0x00007fd1c4393be8 in create_file_unixpath (conn=0x559648fb3320, req=0x559648fb4c50, smb_fname=0x559648f8f470, access_mask=1179785, share_access=3,
    create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=256, lease=0x559648fca1c0, allocation_size=0, private_flags=0, sd=0x0,
    ea_list=0x0, result=0x7ffed1e1e040, pinfo=0x7ffed1e1e024) at ../../source3/smbd/open.c:5898
#10 0x00007fd1c43948aa in create_file_default (conn=0x559648fb3320, req=0x559648fb4c50, smb_fname=0x559648f8f470, access_mask=1179785, share_access=3,
    create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=256, lease=0x559648fca1c0, allocation_size=0, private_flags=0, sd=0x0,
    ea_list=0x0, result=0x559648fca138, pinfo=0x559648fca14c, in_context_blobs=0x7ffed1e1e3c8, out_context_blobs=0x559648fc7060) at ../../source3/smbd/open.c:6214
#11 0x00007fd1c4302446 in vfswrap_create_file (handle=0x559648fcc670, req=0x559648fb4c50, smb_fname=0x559648f8f470, access_mask=1179785, share_access=3,
    create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=256, lease=0x559648fca1c0, allocation_size=0, private_flags=0, sd=0x0,
    ea_list=0x0, result=0x559648fca138, pinfo=0x559648fca14c, in_context_blobs=0x7ffed1e1e3c8, out_context_blobs=0x559648fc7060)
    at ../../source3/modules/vfs_default.c:755
#12 0x00007fd1c43a0aae in smb_vfs_call_create_file (handle=0x559648fcc670, req=0x559648fb4c50, smb_fname=0x559648f8f470, access_mask=1179785, share_access=3,
    create_disposition=1, create_options=4194368, file_attributes=0, oplock_request=256, lease=0x559648fca1c0, allocation_size=0, private_flags=0, sd=0x0,
    ea_list=0x0, result=0x559648fca138, pinfo=0x559648fca14c, in_context_blobs=0x7ffed1e1e3c8, out_context_blobs=0x559648fc7060) at ../../source3/smbd/vfs.c:1873
#13 0x00007fd1c43e8464 in smbd_smb2_create_send (mem_ctx=0x559648fd0360, ev=0x559648f793f0, smb2req=0x559648fd0360, in_oplock_level=255 '\377',
    in_impersonation_level=0, in_desired_access=1179785, in_file_attributes=0, in_share_access=3, in_create_disposition=1, in_create_options=4194368,
    in_name=0x559648fd07f0 "scripts\\main.cmd", in_context_blobs=...) at ../../source3/smbd/smb2_create.c:1013
#14 0x00007fd1c43e5e10 in smbd_smb2_request_process_create (smb2req=0x559648fd0360) at ../../source3/smbd/smb2_create.c:268
#15 0x00007fd1c43d822e in smbd_smb2_request_dispatch (req=0x559648fd0360) at ../../source3/smbd/smb2_server.c:3394
#16 0x00007fd1c43dd180 in smbd_smb2_io_handler (xconn=0x559648fb1750, fde_flags=1) at ../../source3/smbd/smb2_server.c:5003
#17 0x00007fd1c43dd2af in smbd_smb2_connection_handler (ev=0x559648f793f0, fde=0x559648fb1070, flags=1, private_data=0x559648fb1750)
    at ../../source3/smbd/smb2_server.c:5041
#18 0x00007fd1c347fb1c in tevent_common_invoke_fd_handler (fde=0x559648fb1070, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:142
#19 0x00007fd1c348a81d in epoll_event_loop (epoll_ev=0x559648f8c060, tvalp=0x7ffed1e1e750) at ../../lib/tevent/tevent_epoll.c:736
#20 0x00007fd1c348af00 in epoll_event_loop_once (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246")
    at ../../lib/tevent/tevent_epoll.c:937
#21 0x00007fd1c3487690 in std_event_loop_once (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246")
    at ../../lib/tevent/tevent_standard.c:110
#22 0x00007fd1c347e947 in _tevent_loop_once (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:790
#23 0x00007fd1c347ec99 in tevent_common_loop_wait (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246")
    at ../../lib/tevent/tevent.c:913
#24 0x00007fd1c3487732 in std_event_loop_wait (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246")
    at ../../lib/tevent/tevent_standard.c:141
#25 0x00007fd1c347ed3c in _tevent_loop_wait (ev=0x559648f793f0, location=0x7fd1c44dfdb8 "../../source3/smbd/process.c:4246") at ../../lib/tevent/tevent.c:932
#26 0x00007fd1c43be46a in smbd_process (ev_ctx=0x559648f793f0, msg_ctx=0x559648f72940, dce_ctx=0x559648f72f30, sock_fd=48, interactive=false)
    at ../../source3/smbd/process.c:4246
#27 0x0000559646eeb90d in smbd_accept_connection (ev=0x559648f793f0, fde=0x559648fb7580, flags=1, private_data=0x559648facc40) at ../../source3/smbd/server.c:1021
#28 0x00007fd1c347fb1c in tevent_common_invoke_fd_handler (fde=0x559648fb7580, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:142
#29 0x00007fd1c348a81d in epoll_event_loop (epoll_ev=0x559648f8a000, tvalp=0x7ffed1e1eb00) at ../../lib/tevent/tevent_epoll.c:736
#30 0x00007fd1c348af00 in epoll_event_loop_once (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365")
    at ../../lib/tevent/tevent_epoll.c:937
#31 0x00007fd1c3487690 in std_event_loop_once (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365")
    at ../../lib/tevent/tevent_standard.c:110
#32 0x00007fd1c347e947 in _tevent_loop_once (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent.c:790
#33 0x00007fd1c347ec99 in tevent_common_loop_wait (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent.c:913
#34 0x00007fd1c3487732 in std_event_loop_wait (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365")
    at ../../lib/tevent/tevent_standard.c:141
#35 0x00007fd1c347ed3c in _tevent_loop_wait (ev=0x559648f793f0, location=0x559646ef3b20 "../../source3/smbd/server.c:1365") at ../../lib/tevent/tevent.c:932
#36 0x0000559646eec634 in smbd_parent_loop (ev_ctx=0x559648f793f0, parent=0x559648f8e4e0) at ../../source3/smbd/server.c:1365
#37 0x0000559646eeeb69 in main (argc=3, argv=0x7ffed1e1f128) at ../../source3/smbd/server.c:2234
Comment 22 Jeremy Allison 2021-07-22 16:06:47 UTC
FYI, I'll try to take a look at this today, but I'm out on vacation from tomorrow (July 23rd) until August 2nd. I'm adding Ralph and Metze to this bug in case they have time to follow up whilst I'm out. Thanks for your patience in working on this with us.
Comment 23 Jeremy Allison 2021-07-22 18:15:39 UTC
Oh that log.smbd is very interesting. Specifically:

1195724   set_file_oplock: granted oplock on file scripts/main.cmd, 2049:1179655:0/7963, tv_sec = 60f928ae, tv_usec = 9b0cd
1195725 [2021/07/22 10:13:34.688219,  5] ../../lib/dbwrap/dbwrap.c:148(dbwrap_lock_order_lock)
1195726   dbwrap_lock_order_lock: check lock order 1 for /opt/TESTsamba/var/lock/locking.tdb
1195727 [2021/07/22 10:13:34.688216,  1] ../../source3/lib/g_lock.c:1000(g_lock_writev_data)
1195728   g_lock_writev_data: g_lock_writev_data_fn failed: NT_STATUS_NOT_LOCKED
1195729 [2021/07/22 10:13:34.688545,  0] ../../source3/locking/share_mode_lock.c:2166(share_mode_entry_do)
1195730 [2021/07/22 10:13:34.688554,  0] ../../source3/locking/share_mode_lock.c:2166(share_mode_entry_do)
1195731   share_mode_entry_do: JRATEST: num_share_modes=0
1195732 [2021/07/22 10:13:34.688607,  0] ../../source3/locking/share_mode_lock.c:2179(share_mode_entry_do)
1195733   share_mode_entry_do: JRATEST: num_share_modes=0
1195734   share_mode_entry_do: JRATEST: Did not find share mode entry for 7963
1195735 [2021/07/22 10:13:34.688632,  0] ../../source3/smbd/close.c:519(close_remove_share_mode)
1195736 [2021/07/22 10:13:34.688633,  0] ../../source3/locking/share_mode_lock.c:2179(share_mode_entry_do)
1195737   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
1195738   share_mode_entry_do: JRATEST: Did not find share mode entry for 306
1195739 [2021/07/22 10:13:34.688771,  2] ../../source3/smbd/close.c:824(close_normal_file)
1195740 [2021/07/22 10:13:34.688774,  0] ../../source3/smbd/close.c:519(close_remove_share_mode)
1195741   nclanonymous closed file scripts/main.cmd (numopen=3) NT_STATUS_OK
1195742   close_remove_share_mode: Could not delete share entry for file scripts/main.cmd
1195743 [2021/07/22 10:13:34.688883,  5] ../../lib/dbwrap/dbwrap.c:148(dbwrap_lock_order_lock)
1195744   dbwrap_lock_order_lock: check lock order 1 for /opt/TESTsamba/var/lock/smbXsrv_open_global.tdb
1195745 [2021/07/22 10:13:34.688913,  0] ../../lib/dbwrap/dbwrap.c:164(dbwrap_lock_order_lock)
1195746   dbwrap_lock_order_lock: Lock order violation: Trying /opt/TESTsamba/var/lock/smbXsrv_open_global.tdb at 1 while /opt/TESTsamba/var/lock/lock
1195747 [2021/07/22 10:13:34.688940,  0] ../../lib/dbwrap/dbwrap.c:131(debug_lock_order)
1195748 [2021/07/22 10:13:34.688942,  5] ../../lib/dbwrap/dbwrap.c:180(dbwrap_lock_order_unlock)
1195749   lock order:  1:/opt/TESTsamba/var/lock/locking.tdb 2:<none> 3:<none> 4:<none>
1195750   dbwrap_lock_order_unlock: release lock order 1 for /opt/TESTsamba/var/lock/locking.tdb
1195751 [2021/07/22 10:13:34.688967,  0] ../../lib/util/fault.c:172(smb_panic_log)
1195752   ===============================================================
1195753 [2021/07/22 10:13:34.689022,  0] ../../lib/util/fault.c:176(smb_panic_log)
1195754 [2021/07/22 10:13:34.689015,  2] ../../source3/smbd/close.c:824(close_normal_file)
1195755   nclanonymous closed file scripts/main.cmd (numopen=0) NT_STATUS_OK
1195756   INTERNAL ERROR: lock order violation in pid 14494 (4.14.6)
Comment 24 Michael Albert 2021-07-22 18:59:41 UTC
Ok, enjoy your holidays :-)
Comment 25 Jeremy Allison 2021-07-22 23:12:44 UTC
OK, I've gotten so desperate that I've tried to reproduce this using your Simulate.ps1 script on a Windows 10 laptop. I don't have much RAM so I'm using -startNumberOfJobs 20. I've left this running for 20+ minutes against current master (with my patch from:

https://bugzilla.samba.org/attachment.cgi?id=16690

applied, as I think that's needed) and I haven't had any crashes at all.

Can you check whether it reproduces for you with -startNumberOfJobs 20?
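
(For reference, and assuming Simulate.ps1 takes no other required arguments, the invocation is simply something like:

.\Simulate.ps1 -startNumberOfJobs 20

run from a PowerShell prompt on the Windows client.)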
Comment 26 Jeremy Allison 2021-07-22 23:38:08 UTC
Created attachment 16697 [details]
git-am fix for 4.14.next.

Also applies to master. Prevents a DBG_ERR message from being logged when we're closing a file early in the error path without having created a share mode (file not found, etc.).
Comment 27 Jeremy Allison 2021-07-22 23:43:23 UTC
What Linux kernel are you running on your box? I'm starting to suspect an error in your kernel affecting the pthread_mutex_lock code we now use by default inside lib/tdb/mutex.c.

My kernel here is Ubuntu 20.04LTS Linux 5.8.0-59-generic.

If you can build it yourself, can you try building with --disable-tdb-mutex-locking? That will fall back to the old-style fcntl tdb locking instead.
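
(Illustrative only: the rebuild just means adding that flag to the configure invocation you already use, e.g.

./configure <your existing options> --disable-tdb-mutex-locking
make
make install

with everything else about the build left unchanged.)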
Comment 28 Stefan Metzmacher 2021-07-22 23:48:12 UTC
(In reply to Jeremy Allison from comment #27)

'dbwrap_tdb_mutexes:*=no' in the global section should also do the trick
without recompiling, at least for everything going via dbwrap_open()
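
A minimal sketch of where that goes, assuming the rest of smb.conf stays as it is:

[global]
dbwrap_tdb_mutexes:* = no

Restart smbd afterwards so the databases are re-opened with the new setting.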
Comment 29 Jeremy Allison 2021-07-22 23:53:49 UTC
(In reply to Stefan Metzmacher from comment #28)

Thanks Metze, I thought there must be a user-config way to do this but didn't have time to find it, as I'm trying to wind down my TO-DOs before vacation.

Michael, Metze's config file change is obviously quicker to use than rebuilding everything to check if it's a tdb mutex issue :-). Thanks once again for your help, Metze!
Comment 30 Jeremy Allison 2021-07-23 02:27:32 UTC
Created attachment 16698 [details]
git-am fix for master, 4.14.next, 4.15.rc2

Also applies to master. Prevents a DBG_ERR message from being logged when we're closing a file or directory early in the error path without having created a share mode (file/directory not found, etc.).
Comment 31 Michael Albert 2021-07-23 05:53:09 UTC
(In reply to Jeremy Allison from comment #25)
You have to start this script on at least 2 Windows boxes. With only one, I cannot reproduce the issue either.
Comment 32 Michael Albert 2021-07-23 07:10:17 UTC
(In reply to Stefan Metzmacher from comment #28)
I'm on Ubuntu 16.04 LTS:
 4.15.0-150-generic #155-Ubuntu SMP Sat Jul 3 13:37:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I tried both: dbwrap_tdb_mutexes:*=no and recompiling with the --disable-tdb-mutex-locking option.

smbd crashes again. The behaviour is slightly different: the panic isn't logged to syslog.

The panic happens at 09:00:53:

:~# ps -eo pid,lstart,cmd -q 21623
  PID                  STARTED CMD
21623 Fri Jul 23 09:00:53 2021 sh -c /bin/sleep 999999
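
(That sleep is the smb.conf 'panic action' hook being run by smb_panic() via system() - see the do_system()/waitpid() frames at the top of the backtrace below. The corresponding setting is something like

panic action = /bin/sleep 999999

which keeps the crashed smbd around so gdb can attach to it.)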

I attached the log.smbd 


(gdb) bt
#0  0x00007f4da8f61492 in __GI___waitpid (pid=21623, stat_loc=stat_loc@entry=0x7fff5fcfae48, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007f4da8ecc177 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2  0x00007f4dab3a6d4e in smb_panic_s3 (why=0x7f4dac6f2914 "Could not unlock share mode\n") at ../../source3/lib/util.c:840
#3  0x00007f4daca0281d in smb_panic (why=0x7f4dac6f2914 "Could not unlock share mode\n") at ../../lib/util/fault.c:197
#4  0x00007f4dac4a5202 in share_mode_lock_destructor (lck=0x55ec5a6ed0f0) at ../../source3/locking/share_mode_lock.c:973
#5  0x00007f4dab5e597c in _tc_free_internal (tc=0x55ec5a6ed090, location=0x7f4dac775ce8 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1158
#6  0x00007f4dab5e5cf8 in _talloc_free_internal (ptr=0x55ec5a6ed0f0, location=0x7f4dac775ce8 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1248
#7  0x00007f4dab5e705e in _talloc_free (ptr=0x55ec5a6ed0f0, location=0x7f4dac775ce8 "../../source3/smbd/close.c:522") at ../../lib/talloc/talloc.c:1792
#8  0x00007f4dac63c1f0 in close_remove_share_mode (fsp=0x55ec5a697cb0, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:522
#9  0x00007f4dac63cb51 in close_normal_file (req=0x55ec5a6ea550, fsp=0x55ec5a697cb0, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:791
#10 0x00007f4dac63e6d3 in close_file (req=0x55ec5a6ea550, fsp=0x55ec5a697cb0, close_type=NORMAL_CLOSE) at ../../source3/smbd/close.c:1346
#11 0x00007f4dac6914d8 in smbd_smb2_close (req=0x55ec5a6e9e10, fsp=0x55ec5a697cb0, in_flags=1, out_flags=0x55ec5a6ea3a2, out_creation_ts=0x55ec5a6ea3a8,
    out_last_access_ts=0x55ec5a6ea3b8, out_last_write_ts=0x55ec5a6ea3c8, out_change_ts=0x55ec5a6ea3d8, out_allocation_size=0x55ec5a6ea3e8, out_end_of_file=0x55ec5a6ea3f0,
    out_file_attributes=0x55ec5a6ea3f8) at ../../source3/smbd/smb2_close.c:277
#12 0x00007f4dac6918aa in smbd_smb2_close_send (mem_ctx=0x55ec5a6e9e10, ev=0x55ec5a69b100, smb2req=0x55ec5a6e9e10, in_fsp=0x55ec5a697cb0, in_flags=1)
    at ../../source3/smbd/smb2_close.c:393
#13 0x00007f4dac690a8f in smbd_smb2_request_process_close (req=0x55ec5a6e9e10) at ../../source3/smbd/smb2_close.c:72
#14 0x00007f4dac67d916 in smbd_smb2_request_dispatch (req=0x55ec5a6e9e10) at ../../source3/smbd/smb2_server.c:3309
#15 0x00007f4dac682821 in smbd_smb2_io_handler (xconn=0x55ec5a698570, fde_flags=1) at ../../source3/smbd/smb2_server.c:4913
#16 0x00007f4dac682950 in smbd_smb2_connection_handler (ev=0x55ec5a69b100, fde=0x55ec5a6aef40, flags=1, private_data=0x55ec5a698570) at ../../source3/smbd/smb2_server.c:4951
#17 0x00007f4dabc00095 in tevent_common_invoke_fd_handler (fde=0x55ec5a6aef40, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#18 0x00007f4dabc0aafc in epoll_event_loop (epoll_ev=0x55ec5a6aedd0, tvalp=0x7fff5fcfb7a0) at ../../lib/tevent/tevent_epoll.c:736
#19 0x00007f4dabc0b1df in epoll_event_loop_once (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_epoll.c:937
#20 0x00007f4dabc07941 in std_event_loop_once (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:110
#21 0x00007f4dabbff2b4 in _tevent_loop_once (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:772
#22 0x00007f4dabbff606 in tevent_common_loop_wait (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:895
#23 0x00007f4dabc079e3 in std_event_loop_wait (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent_standard.c:141
#24 0x00007f4dabbff6a9 in _tevent_loop_wait (ev=0x55ec5a69b100, location=0x7f4dac782b88 "../../source3/smbd/process.c:4232") at ../../lib/tevent/tevent.c:914
#25 0x00007f4dac664179 in smbd_process (ev_ctx=0x55ec5a69b100, msg_ctx=0x55ec5a696780, dce_ctx=0x55ec5a696650, sock_fd=47, interactive=false)
    at ../../source3/smbd/process.c:4232
#26 0x000055ec59837185 in smbd_accept_connection (ev=0x55ec5a69b100, fde=0x55ec5a6d8200, flags=1, private_data=0x55ec5a6d2340) at ../../source3/smbd/server.c:1020
#27 0x00007f4dabc00095 in tevent_common_invoke_fd_handler (fde=0x55ec5a6d8200, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:138
#28 0x00007f4dabc0aafc in epoll_event_loop (epoll_ev=0x55ec5a6ad210, tvalp=0x7fff5fcfbb50) at ../../lib/tevent/tevent_epoll.c:736
#29 0x00007f4dabc0b1df in epoll_event_loop_once (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_epoll.c:937
#30 0x00007f4dabc07941 in std_event_loop_once (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:110
#31 0x00007f4dabbff2b4 in _tevent_loop_once (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:772
#32 0x00007f4dabbff606 in tevent_common_loop_wait (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:895
#33 0x00007f4dabc079e3 in std_event_loop_wait (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent_standard.c:141
#34 0x00007f4dabbff6a9 in _tevent_loop_wait (ev=0x55ec5a69b100, location=0x55ec5983f1a8 "../../source3/smbd/server.c:1367") at ../../lib/tevent/tevent.c:914
#35 0x000055ec59837ea7 in smbd_parent_loop (ev_ctx=0x55ec5a69b100, parent=0x55ec5a6b12c0) at ../../source3/smbd/server.c:1367
#36 0x000055ec5983a221 in main (argc=3, argv=0x7fff5fcfc178) at ../../source3/smbd/server.c:2220

I will also try it on Ubuntu 20.04 LTS (hopefully today).
Comment 33 Michael Albert 2021-07-23 07:11:42 UTC
Created attachment 16700 [details]
log for dbwrap_tdb_mutexes:*=no and recompiling with --disable-tdb-mutex-locking
Comment 34 Michael Albert 2021-07-26 14:07:03 UTC
I tested 4.14.6 with the 'Extra debug for 4.14.x' patch (5.65 KB) applied, on Ubuntu:
# uname -a
Linux mySambaTestServer 5.8.0-63-generic #71~20.04.1-Ubuntu SMP Thu Jul 15 17:46:08 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I couldn't crash smbd on this system.
Comment 35 Jeremy Allison 2021-08-02 18:05:31 UTC
OK, just got back from vacation and trying to catch up :-).

Can I confirm that with this patch:

https://bugzilla.samba.org/attachment.cgi?id=16698

applied to 4.14.6 running on Ubuntu 20.04 with this kernel:

Linux mySambaTestServer 5.8.0-63-generic #71~20.04.1-Ubuntu SMP Thu Jul 15 17:46:08 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

with multiple Windows clients, you do *NOT* see any crashes?

Is this build using tdb mutexes or not? I'm trying to determine whether there is a kernel for which this bug does not *ever* manifest.

If this is so, then it isn't a Samba bug, but a glibc/kernel one.

FYI, I don't have multiple Windows clients, so it's hard to reproduce this locally for me.
Comment 36 Michael Albert 2021-08-03 10:25:51 UTC
(In reply to Jeremy Allison from comment #35)
Ok, short Summary :-)

Not affected: all Samba versions <= 4.12.x, regardless of the kernel/glibc/distribution.

Affected:

All Samba versions >= 4.13.x on Ubuntu 16.04 and CentOS 8, regardless of whether dbwrap_tdb_mutexes:*=no is set or smbd is recompiled with --disable-tdb-mutex-locking.

Not affected:

Ubuntu 20.04
Comment 37 Michael Albert 2021-08-03 11:00:48 UTC
(In reply to Michael Albert from comment #36)
Yes, this patch https://bugzilla.samba.org/attachment.cgi?id=16698 was applied in all tests.
Comment 38 Jeremy Allison 2021-08-03 16:05:55 UTC
So this looks to me like a kernel/glibc issue on Ubuntu 16.04 and CentOS 8.

If the problem just "goes away" on Ubuntu 20.04 then it can't be a logic error in the Samba code - a logic error would manifest in all circumstances.

Ralph, Metze - I'm going to submit the patch in:

https://bugzilla.samba.org/attachment.cgi?id=16698

to CI, and try to get it into the next releases. After that, I think we need to point Red Hat kernel engineers at this to try to figure out what is going wrong futex-wise.
Comment 39 Jeremy Allison 2021-08-03 18:09:06 UTC
Oh, unfortunately the fix in:

https://bugzilla.samba.org/attachment.cgi?id=16698

that sets is_fsa is incorrect :-(. It breaks the streams file close tests. I'll need to rework this. Watch this space :-).
Comment 40 Jeremy Allison 2021-08-03 18:55:53 UTC
The problem is that I was only setting fsp->fsp_flags.is_fsa inside fsp_bind_smb() once smbXsrv_open_create() succeeded. When opening the base_fsp for a stream open, we pass in req==NULL (meaning an internal open for the base_fsp), which skips setting fsp->fsp_flags.is_fsa.
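
A rough, self-contained sketch of that flow (illustration only, not the real Samba code: the stub types and the *_sketch/*_stub names are stand-ins):

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the real smbd structures (illustration only). */
struct smb_request { int unused; };
struct files_struct {
        struct { bool is_fsa; } fsp_flags;
};

/* Pretend smbXsrv_open_create() always succeeds in this sketch. */
static bool smbXsrv_open_create_stub(void)
{
        return true;
}

/*
 * Models the behaviour described above: is_fsa is only set once the
 * smbXsrv open has been created, and an internal open (req == NULL,
 * e.g. the base_fsp of a stream open) returns before that point, so
 * the flag stays false for it.
 */
static void fsp_bind_smb_sketch(struct files_struct *fsp,
                                struct smb_request *req)
{
        if (req == NULL) {
                return;                 /* internal open: is_fsa stays false */
        }
        if (!smbXsrv_open_create_stub()) {
                return;                 /* error path: is_fsa also stays false */
        }
        fsp->fsp_flags.is_fsa = true;   /* only reached for client opens */
}

int main(void)
{
        struct files_struct base_fsp = { .fsp_flags = { .is_fsa = false } };
        struct files_struct stream_fsp = { .fsp_flags = { .is_fsa = false } };
        struct smb_request req = { 0 };

        fsp_bind_smb_sketch(&base_fsp, NULL);   /* base_fsp of a stream open */
        fsp_bind_smb_sketch(&stream_fsp, &req); /* normal client-initiated open */

        printf("base_fsp is_fsa=%d, stream_fsp is_fsa=%d\n",
               base_fsp.fsp_flags.is_fsa, stream_fsp.fsp_flags.is_fsa);
        return 0;
}

Run as-is it prints base_fsp is_fsa=0, stream_fsp is_fsa=1 - the asymmetry described above.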

I'm starting to think 9c277b27dec568677c0b407497da6eb95ae7cb8d is incorrect..
Comment 41 Jeremy Allison 2021-08-03 20:44:54 UTC
No, 9c277b27dec568677c0b407497da6eb95ae7cb8d is good - it's my code that's wrong :-). Still working on it :-).
Comment 42 Jeremy Allison 2021-08-03 22:15:24 UTC
Michael, can you do me a favor? Can you try running your tests against stock 4.14.6 on Ubuntu 20.04 *WITHOUT* the patch in:

https://bugzilla.samba.org/attachment.cgi?id=16698

applied, and see if they pass?

I'm trying to figure out if I'm chasing a ghost here. The original problem that the is_fsa modification was trying to fix was a lock order violation when create_file_unixpath() errors out after calling fsp_bind_smb() or file_new(), but before creating a share mode locking entry.

I've been trying to reproduce this locally by forcing a 'goto fail;' after the fsp_bind_smb() or file_new() call for a specific canary filename, then running an existing test that uses that filename to see if I get the same smb_panic lock order violation that you did.

I don't, which leads me to believe the original lock order violation problem may have been an artifact of previous locking failures that the process was getting when running on CentOS 8 and Ubuntu 16.04.
Comment 43 Michael Albert 2021-08-04 10:08:01 UTC
(In reply to Jeremy Allison from comment #42)
I could also NOT crash smbd with stock 4.14.6 (without patches) on Ubuntu 20.04.
Comment 44 Jeremy Allison 2021-08-04 16:08:22 UTC
Ah, as I suspected - my patch was a band-aid on top of the broken locking on CentOS 8 and Ubuntu 16.04, and so not really needed. The existing code is already good.

Not sure where to go with this - I think it's worth leaving open at least for a short while in the hope that I can get Red Hat engineers to take a look at it. But if it's already fixed in their latest releases, there's not much hope that it will be a priority, sorry.
Comment 45 Michael Albert 2021-08-06 06:53:18 UTC
(In reply to Jeremy Allison from comment #44)
Jeremy, I tried it on Red Hat 8 (4.18.0-305.10.2.el8_4.x86_64 #1 SMP Mon Jul 12 04:43:18 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux), built stock Samba 4.14.6 without any patch, and couldn't crash it. Maybe they backported some fixes to their default kernel?
Comment 46 Jeremy Allison 2021-08-06 16:39:34 UTC
That kind of confirms it. I'm going to close this one out as I don't think it's a Samba issue, and Red Hat 8 also doesn't suffer from this problem.
Comment 47 Stefan Metzmacher 2021-08-06 20:11:16 UTC
(In reply to Jeremy Allison from comment #46)

Do we at least know what the problem is?
Comment 48 Jeremy Allison 2021-08-06 20:36:29 UTC
Share mode locking is failing on these systems with 4.14 and above. Unfortunately I can't afford to spend more time on it when updated systems work :-(.