Bug 15370 - smbd deadlocks in tdb_nest_lock -> fcntl_lock with TCP connection in CLOSED state
Summary: smbd deadlocks in tdb_nest_lock -> fcntl_lock with TCP connection in CLOSED s...
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.18.7
Hardware: x64 FreeBSD
: P5 normal (vote)
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-08 18:17 UTC by Peter Eriksson
Modified: 2023-10-06 19:01 UTC (History)
1 user (show)

See Also:


Attachments
Backtrace of process 48592 (9.92 KB, text/plain)
2023-05-19 19:36 UTC, Peter Eriksson
no flags Details
Backtrace of process 51187 (13.01 KB, text/plain)
2023-05-19 19:37 UTC, Peter Eriksson
no flags Details
Backtrace of process 75528 (8.43 KB, text/plain)
2023-05-19 19:37 UTC, Peter Eriksson
no flags Details
tar archive of lsof and smbstatus output for the running processes (3.49 KB, application/x-compressed-tar)
2023-05-19 19:41 UTC, Peter Eriksson
no flags Details
Tar archive with gdb backtrace and lsof outputs (9.42 KB, application/x-compressed-tar)
2023-10-03 13:41 UTC, Peter Eriksson
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Eriksson 2023-05-08 18:17:57 UTC
After upgrading to Samba 4.18.0 a couple of weeks ago I've started seeing some smbd processes that seems to have deadlocked with a closed TCP session (according to "netstat" - state is "CLOSED") and waiting in:

  tdb_nest_lock -> tdb_brlock -> fcntl_lock -> fcntl

This is a bit of a problem if this happens to a client that have locks held on some files at the moment it occurs. 

Manually killing the problematic smbd processes "fixes" the issue and I have a monitoring script running that now watches for when this happens and alerts me.


The strange thing is that it seems to happen not all the time but after some 4-10 days, and it doesn't happen on all our servers (very busy ones) but just two (so far).

I can't currently upgrade to 4.18.2 on the servers where this occurs, but I can't anyway see anything in the release notes for 4.18.1 or 4.18.2 that looks like it be related to a fix. I've downgraded one of them to 4.17 now for the moment.


Here is a part of a GDB backtrace from a running deadlocked smbd process when
this has happened.

0x00000008049affc8 in _fcntl () from /lib/libc.so.7
#0  0x00000008049affc8 in _fcntl () from /lib/libc.so.7
#1  0x0000000807aab8d6 in ?? () from /lib/libthr.so.3
#2  0x00000008048845b7 in fcntl () from /lib/libc.so.7
#3  0x0000000804bf0e5c in fcntl_lock (waitflag=true, len=1, off=7608, rw=1, tdb=0x80f32ed20) at ../../lib/tdb/common/lock.c:58
#4  tdb_brlock (tdb=tdb@entry=0x80f32ed20, rw_type=rw_type@entry=1, offset=offset@entry=7608, len=len@entry=1, flags=flags@entry=TDB_LOCK_WAIT) at ../../lib/tdb/common/lock.c:200
#5  0x0000000804bf142d in tdb_nest_lock (tdb=tdb@entry=0x80f32ed20, offset=offset@entry=7608, ltype=ltype@entry=1, flags=flags@entry=TDB_LOCK_WAIT) at ../../lib/tdb/common/lock.c:390
#6  0x0000000804bf1659 in tdb_lock_list (tdb=tdb@entry=0x80f32ed20, list=list@entry=1860, ltype=ltype@entry=1, waitflag=waitflag@entry=TDB_LOCK_WAIT) at ../../lib/tdb/common/lock.c:482
#7  0x0000000804bf1720 in tdb_lock (tdb=tdb@entry=0x80f32ed20, list=1860, ltype=ltype@entry=1) at ../../lib/tdb/common/lock.c:500
#8  0x0000000804beeb82 in tdb_find_lock_hash (tdb=tdb@entry=0x80f32ed20, key=..., hash=2009077229, locktype=locktype@entry=1, rec=rec@entry=0x7fffffffd960) at ../../lib/tdb/common/tdb.c:165
#9  0x0000000804beed65 in tdb_parse_record (tdb=0x80f32ed20, key=..., parser=parser@entry=0x80589df7d <db_tdb_parser>, private_data=private_data@entry=0x7fffffffd9b0) at ../../lib/tdb/common/tdb.c:329
#10 0x000000080589e2e2 in db_tdb_parse (db=<optimized out>, key=..., parser=0x802108824 <dbwrap_watched_parse_record_parser>, private_data=0x7fffffffda20) at ../../lib/dbwrap/dbwrap_tdb.c:273
#11 0x000000080589ba5f in dbwrap_parse_record (db=<optimized out>, key=..., parser=parser@entry=0x802108824 <dbwrap_watched_parse_record_parser>, private_data=private_data@entry=0x7fffffffda20) at ../../lib/dbwrap/dbwrap.c:425
#12 0x000000080210893b in dbwrap_watched_parse_record (db=0x80f3ffb60, key=..., parser=0x80210daf8 <g_lock_dump_fn>, private_data=0x7fffffffdaa0) at ../../source3/lib/dbwrap/dbwrap_watch.c:783
#13 0x000000080589ba5f in dbwrap_parse_record (db=<optimized out>, key=..., parser=parser@entry=0x80210daf8 <g_lock_dump_fn>, private_data=private_data@entry=0x7fffffffdaa0) at ../../lib/dbwrap/dbwrap.c:425
#14 0x000000080210f129 in g_lock_dump (ctx=<optimized out>, key=..., fn=fn@entry=0x8016d0496 <fetch_share_mode_unlocked_parser>, private_data=private_data@entry=0x7fffffffdb00) at ../../source3/lib/g_lock.c:1653
#15 0x00000008016d1799 in fetch_share_mode_unlocked (mem_ctx=<optimized out>, id=...) at ../../source3/locking/share_mode_lock.c:1567
#16 0x00000008016c6b0e in get_file_infos (id=..., name_hash=name_hash@entry=0, delete_on_close=delete_on_close@entry=0x0, write_time=write_time@entry=0x7fffffffdc80) at ../../source3/locking/locking.c:652
#17 0x00000008016ea7d5 in smbd_dirptr_get_entry (ctx=ctx@entry=0x819095050, dirptr=dirptr@entry=0x80f3a12e0, mask=mask@entry=0x819094e00 "*", dirtype=dirtype@entry=22, dont_descend=dont_descend@entry=false, ask_sharemode=ask_sharemode@entry=true, get_dosmode_in=true, match_fn=0x8016f36d3 <smbd_dirptr_lanman2_match_fn>, mode_fn=0x8016f336b <smbd_dirptr_lanman2_mode_fn>, private_data=0x7fffffffdf80, _fname=0x7fffffffdf50, _smb_fname=0x7fffffffdf58, _mode=0x7fffffffdf44, _prev_offset=0x7fffffffdf48) at ../../source3/smbd/dir.c:1050
Comment 1 Andrew Bartlett 2023-05-09 01:41:34 UTC
The important trace will not be of the process 'stuck' in fcntl, but the process that holds the lock it wants, which won't be in fcntl (it will be somewhere else, busy doing something).
Comment 2 Peter Eriksson 2023-05-09 07:14:14 UTC
Ok, I'll try to catch backtraces of all smbd's (some 400 or so) next time this occurs (which should be in a couple of days if it happens with the same frequency) and we'll see if I can find something interresting :-)
Comment 3 Peter Eriksson 2023-05-19 17:24:21 UTC
Ok, I'm now got a new deadlocked pair of smbd's on another server.

I "gcore" all running smbd processes and got a "gdb bt" backtrace from them all.


root@filur02:/var/tmp/smbd # /liu/sbin/smbreaper
SUSPECT: pid=48592 user=mihha273 locks=1 from=2001:6b0:17:ac00:1000::ed11.63854 started=Fri May 19 17:22:41 2023 CEST
SUSPECT: pid=51187 user=axema002 locks=1 from=2001:6b0:17:ac00:2000::58fb.57873 started=Fri May 19 14:11:27 2023 CEST

Those two and a third one is the only ones sitting at fcntl_lock, 
root@filur02:/var/tmp/smbd # grep '^#[23] ' *.bt | egrep -v poll_event_loop
core.48592.bt:#2  0x00000008048845b7 in fcntl () from /lib/libc.so.7
core.48592.bt:#3  0x0000000804bf0e5c in fcntl_lock (waitflag=true, len=1, off=38356, rw=1, 
core.51187.bt:#2  0x00000008048845b7 in fcntl () from /lib/libc.so.7
core.51187.bt:#3  0x0000000804bf0e5c in fcntl_lock (waitflag=true, len=1, off=38356, rw=1, 
core.75528.bt:#2  0x00000008048845b7 in fcntl () from /lib/libc.so.7
core.75528.bt:#3  0x0000000804bf0e5c in fcntl_lock (waitflag=true, len=1, off=38356, rw=1, 

root@filur02:/var/tmp/smbd # smbstatus|egrep 75528
75528   nikha067     student-liu.se 2001:6b0:17:ac00:1000::5317 (ipv6:2001:6b0:17:ac00:1000::5317:50595) SMB3_11           -                    partial(AES-128-CMAC)
IPC$         75528   2001:6b0:17:ac00:1000::5317 Fri May 19 17:35:43 2023 CEST    -            AES-128-CMAC
students     75528   2001:6b0:17:ac00:1000::5317 Fri May 19 11:14:39 2023 CEST    -            AES-128-CMAC
75528        1000095530  DENY_NONE  0x100081    RDONLY     NONE             /export/students   nikha067/SettingsPackages/Microsoft-Windows-Powershell   Fri May 19 17:35:48 2023


The other two have a lot of files open according to "smbstatus". I can wait a little with restarting the smbd processes for now if you have some ideas on what to test. 

Btw, on the other server where I was experiencing this I backed down to 4.17.6 and there I haven't seen this issue again.


lsof output for this process:

oot@filur02:/var/tmp/smbd # lsof -p 75528
lsof: WARNING: compiled for FreeBSD release 12.3-RELEASE-p7; this is 12.3-RELEASE-p6.
lsof: WARNING: device cache mismatch: /dev/ttyu1
COMMAND   PID     USER   FD   TYPE                DEVICE  SIZE/OFF   NODE NAME
smbd    75528 nikha067  txt   VREG 1441510162,2280381956    212224 297721 /liu/pkg/samba/4.18.0-liu/sbin/smbd (zroot/ROOT/default)
smbd    75528 nikha067  cwd   VDIR  620489695,2065293036     34939      4 /export/students (DATA/students)
smbd    75528 nikha067  rtd   VDIR 1441510162,2280381956        32      4 / (zroot/ROOT/default)
smbd    75528 nikha067    0u  VCHR                  0,14       0t0     14 /dev/null (devfs)
smbd    75528 nikha067    1u  VCHR                  0,14       0t0     14 /dev/null (devfs)
smbd    75528 nikha067    2w  VREG 1441510162,2280381956   3576637 457153 /liu/var/samba/logs/log.smbd (zroot/ROOT/default)
smbd    75528 nikha067    3u  unix    0xfffff85aa6856368       0t0        ->0xfffff8427762a6d0
smbd    75528 nikha067    4u  VREG 1441510162,2280381956    430080 210017 /liu/etc/samba/private/secrets.tdb (zroot/ROOT/default)
smbd    75528 nikha067    5uW VREG 1441510162,2280381956        20 412038 /liu/var/samba/locks/msg.lock/75528 (zroot/ROOT/default)
smbd    75528 nikha067    6u  unix    0xfffff80562305a38       0t0        /liu/etc/samba/private/msg.sock/75528
smbd    75528 nikha067    7u  VREG 1441510162,2280381956     12288 114723 /liu/var/samba/locks/names.tdb (zroot/ROOT/default)
smbd    75528 nikha067    8r  VDIR 1355234787,1513152036        99     50 /export/students/nikha067/SettingsPackages/Microsoft-Windows-Powershell (DATA/students/nikha067)
smbd    75528 nikha067    9u  VCHR                  0,14       0t0     14 /dev/null (devfs)
smbd    75528 nikha067   10u  VCHR                  0,14       0t0     14 /dev/null (devfs)
smbd    75528 nikha067   11w  VREG 1441510162,2280381956         6 451482 /liu/var/samba/run/smbd.pid (zroot/ROOT/default)
smbd    75528 nikha067   12u  PIPE    0xfffff85228333000     16384        ->0xfffff85228333168
smbd    75528 nikha067   13u  PIPE    0xfffff83dc4835000     16384        ->0xfffff83dc4835168
smbd    75528 nikha067   14u  PIPE    0xfffff83dc4835168      4096        ->0xfffff83dc4835000
smbd    75528 nikha067   15w  VREG 1441510162,2280381956   3576637 457153 /liu/var/samba/logs/log.smbd (zroot/ROOT/default)
smbd    75528 nikha067   16u  VREG 1441510162,2280381956     16384 114728 /liu/var/samba/locks/smbXsrv_version_global.tdb (zroot/ROOT/default)
smbd    75528 nikha067   17u  VREG 1441510162,2280381956    253952 114769 /liu/var/samba/locks/smbXsrv_client_global.tdb (zroot/ROOT/default)
smbd    75528 nikha067   18u  VREG 1441510162,2280381956 115605504 321458 /liu/var/samba/locks/smbXsrv_session_global.tdb (zroot/ROOT/default)
smbd    75528 nikha067   19u  VREG 1441510162,2280381956   2306048 114730 /liu/var/samba/locks/smbXsrv_tcon_global.tdb (zroot/ROOT/default)
smbd    75528 nikha067   20u  VREG 1441510162,2280381956   1474560 114731 /liu/var/samba/locks/brlock.tdb (zroot/ROOT/default)
smbd    75528 nikha067   21u  VREG 1441510162,2280381956 149245952 451485 /liu/var/samba/locks/locking.tdb (zroot/ROOT/default)
smbd    75528 nikha067   22u  VREG 1441510162,2280381956     40200 114736 /liu/var/samba/locks/leases.tdb (zroot/ROOT/default)
smbd    75528 nikha067   23u  VREG 1441510162,2280381956    421888 114751 /liu/var/samba/state/share_info.tdb (zroot/ROOT/default)
smbd    75528 nikha067   24u  VREG 1441510162,2280381956  53071872 269463 /liu/var/samba/locks/gencache.tdb (zroot/ROOT/default)
smbd    75528 nikha067   25u  unix    0xfffff8054a0a6000       0t0        ->(none)
smbd    75528 nikha067   26u  VREG 1441510162,2280381956    425984 114755 /liu/var/samba/state/group_mapping.tdb (zroot/ROOT/default)
smbd    75528 nikha067   27u  VREG 1441510162,2280381956    421888 114758 /liu/var/samba/state/account_policy.tdb (zroot/ROOT/default)
smbd    75528 nikha067   28u  VREG 1441510162,2280381956    421888 210015 /liu/etc/samba/private/passdb.tdb (zroot/ROOT/default)
smbd    75528 nikha067   29u  VREG 1441510162,2280381956  11005952 114759 /liu/var/samba/locks/smbXsrv_open_global.tdb (zroot/ROOT/default)
smbd    75528 nikha067   30r  VREG 1355234787,1513152036     27208  69865 /export/students/nikha067/SettingsPackages/Microsoft-Windows-Powershell/12CA888A-96C3-4D71-B309-0311B54A61BE.pkgdat (DATA/students/nikha067)
smbd    75528 nikha067   31u  unix    0xfffff85aa68566d0       0t0        ->0xfffff8730445c368
smbd    75528 nikha067   34u  IPv6    0xfffff83f621ccb88       0t0    TCP filur02.it.liu.se:microsoft-ds->gimle-405.ad.liu.se:50595 (CLOSED)
smbd    75528 nikha067   36u  VREG 1441510162,2280381956     12288 114778 /liu/var/samba/locks/smbd_cleanupd.tdb (zroot/ROOT/default)


And a more detailed backtrace:

(gdb) bt
#0  0x00000008049affc8 in _fcntl () from /lib/libc.so.7
#1  0x0000000807aab8d6 in ?? () from /lib/libthr.so.3
#2  0x00000008048845b7 in fcntl () from /lib/libc.so.7
#3  0x0000000804bf0e5c in fcntl_lock (waitflag=true, len=1, off=38356, rw=1, tdb=0x80f32ed20) at ../../lib/tdb/common/lock.c:58
#4  tdb_brlock (tdb=tdb@entry=0x80f32ed20, rw_type=rw_type@entry=1, offset=offset@entry=38356, len=len@entry=1, 
    flags=flags@entry=TDB_LOCK_WAIT) at ../../lib/tdb/common/lock.c:200
#5  0x0000000804bf142d in tdb_nest_lock (tdb=tdb@entry=0x80f32ed20, offset=offset@entry=38356, ltype=ltype@entry=1, 
    flags=flags@entry=TDB_LOCK_WAIT) at ../../lib/tdb/common/lock.c:390
#6  0x0000000804bf1659 in tdb_lock_list (tdb=tdb@entry=0x80f32ed20, list=list@entry=9547, ltype=ltype@entry=1, 
    waitflag=waitflag@entry=TDB_LOCK_WAIT) at ../../lib/tdb/common/lock.c:482
#7  0x0000000804bf1720 in tdb_lock (tdb=tdb@entry=0x80f32ed20, list=9547, ltype=ltype@entry=1) at ../../lib/tdb/common/lock.c:500
#8  0x0000000804beeb82 in tdb_find_lock_hash (tdb=tdb@entry=0x80f32ed20, key=..., hash=3111656176, locktype=locktype@entry=1, 
    rec=rec@entry=0x7fffffffd960) at ../../lib/tdb/common/tdb.c:165
#9  0x0000000804beed65 in tdb_parse_record (tdb=0x80f32ed20, key=..., parser=parser@entry=0x80589df7d <db_tdb_parser>, 
    private_data=private_data@entry=0x7fffffffd9b0) at ../../lib/tdb/common/tdb.c:329
#10 0x000000080589e2e2 in db_tdb_parse (db=<optimized out>, key=..., parser=0x802108824 <dbwrap_watched_parse_record_parser>, 
    private_data=0x7fffffffda20) at ../../lib/dbwrap/dbwrap_tdb.c:273
#11 0x000000080589ba5f in dbwrap_parse_record (db=<optimized out>, key=..., 
    parser=parser@entry=0x802108824 <dbwrap_watched_parse_record_parser>, private_data=private_data@entry=0x7fffffffda20)
    at ../../lib/dbwrap/dbwrap.c:425
#12 0x000000080210893b in dbwrap_watched_parse_record (db=0x80f3ffb60, key=..., parser=0x80210daf8 <g_lock_dump_fn>, 
    private_data=0x7fffffffdaa0) at ../../source3/lib/dbwrap/dbwrap_watch.c:783
#13 0x000000080589ba5f in dbwrap_parse_record (db=<optimized out>, key=..., parser=parser@entry=0x80210daf8 <g_lock_dump_fn>, 
    private_data=private_data@entry=0x7fffffffdaa0) at ../../lib/dbwrap/dbwrap.c:425
#14 0x000000080210f129 in g_lock_dump (ctx=<optimized out>, key=..., fn=fn@entry=0x8016d0496 <fetch_share_mode_unlocked_parser>, 
    private_data=private_data@entry=0x7fffffffdb00) at ../../source3/lib/g_lock.c:1653
#15 0x00000008016d1799 in fetch_share_mode_unlocked (mem_ctx=<optimized out>, id=...)
    at ../../source3/locking/share_mode_lock.c:1567
#16 0x00000008016c6b0e in get_file_infos (id=..., name_hash=name_hash@entry=0, delete_on_close=delete_on_close@entry=0x0, 
    write_time=write_time@entry=0x7fffffffdc80) at ../../source3/locking/locking.c:652
#17 0x00000008016ea7d5 in smbd_dirptr_get_entry (ctx=ctx@entry=0x80f3b7640, dirptr=dirptr@entry=0x80f3a0da0, 
    mask=mask@entry=0x80f32d0a0 "*", dirtype=dirtype@entry=22, dont_descend=dont_descend@entry=false, 
    ask_sharemode=ask_sharemode@entry=true, get_dosmode_in=true, match_fn=0x8016f36d3 <smbd_dirptr_lanman2_match_fn>, 
    mode_fn=0x8016f336b <smbd_dirptr_lanman2_mode_fn>, private_data=0x7fffffffdf80, _fname=0x7fffffffdf50, 
    _smb_fname=0x7fffffffdf58, _mode=0x7fffffffdf44, _prev_offset=0x7fffffffdf48) at ../../source3/smbd/dir.c:1050
#18 0x00000008016f75ae in smbd_dirptr_lanman2_entry (ctx=ctx@entry=0x80f3b7640, conn=0x80f3a8c60, dirptr=0x80f3a0da0, 
    flags2=<optimized out>, path_mask=<optimized out>, dirtype=22, info_level=260, requires_resume_key=0, dont_descend=false, 
    ask_sharemode=true, get_dosmode=true, align=8 '\b', do_pad=false, ppdata=0x80f3b76a0, base_data=0x80f37c060 "\270", 
    end_data=0x80f37e057 "", space_remaining=2988, _smb_fname=0x7fffffffe0f0, got_exact_match=0x7fffffffe0ee, 
    _last_entry_off=0x80f3b76c4, name_list=0x0, file_id=0x7fffffffe100) at ../../source3/smbd/smb2_trans2.c:1958
#19 0x0000000801754d03 in smb2_query_directory_next_entry (req=req@entry=0x80f3b7480)
    at ../../source3/smbd/smb2_query_directory.c:561
#20 0x000000080175639b in smbd_smb2_query_directory_send (in_file_name=<optimized out>, in_output_buffer_length=<optimized out>, 
    in_file_index=<optimized out>, in_flags=0 '\000', in_file_info_class=3 '\003', fsp=0x80f376720, smb2req=0x813118060, 
    ev=0x80f3a8060, mem_ctx=0x813118060) at ../../source3/smbd/smb2_query_directory.c:529
Comment 4 Ralph Böhme 2023-05-19 19:33:55 UTC
Please attach the backtraces of all three processes.
Comment 5 Peter Eriksson 2023-05-19 19:36:34 UTC
Created attachment 17892 [details]
Backtrace of process 48592
Comment 6 Peter Eriksson 2023-05-19 19:37:00 UTC
Created attachment 17893 [details]
Backtrace of process 51187
Comment 7 Peter Eriksson 2023-05-19 19:37:29 UTC
Created attachment 17894 [details]
Backtrace of process 75528
Comment 8 Peter Eriksson 2023-05-19 19:41:06 UTC
Created attachment 17895 [details]
tar archive of lsof and smbstatus output for the running processes
Comment 9 Peter Eriksson 2023-10-03 13:40:07 UTC
Still the same issue with 4.18.7

I just caught three smbd processes deadlocked trying to aquire a fcntl() lock for the "locking.tdb" database.

Killing all three fixes to problem and Samba will happily continue serving clients. If I don't kill them then smbd processes will start to accumulate.

I tried looking for something else keeping a lock on the locking.tdb database but none was found.

I'll upload a tar archive of gdb backtraces & lsof output for all three processes (and the smb.conf file).

A "tdbtool locking.tdb check" finds no errors.

Two of the three processes references full_audit in their back traces but they seem to be newer (higher pid numbers) than the first one which seems to be busy doing some directory listing?

This is a server for home directories so the users *shouldn't* be accessing some common directory but their own directories only.

Hmm.. The user that owns the first smbd that is doing the directory listing seems to have a *lot* of files & directories in his Home...

241342 directories, 235383 files, 15 levels deep.

But it should not be blocking just because of that. Hmm...
Comment 10 Peter Eriksson 2023-10-03 13:41:25 UTC
Created attachment 18150 [details]
Tar archive with gdb backtrace and lsof outputs
Comment 11 Peter Eriksson 2023-10-06 18:52:42 UTC
Happened again. This time I think I found the real smbd that was holding the lock and blocking the others. It was a smbd running as "root" but sitting at the poll_even_loop_poll(). However an "lsof" on it located that it was having "locking.tdb" open with a read lock on it.

(gdb) bt
#0  0x000019ec86180f5a in _poll () from /lib/libc.so.7
#1  0x000019ec9436fd36 in ?? () from /lib/libthr.so.3
#2  0x000019ec7cb0082f in poll_event_loop_poll (tvalp=0x19ec75c83ca0, ev=0x19ecc1c1b060) at ../../lib/tevent/tevent_poll.c:483
#3  poll_event_loop_once (ev=0x19ecc1c1b060, location=<optimized out>) at ../../lib/tevent/tevent_poll.c:626
#4  0x000019ec7cafd77e in _tevent_loop_once (ev=ev@entry=0x19ecc1c1b060, location=location@entry=0x19e455449368 "../../source3/smbd/server.c:1373")
    at ../../lib/tevent/tevent.c:823
#5  0x000019ec7cafd97b in tevent_common_loop_wait (ev=0x19ecc1c1b060, location=0x19e455449368 "../../source3/smbd/server.c:1373")
    at ../../lib/tevent/tevent.c:949
#6  0x000019ec7cafd9dd in _tevent_loop_wait (ev=ev@entry=0x19ecc1c1b060, location=location@entry=0x19e455449368 "../../source3/smbd/server.c:1373")
    at ../../lib/tevent/tevent.c:968
#7  0x000019e45544672c in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x19ecc1c1b060) at ../../source3/smbd/server.c:1373
#8  main (argc=<optimized out>, argv=<optimized out>) at ../../source3/smbd/server.c:2130
(gdb) 
#0  0x000019ec86180f5a in _poll () from /lib/libc.so.7
#1  0x000019ec9436fd36 in ?? () from /lib/libthr.so.3
#2  0x000019ec7cb0082f in poll_event_loop_poll (tvalp=0x19ec75c83ca0, ev=0x19ecc1c1b060) at ../../lib/tevent/tevent_poll.c:483
#3  poll_event_loop_once (ev=0x19ecc1c1b060, location=<optimized out>) at ../../lib/tevent/tevent_poll.c:626
#4  0x000019ec7cafd77e in _tevent_loop_once (ev=ev@entry=0x19ecc1c1b060, location=location@entry=0x19e455449368 "../../source3/smbd/server.c:1373")
    at ../../lib/tevent/tevent.c:823
#5  0x000019ec7cafd97b in tevent_common_loop_wait (ev=0x19ecc1c1b060, location=0x19e455449368 "../../source3/smbd/server.c:1373")
    at ../../lib/tevent/tevent.c:949
#6  0x000019ec7cafd9dd in _tevent_loop_wait (ev=ev@entry=0x19ecc1c1b060, location=location@entry=0x19e455449368 "../../source3/smbd/server.c:1373")
    at ../../lib/tevent/tevent.c:968
#7  0x000019e45544672c in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x19ecc1c1b060) at ../../source3/smbd/server.c:1373
#8  main (argc=<optimized out>, argv=<optimized out>) at ../../source3/smbd/server.c:2130

COMMAND   PID USER   FD      TYPE                DEVICE  SIZE/OFF   NODE NAME
smbd    39610 root  txt      VREG 1441510162,2280381956    211424 358328 /liu/pkg/samba/4.18.7/sbin/smbd (zroot/ROOT/default)
smbd    39610 root  cwd      VDIR 1441510162,2280381956        34      4 / (zroot/ROOT/default)
smbd    39610 root  rtd      VDIR 1441510162,2280381956        34      4 / (zroot/ROOT/default)
smbd    39610 root    0u     VCHR                  0,41       0t0     41 /dev/null (devfs)
smbd    39610 root    1u     VCHR                  0,41       0t0     41 /dev/null (devfs)
smbd    39610 root    2w     VREG 1441510162,2280381956   5120299 272757 / (zroot/ROOT/default)
smbd    39610 root    3u     unix    0xfffff841a89d5b10       0t0        ->0xfffff801ec6e7000
smbd    39610 root    4u     VREG 1441510162,2280381956    430080 210017 /liu/etc/samba/private/secrets.tdb (zroot/ROOT/default)
smbd    39610 root    5uW    VREG 1441510162,2280381956        21 357390 /liu/var/samba/locks/msg.lock/39610 (zroot/ROOT/default)
smbd    39610 root    6u     unix    0xfffff8021b9c53b0       0t0        /liu/etc/samba/private/msg.sock/39610
smbd    39610 root    7ur    VREG 1441510162,2280381956     12288 114723 /liu/var/samba/locks/names.tdb (zroot/ROOT/default)
smbd    39610 root    8w     VREG 1441510162,2280381956   5120299 272757 / (zroot/ROOT/default)
smbd    39610 root    9u     VCHR                  0,41       0t0     41 /dev/null (devfs)
smbd    39610 root   10u     VCHR                  0,41       0t0     41 /dev/null (devfs)
smbd    39610 root   11wW    VREG 1441510162,2280381956         6 357389 /liu/var/samba/run/smbd.pid (zroot/ROOT/default)
smbd    39610 root   12u     PIPE    0xfffff8192b3ce5d0     16384        ->0xfffff8192b3ce730
smbd    39610 root   13u     PIPE    0xfffff8192b3ce730      4096        ->0xfffff8192b3ce5d0
smbd    39610 root   14u  EVENTFD                                        value=0, flags=0x4
smbd    39610 root   15ur    VREG 1441510162,2280381956     16384 114728 /liu/var/samba/locks/smbXsrv_version_global.tdb (zroot/ROOT/default)
smbd    39610 root   16ur    VREG 1441510162,2280381956    233472 114769 /liu/var/samba/locks/smbXsrv_client_global.tdb (zroot/ROOT/default)
smbd    39610 root   17ur    VREG 1441510162,2280381956 119623680 321458 /liu/var/samba/locks/smbXsrv_session_global.tdb (zroot/ROOT/default)
smbd    39610 root   18ur    VREG 1441510162,2280381956   2306048 114730 /liu/var/samba/locks/smbXsrv_tcon_global.tdb (zroot/ROOT/default)
smbd    39610 root   19ur    VREG 1441510162,2280381956   2883584 114731 /liu/var/samba/locks/brlock.tdb (zroot/ROOT/default)
smbd    39610 root   20ur    VREG 1441510162,2280381956 122077184 357392 /liu/var/samba/locks/locking.tdb (zroot/ROOT/default)
smbd    39610 root   21ur    VREG 1441510162,2280381956     40200 114736 /liu/var/samba/locks/leases.tdb (zroot/ROOT/default)
smbd    39610 root   22u     VREG 1441510162,2280381956    421888 114751 /liu/var/samba/state/share_info.tdb (zroot/ROOT/default)
smbd    39610 root   23u     VREG 1441510162,2280381956  53071872 269463 /liu/var/samba/locks/gencache.tdb (zroot/ROOT/default)
smbd    39610 root   24u     unix    0xfffff817f0cf4000       0t0        ->(none)
smbd    39610 root   25u     VREG 1441510162,2280381956    425984 114755 /liu/var/samba/state/group_mapping.tdb (zroot/ROOT/default)
smbd    39610 root   26u     VREG 1441510162,2280381956    421888 114758 /liu/var/samba/state/account_policy.tdb (zroot/ROOT/default)
smbd    39610 root   27u     VREG 1441510162,2280381956    421888 210015 /liu/etc/samba/private/passdb.tdb (zroot/ROOT/default)
smbd    39610 root   28ur    VREG 1441510162,2280381956  11005952 114759 /liu/var/samba/locks/smbXsrv_open_global.tdb (zroot/ROOT/default)
smbd    39610 root   29u     IPv4    0xfffffe169f59f518       0t0    TCP filur02.it.liu.se:microsoft-ds->*:* (LISTEN)
smbd    39610 root   30u     IPv4    0xfffffe150cded518       0t0    TCP filur02.it.liu.se:netbios-ssn->*:* (LISTEN)
smbd    39610 root   31u     IPv6    0xfffffe16e95d50e0       0t0    TCP filur02.it.liu.se:microsoft-ds (LISTEN)
smbd    39610 root   32u     IPv6    0xfffffe13a40300e0       0t0    TCP filur02.it.liu.se:netbios-ssn (LISTEN)
smbd    39610 root   33u     unix                             0t0        can't read unpcb at 0xfffff84f455c3300
smbd    39610 root   35ur    VREG 1441510162,2280381956     12288 114778 /liu/var/samba/locks/smbd_cleanupd.tdb (zroot/ROOT/default)


As soon as I killed that one the other blocked smbd's continued.
Comment 12 Peter Eriksson 2023-10-06 19:01:19 UTC
Hmm..


# fgrep locking.tdb lsof.all | egrep 'ur|uw' 

smbd      39610     root   20ur    VREG 1441510162,2280381956          122077184 357392 /liu/var/samba/locks/locking.tdb (zroot/ROOT/default)

smbd      73641  micbe11   20uw    VREG 1441510162,2280381956          122077184 357392 /liu/var/samba/locks/locking.tdb (zroot/ROOT/default)

Process 39610 has a read-lock (this was the process I killed that made the other blocked processes continue). 

I wonder what 73641 is doing though... Strange.

(gdb) #0  0x000019ec86180e18 in _fcntl () from /lib/libc.so.7
#1  0x000019ec9436faf6 in ?? () from /lib/libthr.so.3
#2  0x000019ec8617d977 in fcntl () from /lib/libc.so.7
#3  0x000019ec88237e94 in fcntl_lock (waitflag=true, len=1, off=11236, rw=1, tdb=0x19ecc1c0fb60) at ../../lib/tdb/common/lock.c:58
#4  tdb_brlock (tdb=tdb@entry=0x19ecc1c0fb60, rw_type=rw_type@entry=1, offset=offset@entry=11236, len=len@entry=1, flags=flags@entry=TDB_LOCK_WAIT)
    at ../../lib/tdb/common/lock.c:200
#5  0x000019ec8823846b in tdb_nest_lock (tdb=tdb@entry=0x19ecc1c0fb60, offset=offset@entry=11236, ltype=ltype@entry=1, flags=flags@entry=TDB_LOCK_WAIT)
    at ../../lib/tdb/common/lock.c:390
#6  0x000019ec88238696 in tdb_lock_list (tdb=tdb@entry=0x19ecc1c0fb60, list=list@entry=2767, ltype=ltype@entry=1, waitflag=waitflag@entry=TDB_LOCK_WAIT)
    at ../../lib/tdb/common/lock.c:482
#7  0x000019ec8823875d in tdb_lock (tdb=tdb@entry=0x19ecc1c0fb60, list=2767, ltype=ltype@entry=1) at ../../lib/tdb/common/lock.c:500
#8  0x000019ec88235ba9 in tdb_find_lock_hash (tdb=tdb@entry=0x19ecc1c0fb60, key=..., hash=2144392790, locktype=locktype@entry=1, 
    rec=rec@entry=0x19ec75c82890) at ../../lib/tdb/common/tdb.c:165
#9  0x000019ec88235d8c in tdb_parse_record (tdb=0x19ecc1c0fb60, key=..., parser=parser@entry=0x19ec88d7ffa2 <db_tdb_parser>, 
    private_data=private_data@entry=0x19ec75c828e0) at ../../lib/tdb/common/tdb.c:329
#10 0x000019ec88d80307 in db_tdb_parse (db=<optimized out>, key=..., parser=0x19ec77aacc0d <brl_get_locks_readonly_parser>, private_data=0x19ec75c82960)
    at ../../lib/dbwrap/dbwrap_tdb.c:273
#11 0x000019ec88d7da81 in dbwrap_parse_record (db=<optimized out>, key=..., parser=parser@entry=0x19ec77aacc0d <brl_get_locks_readonly_parser>, 
    private_data=private_data@entry=0x19ec75c82960) at ../../lib/dbwrap/dbwrap.c:425
#12 0x000019ec77aaec30 in brl_get_locks_readonly (fsp=fsp@entry=0x19ecc1c1faa0) at ../../source3/locking/brlock.c:1887
#13 0x000019ec77af11f2 in file_has_brlocks (fsp=0x19ecc1c1faa0) at ../../source3/smbd/open.c:2277
#14 delay_for_oplock (pgranted=<synthetic pointer>, poplock_type=<synthetic pointer>, first_open_attempt=true, create_disposition=<optimized out>, 
    have_sharing_violation=false, lck=0x19ec75c83280, lease=0x0, oplock_request=0, fsp=0x19ecc1c1faa0) at ../../source3/smbd/open.c:2794
#15 handle_share_mode_lease (pgranted=<synthetic pointer>, poplock_type=<synthetic pointer>, first_open_attempt=true, lease=0x0, oplock_request=0, 
    share_access=1, access_mask=1179785, create_disposition=<optimized out>, lck=0x19ec75c83280, fsp=0x19ecc1c1faa0) at ../../source3/smbd/open.c:2910
#16 check_and_store_share_mode (first_open_attempt=true, lease=0x0, oplock_request=0, share_access=1, access_mask=1179785, 
    create_disposition=<optimized out>, lck=0x19ec75c83280, req=0x19ecc1cbbfa0, fsp=0x19ecc1c1faa0) at ../../source3/smbd/open.c:3378
#17 open_ntcreate_lock_add_entry (lck=0x19ec75c83280, keep_locked=0x19ec75c83008, private_data=0x19ec75c83260) at ../../source3/smbd/open.c:3723
#18 0x000019ec77ab4a21 in share_mode_entry_prepare_lock_fn (glck=0x19ec75c82b60, cb_private=0x19ec75c82fc0) at ../../source3/locking/share_mode_lock.c:2978
#19 0x000019ec7afb41dc in g_lock_lock_cb_run_and_store (cb_state=cb_state@entry=0x19ec75c82b60) at ../../source3/lib/g_lock.c:597
#20 0x000019ec7afb5552 in g_lock_lock_simple_fn (rec=0x19ec75c82d80, value=..., private_data=0x19ec75c82f30) at ../../source3/lib/g_lock.c:1212
#21 0x000019ec7afb283d in dbwrap_watched_do_locked_fn (backend_rec=backend_rec@entry=0x19ec75c82c80, backend_value=..., 
    private_data=private_data@entry=0x19ec75c82d50) at ../../source3/lib/dbwrap/dbwrap_watch.c:458
#22 0x000019ec88d8062e in db_tdb_do_locked (db=<optimized out>, key=..., fn=0x19ec7afb27fd <dbwrap_watched_do_locked_fn>, private_data=0x19ec75c82d50)
    at ../../lib/dbwrap/dbwrap_tdb.c:208
#23 0x000019ec88d7ddbe in dbwrap_do_locked (db=0x19ecc1c7eb60, key=..., fn=fn@entry=0x19ec7afb27fd <dbwrap_watched_do_locked_fn>, 
    private_data=private_data@entry=0x19ec75c82d50) at ../../lib/dbwrap/dbwrap.c:553
#24 0x000019ec7afb1acb in dbwrap_watched_do_locked (db=0x19ecc1c7ec60, key=..., fn=0x19ec7afb5415 <g_lock_lock_simple_fn>, private_data=0x19ec75c82f30)
    at ../../source3/lib/dbwrap/dbwrap_watch.c:480
#25 0x000019ec88d7ddbe in dbwrap_do_locked (db=0x19ecc1c7ec60, key=..., fn=fn@entry=0x19ec7afb5415 <g_lock_lock_simple_fn>, 
    private_data=private_data@entry=0x19ec75c82f30) at ../../lib/dbwrap/dbwrap.c:553
#26 0x000019ec7afb6b7d in g_lock_lock (ctx=0x19ecc1c86960, key=..., type=type@entry=G_LOCK_WRITE, timeout=..., cb_fn=<optimized out>, 
    cb_private=<optimized out>) at ../../source3/lib/g_lock.c:1267
#27 0x000019ec77ab941c in _share_mode_entry_prepare_lock (prepare_state=prepare_state@entry=0x19ec75c83260, id=..., servicepath=<optimized out>, 
    smb_fname=smb_fname@entry=0x19ecc1c28880, old_write_time=old_write_time@entry=0x19ec75c83160, fn=fn@entry=0x19ec77af0a33 <open_ntcreate_lock_add_entry>, 
    private_data=private_data@entry=0x19ec75c83260, location=<optimized out>) at ../../source3/locking/share_mode_lock.c:3033
#28 0x000019ec77af4ed2 in open_file_ntcreate (fsp=<optimized out>, pinfo=0x19ec75c83138, smb_fname_atname=<optimized out>, parent_dir_fname=<optimized out>, 
    private_flags=<optimized out>, lease=<optimized out>, oplock_request=<optimized out>, new_dos_attributes=<optimized out>, 
    create_options=<optimized out>, create_disposition=<optimized out>, share_access=<optimized out>, access_mask=<optimized out>, req=<optimized out>, 
    conn=0x19ecc1c1c9e0) at ../../source3/smbd/open.c:4302
#29 create_file_unixpath (conn=conn@entry=0x19ecc1c1c9e0, req=req@entry=0x19ecc1cbbfa0, dirfsp=<optimized out>, dirfsp@entry=0x19ecc1c208a0, 
    smb_fname=smb_fname@entry=0x19ecc1c28580, access_mask=access_mask@entry=1179785, share_access=share_access@entry=1, create_disposition=<optimized out>, 
    create_options=<optimized out>, file_attributes=<optimized out>, oplock_request=<optimized out>, lease=<optimized out>, allocation_size=<optimized out>, 
    private_flags=<optimized out>, sd=<optimized out>, ea_list=<optimized out>, result=<optimized out>, pinfo=<optimized out>)
    at ../../source3/smbd/open.c:6299
#30 0x000019ec77af6cd2 in create_file_default (conn=<optimized out>, req=0x19ecc1cbbfa0, dirfsp=<optimized out>, smb_fname=0x19ecc1c28580, 
    access_mask=<optimized out>, share_access=<optimized out>, create_disposition=<optimized out>, create_options=<optimized out>, 
    file_attributes=<optimized out>, oplock_request=<optimized out>, lease=<optimized out>, allocation_size=<optimized out>, private_flags=<optimized out>, 
    sd=<optimized out>, ea_list=<optimized out>, result=<optimized out>, pinfo=<optimized out>, in_context_blobs=<optimized out>,