Bug 15537 - Deadlock between sb->s_umount and ses->session_mutex when server is unresponsive
Summary: Deadlock between sb->s_umount and ses->session_mutex when server is unresponsive
Status: NEW
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Version: 5.x
Hardware: All
OS: All
Importance: P5 normal
Target Milestone: ---
Assignee: Steve French
QA Contact: cifs QA contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-19 16:21 UTC by Ondrej Hubsch
Modified: 2023-12-19 16:21 UTC (History)
CC List: 0 users

See Also:


Attachments

Description Ondrej Hubsch 2023-12-19 16:21:22 UTC
We've hit this deadlock on 2.41 / CentOS Stream kernel 5.14.0-330.el9.x86_64 (which backported some patches from 6.2). Looking at the source, I can't see what would avoid it on 6.6.7, but the reproduction steps do not work on that kernel version. Could somebody please point me to the patch that is supposed to fix it, in case this deadlock has already been addressed?

Basically, there seems to be a deadlock between the sb->s_umount write lock taken while mounting and ses->session_mutex, and only a reboot of the client machine resolves it, since both processes are stuck in D state.

There are likely more ways to hit this bug; we've reproduced it consistently by adding artificial delays on the SMB server side, using the following steps:
1) Set up an autofs mount on a directory (this may not be needed).
2) Delay all responses on the server by 6000 ms, so that the delay can be increased manually for a specific SMB2 request while observing the traffic in Wireshark (a stand-in delay-proxy sketch follows this list).
3) Trigger the autofs mount.
4) Just before the last SMB2_CLOSE (root dir) request arrives -- the one that precedes the first SMB2_CREATE+SMB2_QUERY_INFO+SMB2_CLOSE compound -- increase the delay to 200 s. The very long delay has to apply to this SMB2_CLOSE; raising it only after the compound has reached the server is too late.
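
For reference, the baseline 6000 ms delay from step 2 can be approximated on the wire instead of on the server itself. Below is a minimal, hypothetical single-connection TCP delay proxy; LISTEN_PORT, SERVER_IP and DELAY_MS are made-up values for illustration, error handling is omitted, and it does not replace the manual Wireshark-driven step 4. Mount through it with something like: mount -t cifs //<proxy-host>/share /mnt -o port=4450,...

/*
 * Hypothetical stand-in for step 2: a single-connection TCP proxy that
 * delays every chunk of data flowing from the SMB server back to the
 * client by a fixed amount.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define LISTEN_PORT 4450          /* port the CIFS client mounts against */
#define SERVER_IP   "192.0.2.10"  /* real SMB server (placeholder address) */
#define SERVER_PORT 445
#define DELAY_MS    6000          /* delay applied to each server-to-client chunk */

/* copy data from one socket to the other, optionally delaying each chunk */
static void pump(int from, int to, int delay_ms)
{
	char buf[65536];
	ssize_t n;

	while ((n = read(from, buf, sizeof(buf))) > 0) {
		if (delay_ms > 0) {
			struct timespec ts = { .tv_sec = delay_ms / 1000,
					       .tv_nsec = (long)(delay_ms % 1000) * 1000000L };
			nanosleep(&ts, NULL);
		}
		if (write(to, buf, (size_t)n) != n)
			break;
	}
}

int main(void)
{
	struct sockaddr_in addr;
	int lfd, cfd, sfd;

	/* accept exactly one connection from the CIFS client */
	lfd = socket(AF_INET, SOCK_STREAM, 0);
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(LISTEN_PORT);
	bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
	listen(lfd, 1);
	cfd = accept(lfd, NULL, NULL);

	/* connect to the real SMB server */
	sfd = socket(AF_INET, SOCK_STREAM, 0);
	addr.sin_port = htons(SERVER_PORT);
	inet_pton(AF_INET, SERVER_IP, &addr.sin_addr);
	connect(sfd, (struct sockaddr *)&addr, sizeof(addr));

	if (fork() == 0) {
		pump(cfd, sfd, 0);        /* client -> server: forward immediately */
		_exit(0);
	}
	pump(sfd, cfd, DELAY_MS);         /* server -> client: delay each response */
	return 0;
}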

What happens then:
1) The thread that mounts the share takes the superblock write lock via the cifs_smb3_do_mount / sget / alloc_super call sequence and keeps holding it.
2) Because the delay exceeds 180 s, a background worker running __cifs_reconnect / smb2_reconnect is invoked. It is able to take ses->session_mutex, since nothing holds it at that point, and calls cifs_tree_connect. That iterates over the superblocks in cifs_get_tcp_super, and iterate_supers_type tries to take the read lock on each superblock -- including the one still write-locked by 1), since it has already been added to the list (a userspace sketch of this lock inversion follows this list).
3) Since 2) cannot make progress (it is waiting for the sb->s_umount that 1) holds), the thread from 1) eventually reaches ses->session_mutex and cannot proceed either, because 2) already holds it.
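
To make the lock inversion easier to see, here is a minimal userspace sketch of the same ABBA pattern, using a pthread rwlock and mutex as stand-ins for sb->s_umount and ses->session_mutex. It is only an illustration of the ordering described above, not kernel code; when built with -pthread and run, it simply hangs, mirroring the two D-state tasks below.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t s_umount = PTHREAD_RWLOCK_INITIALIZER;     /* ~ sb->s_umount */
static pthread_mutex_t session_mutex = PTHREAD_MUTEX_INITIALIZER;  /* ~ ses->session_mutex */

/* mount.cifs path: holds s_umount for write, then wants session_mutex */
static void *mount_path(void *arg)
{
	pthread_rwlock_wrlock(&s_umount);      /* sget() / alloc_super() */
	sleep(1);                              /* server response delayed past 180 s ... */
	pthread_mutex_lock(&session_mutex);    /* smb2_reconnect() -> blocks forever */
	pthread_mutex_unlock(&session_mutex);
	pthread_rwlock_unlock(&s_umount);
	return NULL;
}

/* reconnect worker: holds session_mutex, then wants s_umount for read */
static void *reconnect_worker(void *arg)
{
	pthread_mutex_lock(&session_mutex);    /* smb2_reconnect() */
	sleep(1);
	pthread_rwlock_rdlock(&s_umount);      /* iterate_supers_type() -> blocks forever */
	pthread_rwlock_unlock(&s_umount);
	pthread_mutex_unlock(&session_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, mount_path, NULL);
	pthread_create(&b, NULL, reconnect_worker, NULL);
	pthread_join(a, NULL);                 /* never returns: ABBA deadlock */
	pthread_join(b, NULL);
	puts("not reached");
	return 0;
}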

Stack traces from the two stuck tasks:

[Fri Dec 15 06:28:30 2023] INFO: task kworker/3:1:197 blocked for more than 368 seconds.
[Fri Dec 15 06:28:30 2023]       Not tainted 5.14.0-330.el9.x86_64 #1
[Fri Dec 15 06:28:30 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Dec 15 06:28:30 2023] task:kworker/3:1     state:D stack:0     pid:197   ppid:2      flags:0x00004000
[Fri Dec 15 06:28:30 2023] Workqueue: cifsiod smb2_reconnect_server [cifs]
[Fri Dec 15 06:28:30 2023] Call Trace:
[Fri Dec 15 06:28:30 2023]  <TASK>
[Fri Dec 15 06:28:30 2023]  __schedule+0x20a/0x550
[Fri Dec 15 06:28:30 2023]  schedule+0x5a/0xd0
[Fri Dec 15 06:28:30 2023]  schedule_preempt_disabled+0x11/0x20
[Fri Dec 15 06:28:30 2023]  rwsem_down_read_slowpath+0x37f/0x4f0
[Fri Dec 15 06:28:30 2023]  ? cifs_symlink+0x610/0x610 [cifs]
[Fri Dec 15 06:28:30 2023]  down_read+0x45/0xa0
[Fri Dec 15 06:28:30 2023]  iterate_supers_type+0x5e/0xf0
[Fri Dec 15 06:28:30 2023]  cifs_get_tcp_super+0x74/0xb0 [cifs]
[Fri Dec 15 06:28:30 2023]  cifs_tree_connect+0x11f/0x450 [cifs]
[Fri Dec 15 06:28:30 2023]  smb2_reconnect.part.0+0x32e/0x5f0 [cifs]
[Fri Dec 15 06:28:30 2023]  smb2_reconnect_server+0x203/0x5f0 [cifs]
[Fri Dec 15 06:28:30 2023]  ? __tdx_hypercall+0x70/0x80
[Fri Dec 15 06:28:30 2023]  process_one_work+0x1e2/0x3b0
[Fri Dec 15 06:28:30 2023]  ? rescuer_thread+0x390/0x390
[Fri Dec 15 06:28:30 2023]  worker_thread+0x50/0x3a0
[Fri Dec 15 06:28:30 2023]  ? rescuer_thread+0x390/0x390
[Fri Dec 15 06:28:30 2023]  kthread+0xdd/0x100
[Fri Dec 15 06:28:30 2023]  ? kthread_complete_and_exit+0x20/0x20
[Fri Dec 15 06:28:30 2023]  ret_from_fork+0x1f/0x30
[Fri Dec 15 06:28:30 2023]  </TASK>
[Fri Dec 15 06:28:30 2023] INFO: task mount.cifs:2119 blocked for more than 368 seconds.
[Fri Dec 15 06:28:30 2023]       Not tainted 5.14.0-330.el9.x86_64 #1
[Fri Dec 15 06:28:30 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Dec 15 06:28:30 2023] task:mount.cifs      state:D stack:0     pid:2119  ppid:2118   flags:0x00004002
[Fri Dec 15 06:28:30 2023] Call Trace:
[Fri Dec 15 06:28:30 2023]  <TASK>
[Fri Dec 15 06:28:30 2023]  __schedule+0x20a/0x550
[Fri Dec 15 06:28:30 2023]  schedule+0x5a/0xd0
[Fri Dec 15 06:28:30 2023]  schedule_preempt_disabled+0x11/0x20
[Fri Dec 15 06:28:30 2023]  __mutex_lock.constprop.0+0x433/0x6a0
[Fri Dec 15 06:28:30 2023]  ? prepare_to_wait_event+0x5d/0x180
[Fri Dec 15 06:28:30 2023]  smb2_reconnect.part.0+0x2a9/0x5f0 [cifs]
[Fri Dec 15 06:28:30 2023]  ? dequeue_task_stop+0x70/0x70
[Fri Dec 15 06:28:30 2023]  SMB2_open_init+0x58/0x700 [cifs]
[Fri Dec 15 06:28:30 2023]  smb2_compound_op+0x8c3/0x1a80 [cifs]
[Fri Dec 15 06:28:30 2023]  ? memcg_list_lru_alloc+0xa8/0x350
[Fri Dec 15 06:28:30 2023]  smb2_query_path_info+0x1cd/0x3e0 [cifs]
[Fri Dec 15 06:28:30 2023]  ? xas_load+0x5/0xa0
[Fri Dec 15 06:28:30 2023]  ? cifs_get_inode_info+0x2e4/0x8d0 [cifs]
[Fri Dec 15 06:28:30 2023]  cifs_get_inode_info+0x2e4/0x8d0 [cifs]
[Fri Dec 15 06:28:30 2023]  ? inode_doinit_with_dentry+0xf6/0x510
[Fri Dec 15 06:28:30 2023]  cifs_root_iget+0x3a8/0x500 [cifs]
[Fri Dec 15 06:28:30 2023]  cifs_read_super+0x15d/0x240 [cifs]
[Fri Dec 15 06:28:30 2023]  cifs_smb3_do_mount+0x210/0x310 [cifs]
[Fri Dec 15 06:28:30 2023]  smb3_get_tree+0x3d/0x70 [cifs]
[Fri Dec 15 06:28:30 2023]  vfs_get_tree+0x22/0xc0
[Fri Dec 15 06:28:30 2023]  do_new_mount+0x17a/0x310
[Fri Dec 15 06:28:30 2023]  __x64_sys_mount+0x107/0x140
[Fri Dec 15 06:28:30 2023]  do_syscall_64+0x59/0x90
[Fri Dec 15 06:28:30 2023]  ? do_syscall_64+0x69/0x90
[Fri Dec 15 06:28:30 2023]  ? exc_page_fault+0x62/0x150
[Fri Dec 15 06:28:30 2023]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[Fri Dec 15 06:28:30 2023] RIP: 0033:0x7effce03f7be
[Fri Dec 15 06:28:30 2023] RSP: 002b:00007ffd34a02c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[Fri Dec 15 06:28:30 2023] RAX: ffffffffffffffda RBX: 0000560e3925ceb0 RCX: 00007effce03f7be
[Fri Dec 15 06:28:30 2023] RDX: 0000560e385fb476 RSI: 0000560e385fb4dc RDI: 00007ffd34a03cb5
[Fri Dec 15 06:28:30 2023] RBP: 0000000000000000 R08: 0000560e3925ceb0 R09: 0000000000000000
[Fri Dec 15 06:28:30 2023] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffd34a03caa
[Fri Dec 15 06:28:30 2023] R13: 00007effce2b6000 R14: 00007effce2b890f R15: 0000560e3925dfa0
[Fri Dec 15 06:28:30 2023]  </TASK>