Created attachment 18224 [details]
Wireshark screenshot

We were doing some stress testing, with the SMB server configured as follows:

- artificial latency while processing SMB session setup, negotiate or tree connect set to 150 ms
- artificial latency while processing each SMB request set to 300 ms
- random probability of disconnecting the connection after responding to an SMB request set to 0.1%, i.e. on average 1 out of 1000 SMB packets leads to a disconnect after the response has been sent

We then spawn roughly 128 workers, each of which reads a random 20 MB file stored on the server, then picks a different one, and so on (a rough sketch of this load is included at the end of this report).

We mainly used the 5.14.0-330 and 5.14.0-391 kernels from CentOS Stream 9, and we suspect there is a similar issue in both of them, although it manifests slightly differently. The same issue as in 391 also occurs in 6.6.9 from ELRepo (6.6.9-1.el9.elrepo.x86_64).

On 5.14.0-330, which does not contain
https://github.com/torvalds/linux/commit/326a8d04f147e2bf393f6f9cdb74126ee6900607,
credits sometimes go negative in DebugData and eventually the client gets into a state where only KeepAlives can be observed in Wireshark. The negotiate command seems stuck waiting for credits.

On both 5.14.0-391 and 6.6.9 we can see the client **constantly** reconnecting. Every second or so, the "Instance" field increases by roughly 5. Both the load tool and the cifsiod processes seem stuck in D state. Wireshark shows the following pattern:

- SMB2 negotiate request (10 credits requested)
- SMB2 negotiate response (contains 10 granted credits)
- FIN sent to the server

Some fields in DebugData oscillate slightly; the oscillating fields are highlighted below. Credits seem permanently stuck at 0,0,0.

Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.45
Features: DFS,SMB_DIRECT,STATS,DEBUG,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL,WITNESS
CIFSMaxBufSize: 16384
Active VFS Requests: 0

Servers:
1) ConnectionId: 0x1 Hostname: gs63-10.gs63-10-ct1.dev.purestorage.com
ClientGUID: 012EDED1-49D5-D347-B4B4-92563CA1FB2E
Number of credits: 0,0,0 Dialect 0x311
TCP status: 5 (occasionally blinks to 4) Instance: 2740429 (keeps increasing)
Local Users To Server: 2 SecMode: 0x1 Req On Wire: 0 (blinks to -1 occasionally, immediately goes back to 0)
Net namespace: 4026531840
In Send: 0 In MaxReq Wait: 69

	Sessions:
	1) Address: 10.14.63.102 Uses: 2 Capability: 0x300067 Session Status: 3
	Guest Security type: RawNTLMSSP SessionId: 0x2d2a9b180000086c
	User: 0 Cred User: 0
	Primary channel: DISCONNECTED

	Shares:
	0) IPC: \\10.14.63.102\IPC$ Mounts: 1 DevInfo: 0x0 Attributes: 0x0
	PathComponentMax: 0 Status: 3 type: 0 Serial Number: 0x0
	Share Capabilities: None Share Flags: 0x30
	tid: 0x3 Maximal Access: 0x1f01ff DISCONNECTED

	1) \\10.14.63.102\fs1 Mounts: 2 (oscillates between 1 and 2) DevInfo: 0x50 Attributes: 0xc0008f
	PathComponentMax: 1040 Status: 3 type: DISK Serial Number: 0xccb9f39e
	Share Capabilities: None Aligned, Partition Aligned, Share Flags: 0x30
	tid: 0x2 Optimal sector size: 0x1000 Maximal Access: 0x1f01ff DISCONNECTED

	Server interfaces: 3 Last updated: 4628375 seconds ago
	1) Speed: 10Gbps
	   Capabilities: None
	   IPv4: 192.168.8.24
	   Weight (cur,total): (0,10)
	   Allocated channels: 0
	2) Speed: 10Gbps
	   Capabilities: None
	   IPv4: 10.14.63.117
	   Weight (cur,total): (0,10)
	   Allocated channels: 0
	3) Speed: 1Gbps
	   Capabilities: None
	   IPv4: 10.14.63.102
	   Weight (cur,total): (0,1)
	   Allocated channels: 0

	MIDs: -- this seems interesting/important.
There is this "long-standing" one, 00000000aec8f49d, that is there most of the time when doing watch -n 0.2 cat /proc/fs/cifs/DebugData (a sketch of an equivalent poller is included after the stack traces below). It sometimes oscillates to a different one (a different random "cbdata" each time it changes), and then switches back to the same one (00000000aec8f49d).

	Server ConnectionId: 0x1
	State: 2 com: 0 pid: 110433 cbdata: 00000000aec8f49d mid 0

Example oscillation (the aec cbdata disappears at that time):

	Server ConnectionId: 0x1
	State: 2 com: 0 pid: 6040 cbdata: 000000009d66f66f mid 0
--

Witness registrations:

Stack trace of 6040 (not at the time of a blink); other processes are in similar paths:

[Tue Jan 9 05:56:04 2024] task:kworker/10:138 state:D stack:0 pid:6040 ppid:2 flags:0x00004000
[Tue Jan 9 05:56:04 2024] Workqueue: cifsiod cifs_uncached_readv_complete [cifs]
[Tue Jan 9 05:56:04 2024] Call Trace:
[Tue Jan 9 05:56:04 2024]  <TASK>
[Tue Jan 9 05:56:04 2024]  __schedule+0x21c/0x680
[Tue Jan 9 05:56:04 2024]  schedule+0x5d/0xd0
[Tue Jan 9 05:56:04 2024]  smb2_wait_mtu_credits+0xde/0x2d0 [cifs]
[Tue Jan 9 05:56:04 2024]  ? __pfx_autoremove_wake_function+0x10/0x10
[Tue Jan 9 05:56:04 2024]  cifs_resend_rdata.constprop.0+0x81/0x270 [cifs]
[Tue Jan 9 05:56:04 2024]  collect_uncached_read_data+0x2d5/0x300 [cifs]
[Tue Jan 9 05:56:04 2024]  cifs_uncached_readv_complete+0x20/0x60 [cifs]
[Tue Jan 9 05:56:04 2024]  process_one_work+0x174/0x340
[Tue Jan 9 05:56:04 2024]  worker_thread+0x27e/0x390
[Tue Jan 9 05:56:04 2024]  ? __pfx_worker_thread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  kthread+0xee/0x120
[Tue Jan 9 05:56:04 2024]  ? __pfx_kthread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  ret_from_fork+0x2d/0x50
[Tue Jan 9 05:56:04 2024]  ? __pfx_kthread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  ret_from_fork_asm+0x1b/0x30
[Tue Jan 9 05:56:04 2024]  </TASK>

Some other process:

[Tue Jan 9 05:56:04 2024] task:kworker/8:68 state:D stack:0 pid:8673 ppid:2 flags:0x00004000
[Tue Jan 9 05:56:04 2024] Workqueue: cifsiod cifs_uncached_readv_complete [cifs]
[Tue Jan 9 05:56:04 2024] Call Trace:
[Tue Jan 9 05:56:04 2024]  <TASK>
[Tue Jan 9 05:56:04 2024]  __schedule+0x21c/0x680
[Tue Jan 9 05:56:04 2024]  schedule+0x5d/0xd0
[Tue Jan 9 05:56:04 2024]  schedule_timeout+0x88/0x160
[Tue Jan 9 05:56:04 2024]  ? __pfx_process_timeout+0x10/0x10
[Tue Jan 9 05:56:04 2024]  wait_for_free_credits+0x2f1/0x840 [cifs]
[Tue Jan 9 05:56:04 2024]  ? __pfx_autoremove_wake_function+0x10/0x10
[Tue Jan 9 05:56:04 2024]  ? __pfx_smb2_readv_callback+0x10/0x10 [cifs]
[Tue Jan 9 05:56:04 2024]  cifs_call_async+0x260/0x320 [cifs]
[Tue Jan 9 05:56:04 2024]  ? __pfx_cifs_readv_receive+0x10/0x10 [cifs]
[Tue Jan 9 05:56:04 2024]  ? __pfx_smb3_handle_read_data+0x10/0x10 [cifs]
[Tue Jan 9 05:56:04 2024]  smb2_async_readv+0x1c5/0x360 [cifs]
[Tue Jan 9 05:56:04 2024]  cifs_resend_rdata.constprop.0+0x17b/0x270 [cifs]
[Tue Jan 9 05:56:04 2024]  collect_uncached_read_data+0x2d5/0x300 [cifs]
[Tue Jan 9 05:56:04 2024]  cifs_uncached_readv_complete+0x20/0x60 [cifs]
[Tue Jan 9 05:56:04 2024]  process_one_work+0x174/0x340
[Tue Jan 9 05:56:04 2024]  worker_thread+0x27e/0x390
[Tue Jan 9 05:56:04 2024]  ? __pfx_worker_thread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  kthread+0xee/0x120
[Tue Jan 9 05:56:04 2024]  ? __pfx_kthread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  ret_from_fork+0x2d/0x50
[Tue Jan 9 05:56:04 2024]  ? __pfx_kthread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  ret_from_fork_asm+0x1b/0x30
[Tue Jan 9 05:56:04 2024]  </TASK>

Stack trace of 110433:

[Tue Jan 9 05:56:04 2024] task:kworker/9:1 state:D stack:0 pid:110433 ppid:2 flags:0x00004000
[Tue Jan 9 05:56:04 2024] Workqueue: cifsiod smb2_reconnect_server [cifs]
[Tue Jan 9 05:56:04 2024] Call Trace:
[Tue Jan 9 05:56:04 2024]  <TASK>
[Tue Jan 9 05:56:04 2024]  __schedule+0x21c/0x680
[Tue Jan 9 05:56:04 2024]  schedule+0x5d/0xd0
[Tue Jan 9 05:56:04 2024]  wait_for_response+0x6b/0xb0 [cifs]
[Tue Jan 9 05:56:04 2024]  ? __pfx_autoremove_wake_function+0x10/0x10
[Tue Jan 9 05:56:04 2024]  compound_send_recv+0x415/0xa80 [cifs]
[Tue Jan 9 05:56:04 2024]  ? cifs_strtoUTF16+0xd2/0x130 [cifs]
[Tue Jan 9 05:56:04 2024]  ? preempt_count_add+0x70/0xa0
[Tue Jan 9 05:56:04 2024]  cifs_send_recv+0x1f/0x30 [cifs]
[Tue Jan 9 05:56:04 2024]  SMB2_negotiate+0x253/0xab0 [cifs]
[Tue Jan 9 05:56:04 2024]  ? __try_to_del_timer_sync+0x62/0xa0
[Tue Jan 9 05:56:04 2024]  ? smb2_negotiate+0x44/0x60 [cifs]
[Tue Jan 9 05:56:04 2024]  ? _raw_spin_unlock+0x15/0x30
[Tue Jan 9 05:56:04 2024]  smb2_negotiate+0x44/0x60 [cifs]
[Tue Jan 9 05:56:04 2024]  cifs_negotiate_protocol+0x93/0x110 [cifs]
[Tue Jan 9 05:56:04 2024]  smb2_reconnect.part.0+0x237/0x4a0 [cifs]
[Tue Jan 9 05:56:04 2024]  smb2_reconnect_server+0x22a/0x650 [cifs]
[Tue Jan 9 05:56:04 2024]  process_one_work+0x174/0x340
[Tue Jan 9 05:56:04 2024]  worker_thread+0x27e/0x390
[Tue Jan 9 05:56:04 2024]  ? __pfx_worker_thread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  kthread+0xee/0x120
[Tue Jan 9 05:56:04 2024]  ? __pfx_kthread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  ret_from_fork+0x2d/0x50
[Tue Jan 9 05:56:04 2024]  ? __pfx_kthread+0x10/0x10
[Tue Jan 9 05:56:04 2024]  ret_from_fork_asm+0x1b/0x30
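For reference, we watched the oscillating fields above simply with watch(1). Below is a minimal C sketch of an equivalent poller; the field substrings are taken from the DebugData dump above, the 200 ms interval matches the "watch -n 0.2" we used, and everything else is just illustrative:

/*
 * Poll /proc/fs/cifs/DebugData every 200 ms and print only the lines
 * containing the fields we saw oscillating (credits, Instance,
 * Req On Wire, TCP status, Mounts, and the MID cbdata entries).
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static const char *keys[] = {
	"Number of credits", "Instance:", "Req On Wire",
	"TCP status", "Mounts:", "cbdata:",
};

int main(void)
{
	char line[1024];

	for (;;) {
		FILE *f = fopen("/proc/fs/cifs/DebugData", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f)) {
			for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
				if (strstr(line, keys[i])) {
					fputs(line, stdout);
					break;
				}
			}
		}
		fclose(f);
		puts("--");
		usleep(200 * 1000); /* 200 ms, same as watch -n 0.2 */
	}
}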
The "long-standing" cbdata also seems to change. I don't know how often, but right now I see the following there most of the time, instead of 00000000aec8f49d:

	Server ConnectionId: 0x1
	State: 2 com: 0 pid: 109094 cbdata: 000000008ea7071c mid 0
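For completeness, here is a rough sketch of the kind of load we generate. The real load tool is internal; the mount point, the file names and the use of O_DIRECT below are assumptions for illustration only (uncached/direct I/O is at least consistent with the cifs_uncached_readv_complete paths in the traces above):

/*
 * Rough reproducer sketch: ~128 workers, each repeatedly picking a
 * random file on the SMB mount and reading it end to end.
 * /mnt/fs1 and the file%03d naming are made up for this sketch.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define NWORKERS 128
#define NFILES   128
#define BUFSZ    (1 << 20)	/* 1 MiB reads */

static void worker(void)
{
	char path[256];
	void *buf;
	int fd;

	/* O_DIRECT wants an aligned buffer. */
	if (posix_memalign(&buf, 4096, BUFSZ))
		_exit(1);
	srand(getpid());
	for (;;) {
		snprintf(path, sizeof(path), "/mnt/fs1/file%03d",
			 rand() % NFILES);
		fd = open(path, O_RDONLY | O_DIRECT);
		if (fd < 0)
			continue;
		/* Read the whole (~20 MB) file, then move on to another. */
		while (read(fd, buf, BUFSZ) > 0)
			;
		close(fd);
	}
}

int main(void)
{
	for (int i = 0; i < NWORKERS; i++)
		if (fork() == 0)
			worker();
	while (wait(NULL) > 0)
		;
	return 0;
}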