Bug 15136 - Access to cifs gets stuck for a while (>20s) after disconnecting from network
Status: NEW
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Version: 4.x
Hardware: All Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Steve French
QA Contact: cifs QA contact
 
Reported: 2022-08-04 03:01 UTC by wangrong
Modified: 2022-08-05 03:14 UTC (History)

Attachments
patch for code update (4.68 KB, patch)
2022-08-04 03:01 UTC, wangrong

Description wangrong 2022-08-04 03:01:40 UTC
Created attachment 17460
patch for code update

After the network is interrupted, each new SMB request waits up to a
fixed 10 seconds for the connection to be re-established before it is
sent, and returns an error if that wait times out.

Because 10 seconds is too long and the wait is repeated when the
operation is retried, a single system call (such as stat) can block for
more than 20 seconds, which makes applications respond slowly.

The attached patch exposes the reconnect wait duration to user space
as a mount option, so that users and applications can set it to suit
their actual situation.
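To illustrate how the per-request wait adds up, here is a minimal user-space sketch, not the kernel code. It simulates a bounded wait with a fake clock and sums the blocked time across retries. The names `wait_for_reconnect`, `blocked_ms_with_retries`, the 100 ms polling step, and the attempt count are all assumptions made for illustration; the report only says that retries exist.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define POLL_STEP_MS     100U      /* hypothetical polling granularity */
#define NEVER_RECONNECTS UINT_MAX  /* simulated link that stays down */

/*
 * Simulate one bounded wait. reconnect_after_ms is when the (fake) link
 * comes back; timeout_ms is the wait budget. Returns true if the link
 * came back in time and stores the blocked time in *waited_ms.
 */
static bool wait_for_reconnect(unsigned int reconnect_after_ms,
                               unsigned int timeout_ms,
                               unsigned int *waited_ms)
{
    unsigned int elapsed = 0;

    assert(waited_ms != NULL);
    while (elapsed < reconnect_after_ms) {  /* still disconnected */
        if (elapsed >= timeout_ms) {        /* budget exhausted: give up */
            *waited_ms = elapsed;
            return false;
        }
        elapsed += POLL_STEP_MS;
    }
    *waited_ms = elapsed;
    return true;
}

/* Total time one system call blocks if every failed wait is retried. */
static unsigned int blocked_ms_with_retries(unsigned int reconnect_after_ms,
                                            unsigned int timeout_ms,
                                            unsigned int attempts)
{
    unsigned int total = 0;

    for (unsigned int i = 0; i < attempts; i++) {
        unsigned int waited;
        /* Time still missing until the link returns, seen from "now". */
        unsigned int remaining =
            reconnect_after_ms > total ? reconnect_after_ms - total : 0;

        bool ok = wait_for_reconnect(remaining, timeout_ms, &waited);
        total += waited;
        if (ok)
            break;
    }
    return total;
}
```

With a 10-second budget and an assumed two attempts, `blocked_ms_with_retries(NEVER_RECONNECTS, 10000, 2)` gives 20000 ms, which matches the delay described in the report; a smaller budget shrinks the total proportionally.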
Comment 1 Enzo Matsumiya 2022-08-04 14:05:05 UTC
Hi, thanks for the patch. This is also a problem I've been trying to solve, but I haven't come up with a good solution so far.

You set SMB_WAIT_RECONNECT_TIMEOUT_MIN to 0, but that would conflict with the socket timeout which is 7 seconds. This is also stated above the loop in smb2pdu.c:

> ...
>         /*
>          * Give demultiplex thread up to 10 seconds to each target available for
>          * reconnect -- should be greater than cifs socket timeout which is 7
>          * seconds.
>          */
>         while (server->tcpStatus == CifsNeedReconnect) {
> ...

Those 7 seconds are also hardcoded (connect.c):

>         /*
>          * Eventually check for other socket options to change from
>          * the default. sock_setsockopt not used because it expects
>          * user space buffer
>          */
>         socket->sk->sk_rcvtimeo = 7 * HZ;

(Bear in mind that this value predates the 2.6 kernel; in other words, I don't know the reasoning behind choosing it.)

Maybe we could reduce the hardcoded values and/or apply such a mount option to the socket timeout instead, and then base the reconnect wait on sk_rcvtimeo? Just an idea.
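One way to read this idea, as a rough user-space sketch only: derive the reconnect wait from the socket receive timeout instead of hardcoding both. Everything here is an assumption for illustration, including the helper name `reconnect_wait_secs`, the 3-second margin (chosen only because 10 minus 7 is 3), and treating 0 as "use the default".

```c
#include <assert.h>

#define DEFAULT_RCVTIMEO_SECS 7U  /* value hardcoded in connect.c, per the quote above */
#define RECONNECT_MARGIN_SECS 3U  /* hypothetical margin: current 10 s wait minus 7 s */

/*
 * Derive the reconnect wait from the socket receive timeout so the wait
 * always stays greater than the socket timeout, as the smb2pdu.c comment
 * asks. A value of 0 is treated as "not set" (an assumption).
 */
static unsigned int reconnect_wait_secs(unsigned int rcvtimeo_secs)
{
    if (rcvtimeo_secs == 0)
        rcvtimeo_secs = DEFAULT_RCVTIMEO_SECS;

    unsigned int wait = rcvtimeo_secs + RECONNECT_MARGIN_SECS;
    assert(wait > rcvtimeo_secs);  /* invariant from the kernel comment */
    return wait;
}
```

Under these assumptions the default of 7 seconds reproduces today's 10-second wait, while a 2-second socket timeout would give a 5-second wait.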
Comment 2 wangrong 2022-08-05 03:14:33 UTC
Hi, thanks for your reply and suggestion.

I want to explain the design intent of wait_reconnect_timeout: it sets the application's policy for waiting on a reconnect.
    wait_reconnect_timeout == 0: the application does not want to wait, and the request returns as soon as the connection is found to be unavailable.
    0 < wait_reconnect_timeout <= sk_rcvtimeo: the application is willing to wait, but a connection attempt is not guaranteed to complete within that time.
    sk_rcvtimeo < wait_reconnect_timeout: the application wants to make sure a connection attempt completes. This is the behavior of the current code; providing the wait_reconnect_timeout mount option simply gives the application more choices.
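The three cases above can be written down as a small classification, shown here as a user-space sketch rather than code from the patch. The enum and function names are hypothetical; both arguments are assumed to be in the same unit (for example seconds).

```c
#include <assert.h>

/* Hypothetical names for the three waiting policies described above. */
enum wait_policy {
    WAIT_NONE,         /* fail as soon as the connection is seen to be down */
    WAIT_BEST_EFFORT,  /* wait, but a connect attempt may not finish in time */
    WAIT_FULL_ATTEMPT  /* wait long enough for a connect attempt to complete */
};

/* Map a wait_reconnect_timeout value to its policy, relative to sk_rcvtimeo. */
static enum wait_policy classify_wait_policy(unsigned int wait_reconnect_timeout,
                                             unsigned int sk_rcvtimeo)
{
    if (wait_reconnect_timeout == 0)
        return WAIT_NONE;
    if (wait_reconnect_timeout <= sk_rcvtimeo)
        return WAIT_BEST_EFFORT;
    return WAIT_FULL_ATTEMPT;
}
```

With the current hardcoded values (10-second wait, 7-second socket timeout), `classify_wait_policy(10, 7)` falls in the third case.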

Would it be better to provide both sk_rcvtimeo and wait_reconnect_timeout as mount options? That would avoid the hardcoded values.