Bug 14449 - We should not cancel async state changing operations when a single connection is disconnected in order to have a correct replay behavior
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.13.0.rc1
Hardware: All All
Importance: P5 critical
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks: 13703 14534
Reported: 2020-07-24 16:02 UTC by Stefan Metzmacher
Modified: 2021-03-29 20:44 UTC
Description Stefan Metzmacher 2020-07-24 16:02:32 UTC
Currently we cancel any pending operation on a disconnecting connection,
even if it's not the last connection (with multi-channel).

If an unfinished open waits for an oplock break, it
should be possible to resume that operation on a new channel.
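
To illustrate the intended behavior, here is a minimal sketch in Python rather
than smbd code. The names Session, PendingOp and on_channel_disconnect are
hypothetical and do not exist in Samba. It also assumes that operations which
do not change state may still be cancelled when a single channel goes away.

```python
from dataclasses import dataclass, field


@dataclass
class PendingOp:
    name: str
    state_changing: bool  # e.g. a create waiting for an oplock break
    cancelled: bool = False


@dataclass
class Session:
    channels: set = field(default_factory=set)
    pending: list = field(default_factory=list)


def on_channel_disconnect(session, channel_id):
    """Drop one channel and decide which pending operations to cancel."""
    session.channels.discard(channel_id)
    last_channel_gone = not session.channels
    for op in session.pending:
        # Assumption: state-changing operations survive as long as at
        # least one other channel is left, so a replay can pick them up.
        if last_channel_gone or not op.state_changing:
            op.cancelled = True
```

With two channels, disconnecting the first one leaves a pending create alone;
only when the last channel goes away is it cancelled.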
Comment 1 Stefan Metzmacher 2020-07-24 16:04:43 UTC
Mark as regression in order to remember it for 4.13.0 if possible
Comment 2 Stefan Metzmacher 2020-08-26 12:22:06 UTC
Replays on pending creates should return FILE_NOT_AVAILABLE?
Comment 3 Stefan Metzmacher 2021-03-12 12:44:48 UTC
I think I basically know now how the create replay detection is supposed
to work with pending opens.

I found the key hint in this presentation on page 24:
https://www.snia.org/educational-library/smb-22-bigger-faster-scalier-parts[..]

The key point is that the server should return STATUS_FILE_NOT_AVAILABLE
while the open is still being processed and the server detects the channel
failure later than the client does.
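
A rough sketch of that rule in Python, not the actual smbd implementation.
The function handle_create, the "key" identifying a create request and the
"PENDING" marker are all hypothetical, and how a replay is matched to the
original request is left out.

```python
def handle_create(pending, finished, key, is_replay):
    """Return the status for a create request.

    pending:  set of keys for creates that are still being processed
    finished: dict mapping keys of completed creates to their status
    """
    if is_replay and key in pending:
        # The original create is still in flight (e.g. waiting for an
        # oplock break), so tell the client to try again later.
        return "STATUS_FILE_NOT_AVAILABLE"
    if is_replay and key in finished:
        # The original create already completed: hand back its result.
        return finished[key]
    # First time we see this request: start processing it.
    pending.add(key)
    return "PENDING"
```

A replay that arrives while the original create is still pending gets
STATUS_FILE_NOT_AVAILABLE instead of a second, competing open.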

The strange thing is that [MS-SMB2] doesn't document this:
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/

My tests against Windows revealed that the server code returns
NT_STATUS_ACCESS_DENIED instead of NT_STATUS_FILE_NOT_AVAILABLE.

When only oplocks are used and not SMB2 leases, the replay is not
detected at all and I get NT_STATUS_SHARING_VIOLATION after a 35 second delay.

However, I added test code that disconnects a connection when we get a create
without the replay flag and then delays the request by 35 seconds.
During that period I return NT_STATUS_ACCESS_DENIED to
the replay attempts from the client in order to simulate the
Windows server. The Windows client reports that ACCESS_DENIED to
the application (e.g. explorer).

I changed the server code to return NT_STATUS_FILE_NOT_AVAILABLE;
in that case the Windows client retries the operation as documented
in [MS-SMB2]:

  <152> Section 3.2.5.1: For the following error codes, Windows-based clients will retry
  the operation up to three times and then retry the operation every 5 seconds until the
  count of milliseconds specified by Open.ResilientTimeout is exceeded:
  
  - STATUS_SERVER_UNAVAILABLE
  
  - STATUS_FILE_NOT_AVAILABLE
  
  - STATUS_SHARE_UNAVAILABLE

After 35-40 seconds the client reports the successful retry to the application.
I tested that with "smb2 leases = yes" and "smb2 leases = no", in both cases
the client is happy.
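
The retry behavior quoted from [MS-SMB2] above can be sketched as a simple
schedule. This is only an illustration: retry_offsets_ms is a hypothetical
helper, and it assumes the first three retries happen without any delay,
which the footnote does not actually state.

```python
def retry_offsets_ms(resilient_timeout_ms, immediate_retries=3, interval_ms=5000):
    """Return the offsets (in milliseconds) at which the client retries."""
    # Assumption: the first retries are sent right away, at offset 0.
    offsets = [0] * immediate_retries
    t = interval_ms
    # Then one retry every interval_ms until the timeout is exceeded.
    while t <= resilient_timeout_ms:
        offsets.append(t)
        t += interval_ms
    return offsets
```

With a 35 second server-side delay, the retry at offset 35000 would be the
first one that can succeed under this model, which is roughly in line with
the 35-40 seconds observed above.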
Comment 5 Samba QA Contact 2021-03-29 20:44:03 UTC
This bug was referenced in samba master:

2c194c0bc61958b9f569b3808b45035c2de6caef
e63651cfd6d92805bd44c1245f4534bdcfdf3a7e
1714a05b992311647a51a6dda007958a7af0f043
aa5f93eb65d2b729770a23624acfb48a688e917a
ae1c3a0d9ae00471cbbc8a7787f026b87e76aa45
f5168a21abd029fd57edfd270b86512312c801b1
87b8049320c6314fab12ff14ac825101876e87d9
a19180904ea73815e994f3c720613ae0b06099c3
997e9023c0ca94d57d75cd0069f5c6ab1f81be85
01b675ab323a73fc0cf25cd0bf706dbc1dde514b
f0e553783434dccf0637e6e9e3a87890ae56286c
d4010b9abc4a303f478420de4295c3c00fbdbbf2
b448eae5e983dcf3b7a222c5fc9a73eba88d1b06