Bug 15649 - Durable handle is not granted when a previous OPEN exists with NoOplock
Summary: Durable handle is not granted when a previous OPEN exists with NoOplock
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.20.1
Hardware: All All
: P5 major (vote)
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-05-28 13:32 UTC by Sandeep Kanyal
Modified: 2024-05-29 03:27 UTC (History)
3 users (show)

See Also:


Attachments
Contains packet traces & a synthetic client to repro the issue (237.00 KB, application/x-gzip)
2024-05-28 13:32 UTC, Sandeep Kanyal
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sandeep Kanyal 2024-05-28 13:32:51 UTC
Created attachment 18302 [details]
Contains packet traces & a synthetic client to repro the issue

Found a bug in DH2Q handling. Scenario as below.

1. Client Sends an open request with Oplock: No oplock (0x00)
2. Open Succeeds.
3. Client sends another open request for same file but this time with Oplock: Lease (0xff) and also requests for a durable handle.
4. Open succeeds but Durable handle is not granted.

This causes the application to fail in case there is a temporary network outage as the client was not granted a durable handle. Tried the same scenario against windows server & the Durable handle is granted. Attached the packet traces for both windows server & samba server.

I have tried a potential fix. Post that durable handle is granted for above scenario. In the below code, ideally for the second OPEN req we should be going to the below block and set state->got_oplock = true. But instead we are comparing e->op_type != LEASE_OPLOCK and this condition is not satisfied as fist open doesn't have any oplocks. By putting (e->op_type != LEASE_OPLOCK && e->op_type != NO_OPLOCK), issue seems to be fixed.


static bool delay_for_oplock_fn(
struct share_mode_entry *e,
bool *modified,
void *private_data)
{
....

 if (!state->got_oplock &&
   (e->op_type != LEASE_OPLOCK) &&
   !share_entry_stale_pid(e)) {
state->got_oplock = true;
}

....
}

If this fix is sufficient, then i can create a PR.
Comment 1 Stefan Metzmacher 2024-05-28 13:42:33 UTC
(In reply to Sandeep Kanyal from comment #0)

I fear it's more complex than that, because the share mode entries need to be sorted on the durable reconnect otherwise the binary search used to find them
no longer works.

https://bugzilla.samba.org/show_bug.cgi?id=15624
https://gitlab.com/samba-team/samba/-/merge_requests/3583 is also
related to durable handle problems...
Comment 2 Sandeep Kanyal 2024-05-28 14:09:38 UTC
(In reply to Stefan Metzmacher from comment #1)
Thanks Stefan. How easy would it be to fix it ? My application is failing coz of this bug :(
Comment 3 Sandeep Kanyal 2024-05-28 14:16:29 UTC
(In reply to Sandeep Kanyal from comment #2)
Stefan, one question. I was debugging more for a 2nd bug (ill file that shortly) and i saw that we keep the entries in sorted order. But i vaguely remember that
we also free the entries in case it doesn't have oplock. So in this case, during
disconnect, wont the 1st open be removed from the smbXsrv_open_global.tdb as it doesn't have any oplocks ? (if i'm not wrong that the file where we keep the open files info)
Comment 4 Ralph Böhme 2024-05-28 15:03:50 UTC
(In reply to Sandeep Kanyal from comment #0)
The problem is that we should ignore the first open which is a stat open and don't grant the Durable Handle on the second open plus RH lease instead of just R lease.
Comment 5 Sandeep Kanyal 2024-05-28 15:44:18 UTC
(In reply to Ralph Böhme from comment #4)
I see. But then the application would always fail in case of disconnection. Actually i'm using a Microsoft application to write to samba share & it fails in case of network disruption (The server & client are cross geo). The problem is that it writes for 10+ hours (sometime 16+ hours) and then fails (Terabytes of data). And subsequent retries also fail as network is slow & patchy. If we can somehow support the durable handle for this scenario, that would be really great.
Comment 6 Ralph Böhme 2024-05-28 20:12:49 UTC
(In reply to Sandeep Kanyal from comment #5)
I guess you misunderstood what I was trying to say, let me try to rephrase my statement:

We likely should grant a DH for the second open and also grant a RH lease as requested by the client. The reason why the current code doesn't (wrongly) do this, is the first stat open.
Comment 7 Sandeep Kanyal 2024-05-29 03:27:26 UTC
(In reply to Ralph Böhme from comment #6)

Perfect. Yea the POC fix that i did, worked. It granted a DH & a RWH lease and application was happy