Bug 10684 - SMB1 blocking locks can fail notification on unlock, causing client timeout.
Summary: SMB1 blocking locks can fail notification on unlock, causing client timeout.
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: unspecified
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-01 20:15 UTC by Jeremy Allison
Modified: 2014-07-17 18:39 UTC (History)
2 users (show)

See Also:


Attachments
git-am fix for master. (11.04 KB, patch)
2014-07-02 00:25 UTC, Jeremy Allison
no flags Details
git-am fix for master. (13.65 KB, patch)
2014-07-02 19:16 UTC, Jeremy Allison
no flags Details
git-am fix for master. (17.16 KB, patch)
2014-07-03 05:11 UTC, Jeremy Allison
no flags Details
git-am fix for 4.1.next and 4.0.next. (19.83 KB, patch)
2014-07-07 20:43 UTC, Jeremy Allison
vl: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Allison 2014-07-01 20:15:36 UTC
Consider the following client scenario, as coded up from smbtorture to Windows.

Client                        Server
-------------------------------------------
Mid:

1)  -> open file /foo
                            <-- Success mid #1

2)  -> LockingX containing 2 lock requests 100->109, 120->129.

                            <-- locks granted mid #2

3)  -> LockingX on different context, block indefinitely for
       the ranges 100->109, 120->129.

                            (Server pushes request on blocking queue).


4)  -> Unlock range 100->109.

                            <-- Success mid #4

5)  -> Unlock range 120->129.

                            <-- Success mid #5

                            Server can now release blocking
                            request mid #3, which was queued,
                            and return success for mid #3.

                            <-- Success mid #3

Run the same test against Samba, and we fail to awaken the
server on mid #5. This is because we don't correctly update
the pending blocking record after mid #4 unlocks the range
we're currently pending on. So when mid #5 unlocks, smbd checks
the unlock request range 120->129 for overlap with the previous
range 100->109, and fails to notify.

Fix to follow.

Jeremy.
Comment 1 Volker Lendecke 2014-07-01 20:23:28 UTC
Maybe it's time to convert the blocking byte range lock business to dbwrap_watch_send? :-)

I know the answer is "show me the patch", but that's exactly what I was trying look at when those other brlock patches appeared...
Comment 2 Jeremy Allison 2014-07-01 20:28:09 UTC
(In reply to comment #1)
> Maybe it's time to convert the blocking byte range lock business to
> dbwrap_watch_send? :-)
> 
> I know the answer is "show me the patch", but that's exactly what I was trying
> look at when those other brlock patches appeared...

I already have the patch, plus the torture test for it :-). Will send to the list shortly.

Once that patch is in, if you want to rewrite it to use dbwrap_watch_send for 4.2 I'll be happy :-).

But for 4.1.x and 4.0.x we'll still need this patch :-).
Comment 3 Jeremy Allison 2014-07-02 00:25:04 UTC
Created attachment 10064 [details]
git-am fix for master.

The attached patchset fixes immediately
removes the pending lock record when re-processing
each blocked lock, and then causes the functions
that actually retry the blocking lock to re-add a pending
lock record when they fail to get it. That's the place
we really know we're going to have to continue to
wait, and also have the up-to-date info on what
lock range and count we're blocking on.

It also adds a torture test to raw.locks that
implements this exact scenario. Running it
without the fix causes smbtorture to hang
and time out against the server, with the
fix it completes correctly.

This smbtortute test has also been run
against Windows Server 2012 successfully.
Comment 4 Jeremy Allison 2014-07-02 05:19:36 UTC
Comment on attachment 10064 [details]
git-am fix for master.

Hang on a minute, I'm withdrawing this patch
for consideration. I've realized if the blocking
lock times out, this patch will leave a dead
pending record in the lock db.

I know how to fix this, but want to do more
testing before I report a fixed version of
this patch.

Sorry for the noise, hope to post a new
version tomorrow !
Comment 5 Jeremy Allison 2014-07-02 19:16:11 UTC
Created attachment 10070 [details]
git-am fix for master.

Fixed version that doesn't leave pending lock records in the database if the lock request times out.
Comment 6 Jeremy Allison 2014-07-02 22:45:17 UTC
Comment on attachment 10070 [details]
git-am fix for master.

Arggghh. This version still has a subtle race condition.

Trying again :-(.
Comment 7 Jeremy Allison 2014-07-03 05:11:35 UTC
Created attachment 10072 [details]
git-am fix for master.

Third time lucky. This one has no race conditions that can leave pending entries on lock timeout :-). Plus it simplifies the SMB1 blocking lock processing function considerably.

Planning to go over this with Volker :-).

Jeremy.
Comment 8 Jeremy Allison 2014-07-07 20:43:15 UTC
Created attachment 10076 [details]
git-am fix for 4.1.next and 4.0.next.

Cherry-picked from the patch that went into master, + Volker's two cleanup patches.

Applies cleanly to 4.1.next and 4.0.next.

Jeremy.
Comment 9 Volker Lendecke 2014-07-08 10:06:24 UTC
Comment on attachment 10076 [details]
git-am fix for 4.1.next and 4.0.next.

While I do not think that the whole torture test thingy is necessary, I give my blessing. Karolin, if you want the release to be as minimal as possible, you can leave out patches 5-7 of this patchset
Comment 10 Jeremy Allison 2014-07-08 16:57:37 UTC
Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next.

Jeremy.
Comment 11 Karolin Seeger 2014-07-13 18:08:06 UTC
(In reply to comment #10)
> Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next.
> 
> Jeremy.

Pushed to autobuild-v4-[0|1]-test.
Comment 12 Karolin Seeger 2014-07-17 18:39:16 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next.
> > 
> > Jeremy.
> 
> Pushed to autobuild-v4-[0|1]-test.

Pushed to both branches.
Closing out bug report.

Thanks!