Consider the following client scenario, as coded up from smbtorture to Windows. Client Server ------------------------------------------- Mid: 1) -> open file /foo <-- Success mid #1 2) -> LockingX containing 2 lock requests 100->109, 120->129. <-- locks granted mid #2 3) -> LockingX on different context, block indefinitely for the ranges 100->109, 120->129. (Server pushes request on blocking queue). 4) -> Unlock range 100->109. <-- Success mid #4 5) -> Unlock range 120->129. <-- Success mid #5 Server can now release blocking request mid #3, which was queued, and return success for mid #3. <-- Success mid #3 Run the same test against Samba, and we fail to awaken the server on mid #5. This is because we don't correctly update the pending blocking record after mid #4 unlocks the range we're currently pending on. So when mid #5 unlocks, smbd checks the unlock request range 120->129 for overlap with the previous range 100->109, and fails to notify. Fix to follow. Jeremy.
Maybe it's time to convert the blocking byte range lock business to dbwrap_watch_send? :-) I know the answer is "show me the patch", but that's exactly what I was trying look at when those other brlock patches appeared...
(In reply to comment #1) > Maybe it's time to convert the blocking byte range lock business to > dbwrap_watch_send? :-) > > I know the answer is "show me the patch", but that's exactly what I was trying > look at when those other brlock patches appeared... I already have the patch, plus the torture test for it :-). Will send to the list shortly. Once that patch is in, if you want to rewrite it to use dbwrap_watch_send for 4.2 I'll be happy :-). But for 4.1.x and 4.0.x we'll still need this patch :-).
Created attachment 10064 [details] git-am fix for master. The attached patchset fixes immediately removes the pending lock record when re-processing each blocked lock, and then causes the functions that actually retry the blocking lock to re-add a pending lock record when they fail to get it. That's the place we really know we're going to have to continue to wait, and also have the up-to-date info on what lock range and count we're blocking on. It also adds a torture test to raw.locks that implements this exact scenario. Running it without the fix causes smbtorture to hang and time out against the server, with the fix it completes correctly. This smbtortute test has also been run against Windows Server 2012 successfully.
Comment on attachment 10064 [details] git-am fix for master. Hang on a minute, I'm withdrawing this patch for consideration. I've realized if the blocking lock times out, this patch will leave a dead pending record in the lock db. I know how to fix this, but want to do more testing before I report a fixed version of this patch. Sorry for the noise, hope to post a new version tomorrow !
Created attachment 10070 [details] git-am fix for master. Fixed version that doesn't leave pending lock records in the database if the lock request times out.
Comment on attachment 10070 [details] git-am fix for master. Arggghh. This version still has a subtle race condition. Trying again :-(.
Created attachment 10072 [details] git-am fix for master. Third time lucky. This one has no race conditions that can leave pending entries on lock timeout :-). Plus it simplifies the SMB1 blocking lock processing function considerably. Planning to go over this with Volker :-). Jeremy.
Created attachment 10076 [details] git-am fix for 4.1.next and 4.0.next. Cherry-picked from the patch that went into master, + Volker's two cleanup patches. Applies cleanly to 4.1.next and 4.0.next. Jeremy.
Comment on attachment 10076 [details] git-am fix for 4.1.next and 4.0.next. While I do not think that the whole torture test thingy is necessary, I give my blessing. Karolin, if you want the release to be as minimal as possible, you can leave out patches 5-7 of this patchset
Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next. Jeremy.
(In reply to comment #10) > Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next. > > Jeremy. Pushed to autobuild-v4-[0|1]-test.
(In reply to comment #11) > (In reply to comment #10) > > Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next. > > > > Jeremy. > > Pushed to autobuild-v4-[0|1]-test. Pushed to both branches. Closing out bug report. Thanks!