Bug 8407 - SMB2 server can return requests out-of-order when processing a compound request.
Summary: SMB2 server can return requests out-of-order when processing a compound request.
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: SMB2 (show other bugs)
Version: 3.6.0
Hardware: All All
: P5 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks: 8399 8429
  Show dependency treegraph
 
Reported: 2011-08-25 18:59 UTC by Jeremy Allison
Modified: 2011-09-15 18:27 UTC (History)
1 user (show)

See Also:


Attachments
Patch for master. (6.78 KB, patch)
2011-08-25 20:58 UTC, Jeremy Allison
metze: review-
Details
Untested patch for master (3.78 KB, patch)
2011-08-25 21:37 UTC, Stefan Metzmacher
no flags Details
git-am patch for master (11.62 KB, patch)
2011-08-25 22:32 UTC, Jeremy Allison
no flags Details
git-am fix for 3.6.1 (16.17 KB, patch)
2011-08-25 22:34 UTC, Jeremy Allison
no flags Details
git-am fix for master based on Metze's patch (5.32 KB, patch)
2011-08-26 00:56 UTC, Jeremy Allison
no flags Details
git-am fix for 3.6.1 based on Metze's patch (9.87 KB, patch)
2011-08-26 01:00 UTC, Jeremy Allison
no flags Details
git-am fix for master based on Metze's patch (5.34 KB, patch)
2011-08-26 21:24 UTC, Jeremy Allison
no flags Details
git-am fix for 3.6.1 based on Metze's patch (9.89 KB, patch)
2011-08-26 21:28 UTC, Jeremy Allison
metze: review-
Details
Patch for v3-6-test (9.89 KB, patch)
2011-09-03 13:06 UTC, Stefan Metzmacher
jra: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Allison 2011-08-25 18:59:03 UTC
Consider the following packet stream:

A Compound consisting of:

SMB2_CREATE on directory "foo"; SMB2_FIND on handle returned by SMB2_CREATE

followed by a new request of :

SMB2_CREATE file "foo\bar"

In our current server code, we can get the situation where the compound is processed, including the SMB2_FIND on directory "foo" (which won't return the new file "bar" as it hasn't been created yet). The reply from the compound is created, but not yet enqueued on the outgoing tstream queue. Then the SMB2_CREATE of "foo\bar" is dispatched and completes. The outgoing packet stream now looks like:

Reply to SMB2_CREATE of "foo\bar"
Reply to SMB2_CREATE on directory "foo"; Reply of SMB2_FIND of directory "foo" **** NOT CONTAINING A LISTING FOR FILE "bar" !

This causes the client directory cache to become inconsistent and applications to error out.

Patch stream to fix this will follow.

Jeremy.
Comment 1 Stefan Metzmacher 2011-08-25 20:51:26 UTC
22:50 < metze> wanon: I think 42cde0480bd6a5e2dddaa66917e1fa71e6a4edcd should make it easier to fix it
22:51 < metze> wanon: I think smbd_smb2_request_next_incoming() could check if we're currently processing an 
               compound request
Comment 2 Jeremy Allison 2011-08-25 20:58:59 UTC
Created attachment 6807 [details]
Patch for master.

As soon as I get confirmation it works, this is what I'm going to push.

I have a cleanup patch to follow also.

Jeremy.
Comment 3 Stefan Metzmacher 2011-08-25 21:17:39 UTC
Comment on attachment 6807 [details]
Patch for master.

I think this patch is too complex.

I think we should avoid reading more requests from the socket and fill the queue, when we're processing a compound request.

We may also handle related compound requests different from unrelated compound requests.
Comment 4 Jeremy Allison 2011-08-25 21:26:29 UTC
I disagree (obviously :-). This patch allows us to continue queuing incoming requests whilst processing our existing request stream. It took me a *long* time to get this logic right, and it's robust against event library changes. I'm wary of adding a simple patch which depends on event library semantics - at least I know this is safe against event reentrancy.

This doesn't prevent us handling different types of compound requests differently, it just doesn't do so in this patch.

Jeremy.
Comment 5 Jeremy Allison 2011-08-25 21:30:39 UTC
My cleanup patch (coming soon :-) removes the "dispatched" boolean flag from the request struct and splits the requests into 2 queues - one a "pending queue" where the reader code diposits the requests from the wire, and the current queue which only holds dispatched requests.

This is cleaner, as it separates the requests out cleanly.

Jeremy.
Comment 6 Stefan Metzmacher 2011-08-25 21:37:16 UTC
Created attachment 6808 [details]
Untested patch for master

I was thinking about something like this patches, not tested nor compiled yet...
Comment 7 Jeremy Allison 2011-08-25 22:03:38 UTC
The testing is the hard part :-). It's timing related and only has been reproduced on a heavily loaded system.

I think we're doing the same thing here (or at least very similar :-), but I prefer my approach as it gives less disruption to our current codepaths. I *want* us to keep reading asynchronously whilst we're processing - the more parallelism we get the faster we'll end up going (IMHO).

Jeremy.
Comment 8 Stefan Metzmacher 2011-08-25 22:09:57 UTC
At least smbd_smb2_request_next_incoming needs to check the client does not fill
the queue too much.
Comment 9 Jeremy Allison 2011-08-25 22:21:36 UTC
That's what the credits are for :-). We can use the crediting to tune how many outstanding requests we'll allow on this queue. Remember, we're only reading from the tstream queue and putting on the pending queue whilst we're processing a compound. That is a transient state so shouldn't allow the queue to fill up too quickly.

Jeremy.
Comment 10 Jeremy Allison 2011-08-25 22:32:09 UTC
Created attachment 6809 [details]
git-am patch for master

This version includes the tidyup fix that separates the queues into "pending_requests" and "requests".

Jeremy.
Comment 11 Jeremy Allison 2011-08-25 22:34:20 UTC
Created attachment 6810 [details]
git-am fix for 3.6.1

Here is the version for 3.6.1. It contains metze's 42cde0480bd6a5e2dddaa66917e1fa71e6a4edcd fix that introduces smbd_smb2_request_next_incoming() followed by the other two patches I'm proposing for master.

Jeremy.
Comment 12 Jeremy Allison 2011-08-26 00:55:28 UTC
Ok, I have confirmation that my patch does fix the test case.

However (and I *hate* to say this :-).

Metze is probably right..

I'm going to attach a modified version of metze's "untested" patch, and the same version for 3.6.1. In tests here it works and does not exhibit the problem behavior.

I've asked Ira to test in his production environment, and if it passed we should use metze's version instead (grrrr :-). It *is* simpler.

Jeremy.
Comment 13 Jeremy Allison 2011-08-26 00:56:47 UTC
Created attachment 6811 [details]
git-am fix for master based on Metze's patch

I'm going to leave the other attachments as active until I get confirmation that this fixes the bug, then recommend we use this instead.

Jeremy.
Comment 14 Jeremy Allison 2011-08-26 01:00:22 UTC
Created attachment 6812 [details]
git-am fix for 3.6.1 based on Metze's patch

git-am fix for 3.6.1 based on Metze's patch

I'm going to leave the other attachments as active until I get confirmation
that this fixes the bug, then recommend we use this instead.

Jeremy
Comment 15 Stefan Metzmacher 2011-08-26 18:06:01 UTC
(In reply to comment #13)
> Created attachment 6811 [details]
> git-am fix for master based on Metze's patch
> 
> I'm going to leave the other attachments as active until I get confirmation
> that this fixes the bug, then recommend we use this instead.
> 
> Jeremy.

Thanks for fixing my code. Can you change comments to match
the style proposed in README.Coding?

If this fix the problem I think we should take this patches.

We can always improve things later, first we need to fix the bug.
Comment 16 Jeremy Allison 2011-08-26 21:24:32 UTC
Created attachment 6814 [details]
git-am fix for master based on Metze's patch

Updated to fix comments.
Comment 17 Jeremy Allison 2011-08-26 21:28:31 UTC
Created attachment 6815 [details]
git-am fix for 3.6.1 based on Metze's patch

Updated with comments fix.
Comment 18 Linda Walsh 2011-08-26 23:39:22 UTC
This may be unrelated, BUT, there's a similar behavior going on
w/regard to ....file creates and future attempts to rename or delete
that file...even though the permissions appear to be correct -- the
file "can't be renamed or deleted (immediately)"....

It' is like a rename/delete op comes in before a lock on the file
that was initially created, is released.   

Bugs 8388, 8379 seem to have some root cause related to this.

I saw it but due to probs I had /w my winbind and account, I thought it
was some flakey permission prob ... but others are reporting this .. inability to save files in office, some browsers, and I noticed it in photoshop....

happens when windows first 'creates' the empty file name, (verifying access
as well as zeroing out or deleting any previous file/w same name), then writes
the new file to a .tmp file and at the end of it all, tries to do a rename
(which fails), -- for me, this left (either, not sure when it does which) a 0len file of the new name, or only the old-file & non-zero len, AND a full-length, saved version of my file named, '~pxxxx.tmp', in my Samba recycling bindir under the appropriate hierarchical location.

At other times, I saw the permission problem.   I don't know if everyone is running into the tdb-backend-'corruption' (caused by removal of params to allow different ranges for UID/GID's in backend)  problems I ran into ( I don't think so -- some report successfully going back to 3.5.10, but maybe there are other reasons their worked) and thus it's all a permission problem, OR if it has something to do with SMB2 caching multiple handles to the same file on the client for 'effficiency'...    Maybe it opens a file r/w and gets lock, but
then releases write lock and maintains a read-avisory type lock, but samba, perhaps isn't getting that state change?
------ 
Sorry to horn in here, but seemed like this could be a timing issue causing
a request to be lost somewhere...

If it is, will fixing this bug fix those bugs or is further work needed?

-l
Comment 19 Jeremy Allison 2011-08-31 20:13:21 UTC
Ok, got confirmation that the 3.6.1 fix is good from the original tester.

The master patch has been applied.

Metze, can you please give a +1 on the outstanding patch review and I can then re-assign to Karolin for inclusion in 3.6.1.

Jeremy.
Comment 20 Stefan Metzmacher 2011-09-03 13:00:14 UTC
Comment on attachment 6815 [details]
git-am fix for 3.6.1 based on Metze's patch

Looks good, but the "cherry picked from commit ..." information is wrong!
Comment 21 Stefan Metzmacher 2011-09-03 13:04:19 UTC
Comment on attachment 6815 [details]
git-am fix for 3.6.1 based on Metze's patch

Jeremy, please
Comment 22 Stefan Metzmacher 2011-09-03 13:06:50 UTC
Created attachment 6855 [details]
Patch for v3-6-test

Updated patches, with the correct git hashes in the commit messages.

Jeremy, please only use git cherry-pick -x for commits from official branches.
Your commits from master already point to invalid (private) git hashes...
Comment 23 Stefan Metzmacher 2011-09-12 06:23:12 UTC
(In reply to comment #22)
> Created attachment 6855 [details]
> Patch for v3-6-test
> 
> Updated patches, with the correct git hashes in the commit messages.
> 
> Jeremy, please only use git cherry-pick -x for commits from official branches.
> Your commits from master already point to invalid (private) git hashes...

Jeremy, can you add your review flag?
Comment 24 Jeremy Allison 2011-09-12 16:48:00 UTC
Comment on attachment 6855 [details]
Patch for v3-6-test

LGTM for 3.6.1.
Comment 25 Jeremy Allison 2011-09-12 17:56:38 UTC
Re-assigning to Karolin for inclusion in 3.6.1
Jeremy.
Comment 26 Karolin Seeger 2011-09-15 18:27:41 UTC
Pushed to v3-6-test.
Closing out bug report.

Thanks!