Consider the following packet stream: A Compound consisting of: SMB2_CREATE on directory "foo"; SMB2_FIND on handle returned by SMB2_CREATE followed by a new request of : SMB2_CREATE file "foo\bar" In our current server code, we can get the situation where the compound is processed, including the SMB2_FIND on directory "foo" (which won't return the new file "bar" as it hasn't been created yet). The reply from the compound is created, but not yet enqueued on the outgoing tstream queue. Then the SMB2_CREATE of "foo\bar" is dispatched and completes. The outgoing packet stream now looks like: Reply to SMB2_CREATE of "foo\bar" Reply to SMB2_CREATE on directory "foo"; Reply of SMB2_FIND of directory "foo" **** NOT CONTAINING A LISTING FOR FILE "bar" ! This causes the client directory cache to become inconsistent and applications to error out. Patch stream to fix this will follow. Jeremy.
22:50 < metze> wanon: I think 42cde0480bd6a5e2dddaa66917e1fa71e6a4edcd should make it easier to fix it 22:51 < metze> wanon: I think smbd_smb2_request_next_incoming() could check if we're currently processing an compound request
Created attachment 6807 [details] Patch for master. As soon as I get confirmation it works, this is what I'm going to push. I have a cleanup patch to follow also. Jeremy.
Comment on attachment 6807 [details] Patch for master. I think this patch is too complex. I think we should avoid reading more requests from the socket and fill the queue, when we're processing a compound request. We may also handle related compound requests different from unrelated compound requests.
I disagree (obviously :-). This patch allows us to continue queuing incoming requests whilst processing our existing request stream. It took me a *long* time to get this logic right, and it's robust against event library changes. I'm wary of adding a simple patch which depends on event library semantics - at least I know this is safe against event reentrancy. This doesn't prevent us handling different types of compound requests differently, it just doesn't do so in this patch. Jeremy.
My cleanup patch (coming soon :-) removes the "dispatched" boolean flag from the request struct and splits the requests into 2 queues - one a "pending queue" where the reader code diposits the requests from the wire, and the current queue which only holds dispatched requests. This is cleaner, as it separates the requests out cleanly. Jeremy.
Created attachment 6808 [details] Untested patch for master I was thinking about something like this patches, not tested nor compiled yet...
The testing is the hard part :-). It's timing related and only has been reproduced on a heavily loaded system. I think we're doing the same thing here (or at least very similar :-), but I prefer my approach as it gives less disruption to our current codepaths. I *want* us to keep reading asynchronously whilst we're processing - the more parallelism we get the faster we'll end up going (IMHO). Jeremy.
At least smbd_smb2_request_next_incoming needs to check the client does not fill the queue too much.
That's what the credits are for :-). We can use the crediting to tune how many outstanding requests we'll allow on this queue. Remember, we're only reading from the tstream queue and putting on the pending queue whilst we're processing a compound. That is a transient state so shouldn't allow the queue to fill up too quickly. Jeremy.
Created attachment 6809 [details] git-am patch for master This version includes the tidyup fix that separates the queues into "pending_requests" and "requests". Jeremy.
Created attachment 6810 [details] git-am fix for 3.6.1 Here is the version for 3.6.1. It contains metze's 42cde0480bd6a5e2dddaa66917e1fa71e6a4edcd fix that introduces smbd_smb2_request_next_incoming() followed by the other two patches I'm proposing for master. Jeremy.
Ok, I have confirmation that my patch does fix the test case. However (and I *hate* to say this :-). Metze is probably right.. I'm going to attach a modified version of metze's "untested" patch, and the same version for 3.6.1. In tests here it works and does not exhibit the problem behavior. I've asked Ira to test in his production environment, and if it passed we should use metze's version instead (grrrr :-). It *is* simpler. Jeremy.
Created attachment 6811 [details] git-am fix for master based on Metze's patch I'm going to leave the other attachments as active until I get confirmation that this fixes the bug, then recommend we use this instead. Jeremy.
Created attachment 6812 [details] git-am fix for 3.6.1 based on Metze's patch git-am fix for 3.6.1 based on Metze's patch I'm going to leave the other attachments as active until I get confirmation that this fixes the bug, then recommend we use this instead. Jeremy
(In reply to comment #13) > Created attachment 6811 [details] > git-am fix for master based on Metze's patch > > I'm going to leave the other attachments as active until I get confirmation > that this fixes the bug, then recommend we use this instead. > > Jeremy. Thanks for fixing my code. Can you change comments to match the style proposed in README.Coding? If this fix the problem I think we should take this patches. We can always improve things later, first we need to fix the bug.
Created attachment 6814 [details] git-am fix for master based on Metze's patch Updated to fix comments.
Created attachment 6815 [details] git-am fix for 3.6.1 based on Metze's patch Updated with comments fix.
This may be unrelated, BUT, there's a similar behavior going on w/regard to ....file creates and future attempts to rename or delete that file...even though the permissions appear to be correct -- the file "can't be renamed or deleted (immediately)".... It' is like a rename/delete op comes in before a lock on the file that was initially created, is released. Bugs 8388, 8379 seem to have some root cause related to this. I saw it but due to probs I had /w my winbind and account, I thought it was some flakey permission prob ... but others are reporting this .. inability to save files in office, some browsers, and I noticed it in photoshop.... happens when windows first 'creates' the empty file name, (verifying access as well as zeroing out or deleting any previous file/w same name), then writes the new file to a .tmp file and at the end of it all, tries to do a rename (which fails), -- for me, this left (either, not sure when it does which) a 0len file of the new name, or only the old-file & non-zero len, AND a full-length, saved version of my file named, '~pxxxx.tmp', in my Samba recycling bindir under the appropriate hierarchical location. At other times, I saw the permission problem. I don't know if everyone is running into the tdb-backend-'corruption' (caused by removal of params to allow different ranges for UID/GID's in backend) problems I ran into ( I don't think so -- some report successfully going back to 3.5.10, but maybe there are other reasons their worked) and thus it's all a permission problem, OR if it has something to do with SMB2 caching multiple handles to the same file on the client for 'effficiency'... Maybe it opens a file r/w and gets lock, but then releases write lock and maintains a read-avisory type lock, but samba, perhaps isn't getting that state change? ------ Sorry to horn in here, but seemed like this could be a timing issue causing a request to be lost somewhere... If it is, will fixing this bug fix those bugs or is further work needed? -l
Ok, got confirmation that the 3.6.1 fix is good from the original tester. The master patch has been applied. Metze, can you please give a +1 on the outstanding patch review and I can then re-assign to Karolin for inclusion in 3.6.1. Jeremy.
Comment on attachment 6815 [details] git-am fix for 3.6.1 based on Metze's patch Looks good, but the "cherry picked from commit ..." information is wrong!
Comment on attachment 6815 [details] git-am fix for 3.6.1 based on Metze's patch Jeremy, please
Created attachment 6855 [details] Patch for v3-6-test Updated patches, with the correct git hashes in the commit messages. Jeremy, please only use git cherry-pick -x for commits from official branches. Your commits from master already point to invalid (private) git hashes...
(In reply to comment #22) > Created attachment 6855 [details] > Patch for v3-6-test > > Updated patches, with the correct git hashes in the commit messages. > > Jeremy, please only use git cherry-pick -x for commits from official branches. > Your commits from master already point to invalid (private) git hashes... Jeremy, can you add your review flag?
Comment on attachment 6855 [details] Patch for v3-6-test LGTM for 3.6.1.
Re-assigning to Karolin for inclusion in 3.6.1 Jeremy.
Pushed to v3-6-test. Closing out bug report. Thanks!