Created attachment 7640 [details] Fix for 3.6 We hit this while doing various testing, it turns out if the persistent file IDs are not unique the windows client seems to show some strange behaviors if multiple connections are open. Patch enclosed. Notes: The algorithm for creating persistent file id's is very simple. This is intentional, because I'm assuming that this code will look totally different in master as the team works on SMB 3.0.
Marked as blocker for 3.6.next. I'll review asap tomorrow.
Comment on attachment 7640 [details] Fix for 3.6 This introduces an incompatible chaange, as it changes files_struct. We already have 'gen_id' in files_struct which is supposed to be a unqiue number for the handle. We should change get_gen_count(), to start with a random number. Or we could use fsp->open_time, or a combination of both. Also just removing the checks arround in_file_id_persistent != in_file_id_volatile is wrong! We need to backport the changes up to s3:smb2_ioctl: make use of file_fsp_smb2() (046daccba59) And let file_fsp_smb2 check that file_id_persistent matches the one we gave the client.
(In reply to comment #0) > Created attachment 7640 [details] > Fix for 3.6 > > We hit this while doing various testing, it turns out if the persistent file > IDs are not unique the windows client seems to show some strange behaviors if > multiple connections are open. Please be more specific, what fails exactly? Is it related to oplocks? The only thing I can think of is that the client indexes the local files by server_guid + persistent_file_id.
(In reply to comment #3) > (In reply to comment #0) > > Created attachment 7640 [details] [details] > > Fix for 3.6 > > > > We hit this while doing various testing, it turns out if the persistent file > > IDs are not unique the windows client seems to show some strange behaviors if > > multiple connections are open. > > Please be more specific, what fails exactly? > > Is it related to oplocks? > > The only thing I can think of is that the client indexes the local files by > server_guid + persistent_file_id. That's the one we found. I don't know what else might be having issues. But from the traces we saw it was clear the client was confused.
(In reply to comment #2) > Comment on attachment 7640 [details] > Fix for 3.6 > > This introduces an incompatible chaange, > as it changes files_struct. > We already have 'gen_id' in files_struct which is supposed to be a unqiue > number for the handle. It is part of the fd_handle. That's not a clear location to put such a value. Put it in the files_struct, right next to the fnum so the intent is clear. I also tend not to go reaching into structures where I am not 100% sure on the semantics. > We should change get_gen_count(), to start with a random number. We could? I have no idea why we would... I don't know what it supposed to solve. I'm always leery of overlapping things that "could" be the same, > Or we could use fsp->open_time, or a combination of both. Time is a very poor choice. I'm worried about these values being unique across all smbds. I've thought of ways to guarantee that, but decided for 3.6 it is far more light weight to just patch it as I did. This is a bikeshed. I was given no less than 5 algorithms locally to solve it, and I've come up with 3 on my own. > Also just removing the checks arround in_file_id_persistent != > in_file_id_volatile is wrong! Here we might agree, except I thought they added little/no value outside of someone dinging us in a plugfest. :) My site routinely opens 30,000+ files, 40,000+ is far from unheard of. Is it worth doing that lookup on every op, given that it is O(num_open_files)? My "judgement" was no. > We need to backport the changes up to s3:smb2_ioctl: make use of > file_fsp_smb2() (046daccba59) > And let file_fsp_smb2 check that file_id_persistent matches the one we gave the > client. Can you please give me the full range suggested, so I understand what you are suggesting?
Created attachment 7641 [details] Additional patches for master I'm going to push to master
(In reply to comment #6) > Created attachment 7641 [details] > Additional patches for master > > I'm going to push to master The comment in the commit doesn't even address why you are making it random and what you are trying to fix. Wait for another reviewer.
Ok, reading through I think I understand metze's patch for master. It does need more commenting though - I'll add and re-post a patch for Ira and metze to review. Then let's work out the minimal way to fix this for 3.6.next. FYI. I thought I'd marked this as a blocker for 3.6.next. Did someone disagree ? :-). Jeremy.
(In reply to comment #8) > Ok, reading through I think I understand metze's patch for master. > > It does need more commenting though - I'll add and re-post a patch for Ira and > metze to review. Then let's work out the minimal way to fix this for 3.6.next. > > FYI. I thought I'd marked this as a blocker for 3.6.next. Did someone disagree > ? :-). > > Jeremy. I did, I didn't mark it a blocker originally. I viewed it as "nice to have". But that's me.
Created attachment 7643 [details] Updated patch for master. Ira, Metze - please review for master. Ira, can you confirm it fixes the issue on your site ? I'll look at what we need to make this work for 3.6.next. Jeremy.
Hmmm. Looks like a bigger change to 3.6.next for this. Back-porting metze's file_fsp_smb2() changes looks like the right move. Will proceed on this tomorrow. Jeremy.
(In reply to comment #10) > Created attachment 7643 [details] > Updated patch for master. > > Ira, Metze - please review for master. Ira, can you confirm it fixes the issue > on your site ? I'll look at what we need to make this work for 3.6.next. > > Jeremy. Given that master doesn't work on site, at the moment, I can try... but well, it is really a matter of the uniqueness. That should be something anyone can verify. -Ira
My patches hang in samba3.smbtorture_s3.plain(s3dc).OPLOCK4, I'm currently debugging this.
Created attachment 7644 [details] Tested patches for master I have this in a job that pushes it to the master autobuild after some private autobuilds
(In reply to comment #13) > My patches hang in samba3.smbtorture_s3.plain(s3dc).OPLOCK4, > I'm currently debugging this. The problem was that the oplock code truncates fsp->fh->gen_id to uint32_t via IVAL(). We now have gen_id in a range from 1 to UINT32_MAX -1. We have a fsp_persistent_id() function which generates a 64-bit value based on gen_id, open_time.tv_usec & UINT16_MAX and fnum.
Created attachment 7651 [details] git-am patchstream containing the backport from master. Includes all the fixes needed from master. Based on Ira and Metze's changes. I had to do some work on some of these to enable them to backport cleanly. Metze + Ira please review. Ira, if you could test in your environment that would be most appreciated as this is essentially what is in master, back ported to 3.6.next. Metze, I hate you for making me to this work (you should have had to do it :-) :-). Cheers, Jeremy.
Comment on attachment 7651 [details] git-am patchstream containing the backport from master. Looks good, thanks!
Karolin, please pick it for the release
(In reply to comment #17) > Comment on attachment 7651 [details] > git-am patchstream containing the backport from master. > > Looks good, thanks! Pushed to v3-6-test. Closing out bug report. Thanks!