Bug 8995 - SMB2 - Persistent File IDs must be unique.
SMB2 - Persistent File IDs must be unique.
Status: RESOLVED FIXED
Product: Samba 3.6
Classification: Unclassified
Component: File services
3.6.5
All All
: P5 minor
: ---
Assigned To: Karolin Seeger
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-13 02:23 UTC by Ira Cooper
Modified: 2012-06-17 19:02 UTC (History)
3 users (show)

See Also:


Attachments
Fix for 3.6 (8.98 KB, patch)
2012-06-13 02:23 UTC, Ira Cooper
metze: review-
Details
Additional patches for master (4.74 KB, patch)
2012-06-13 13:57 UTC, Stefan Metzmacher
no flags Details
Updated patch for master. (5.03 KB, patch)
2012-06-13 23:20 UTC, Jeremy Allison
no flags Details
Tested patches for master (8.40 KB, patch)
2012-06-14 12:18 UTC, Stefan Metzmacher
no flags Details
git-am patchstream containing the backport from master. (59.00 KB, patch)
2012-06-15 21:13 UTC, Jeremy Allison
metze: review+
jra: review? (ira)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ira Cooper 2012-06-13 02:23:39 UTC
Created attachment 7640 [details]
Fix for 3.6

We hit this while doing various testing, it turns out if the persistent file IDs are not unique the windows client seems to show some strange behaviors if multiple connections are open.

Patch enclosed.

Notes: The algorithm for creating persistent file id's is very simple.  This is intentional, because I'm assuming that this code will look totally different in master as the team works on SMB 3.0.
Comment 1 Jeremy Allison 2012-06-13 03:43:33 UTC
Marked as blocker for 3.6.next. I'll review asap tomorrow.
Comment 2 Stefan Metzmacher 2012-06-13 08:27:05 UTC
Comment on attachment 7640 [details]
Fix for 3.6

This introduces an incompatible chaange,
as it changes files_struct.

We already have 'gen_id' in files_struct which is supposed to be a unqiue number for the handle.

We should change get_gen_count(), to start with a random number.

Or we could use fsp->open_time, or a combination of both.

Also just removing the checks arround in_file_id_persistent != in_file_id_volatile is wrong!

We need to backport the changes up to s3:smb2_ioctl: make use of file_fsp_smb2() (046daccba59)

And let file_fsp_smb2 check that file_id_persistent matches the one we gave the client.
Comment 3 Stefan Metzmacher 2012-06-13 08:30:16 UTC
(In reply to comment #0)
> Created attachment 7640 [details]
> Fix for 3.6
> 
> We hit this while doing various testing, it turns out if the persistent file
> IDs are not unique the windows client seems to show some strange behaviors if
> multiple connections are open.

Please be more specific, what fails exactly?

Is it related to oplocks?

The only thing I can think of is that the client indexes the local files by
server_guid + persistent_file_id.
Comment 4 Ira Cooper 2012-06-13 10:36:44 UTC
(In reply to comment #3)
> (In reply to comment #0)
> > Created attachment 7640 [details] [details]
> > Fix for 3.6
> > 
> > We hit this while doing various testing, it turns out if the persistent file
> > IDs are not unique the windows client seems to show some strange behaviors if
> > multiple connections are open.
> 
> Please be more specific, what fails exactly?
> 
> Is it related to oplocks?
> 
> The only thing I can think of is that the client indexes the local files by
> server_guid + persistent_file_id.

That's the one we found.  I don't know what else might be having issues.  But from the traces we saw it was clear the client was confused.
Comment 5 Ira Cooper 2012-06-13 13:20:17 UTC
(In reply to comment #2)
> Comment on attachment 7640 [details]
> Fix for 3.6
> 
> This introduces an incompatible chaange,
> as it changes files_struct.

> We already have 'gen_id' in files_struct which is supposed to be a unqiue
> number for the handle.

It is part of the fd_handle.  That's not a clear location to put such a value.  Put it in the files_struct, right next to the fnum so the intent is clear.  I also tend not to go reaching into structures where I am not 100% sure on the semantics.

> We should change get_gen_count(), to start with a random number.

We could?  I have no idea why we would... I don't know what it supposed to solve.  I'm always leery of overlapping things that "could" be the same,

> Or we could use fsp->open_time, or a combination of both.

Time is a very poor choice.  I'm worried about these values being unique across all smbds.  I've thought of ways to guarantee that, but decided for 3.6 it is far more light weight to just patch it as I did.  This is a bikeshed.  I was given no less than 5 algorithms locally to solve it, and I've come up with 3 on my own.

> Also just removing the checks arround in_file_id_persistent !=
> in_file_id_volatile is wrong!

Here we might agree, except I thought they added little/no value outside of someone dinging us in a plugfest.  :)  My site routinely opens 30,000+ files, 40,000+ is far from unheard of.  Is it worth doing that lookup on every op, given that it is O(num_open_files)?  My "judgement" was no.
 
> We need to backport the changes up to s3:smb2_ioctl: make use of
> file_fsp_smb2() (046daccba59)

> And let file_fsp_smb2 check that file_id_persistent matches the one we gave the
> client.

Can you please give me the full range suggested, so I understand what you are suggesting?
Comment 6 Stefan Metzmacher 2012-06-13 13:57:21 UTC
Created attachment 7641 [details]
Additional patches for master

I'm going to push to master
Comment 7 Ira Cooper 2012-06-13 14:01:30 UTC
(In reply to comment #6)
> Created attachment 7641 [details]
> Additional patches for master
> 
> I'm going to push to master

The comment in the commit doesn't even address why you are making it random and what you are trying to fix.

Wait for another reviewer.
Comment 8 Jeremy Allison 2012-06-13 21:42:22 UTC
Ok, reading through I think I understand metze's patch for master.

It does need more commenting though - I'll add and re-post a patch for Ira and metze to review. Then let's work out the minimal way to fix this for 3.6.next.

FYI. I thought I'd marked this as a blocker for 3.6.next. Did someone disagree ? :-).

Jeremy.
Comment 9 Ira Cooper 2012-06-13 21:44:25 UTC
(In reply to comment #8)
> Ok, reading through I think I understand metze's patch for master.
> 
> It does need more commenting though - I'll add and re-post a patch for Ira and
> metze to review. Then let's work out the minimal way to fix this for 3.6.next.
> 
> FYI. I thought I'd marked this as a blocker for 3.6.next. Did someone disagree
> ? :-).
> 
> Jeremy.

I did, I didn't mark it a blocker originally.  I viewed it as "nice to have".  But that's me.
Comment 10 Jeremy Allison 2012-06-13 23:20:29 UTC
Created attachment 7643 [details]
Updated patch for master.

Ira, Metze - please review for master. Ira, can you confirm it fixes the issue on your site ? I'll look at what we need to make this work for 3.6.next.

Jeremy.
Comment 11 Jeremy Allison 2012-06-14 00:57:01 UTC
Hmmm. Looks like a bigger change to 3.6.next for this. Back-porting metze's file_fsp_smb2() changes looks like the right move. Will proceed on this tomorrow.

Jeremy.
Comment 12 Ira Cooper 2012-06-14 01:22:53 UTC
(In reply to comment #10)
> Created attachment 7643 [details]
> Updated patch for master.
> 
> Ira, Metze - please review for master. Ira, can you confirm it fixes the issue
> on your site ? I'll look at what we need to make this work for 3.6.next.
> 
> Jeremy.

Given that master doesn't work on site, at the moment, I can try... but well, it is really a matter of the uniqueness.  That should be something anyone can verify.

-Ira
Comment 13 Stefan Metzmacher 2012-06-14 07:58:05 UTC
My patches hang in samba3.smbtorture_s3.plain(s3dc).OPLOCK4,
I'm currently debugging this.
Comment 14 Stefan Metzmacher 2012-06-14 12:18:51 UTC
Created attachment 7644 [details]
Tested patches for master

I have this in a job that pushes it to the master autobuild after some private autobuilds
Comment 15 Stefan Metzmacher 2012-06-14 12:23:45 UTC
(In reply to comment #13)
> My patches hang in samba3.smbtorture_s3.plain(s3dc).OPLOCK4,
> I'm currently debugging this.

The problem was that the oplock code truncates fsp->fh->gen_id to uint32_t
via IVAL(). We now have gen_id in a range from 1 to UINT32_MAX -1.

We have a fsp_persistent_id() function which generates a 64-bit value
based on gen_id, open_time.tv_usec & UINT16_MAX and fnum.
Comment 16 Jeremy Allison 2012-06-15 21:13:03 UTC
Created attachment 7651 [details]
git-am patchstream containing the backport from master.

Includes all the fixes needed from master. Based on Ira and Metze's changes.

I had to do some work on some of these to enable them to backport cleanly.

Metze + Ira please review. Ira, if you could test in your environment that would be most appreciated as this is essentially what is in master, back ported to 3.6.next.

Metze, I hate you for making me to this work (you should have had to do it :-) :-).

Cheers,

Jeremy.
Comment 17 Stefan Metzmacher 2012-06-16 07:06:55 UTC
Comment on attachment 7651 [details]
git-am patchstream containing the backport from master.

Looks good, thanks!
Comment 18 Stefan Metzmacher 2012-06-16 07:08:13 UTC
Karolin, please pick it for the release
Comment 19 Karolin Seeger 2012-06-17 19:02:15 UTC
(In reply to comment #17)
> Comment on attachment 7651 [details]
> git-am patchstream containing the backport from master.
> 
> Looks good, thanks!

Pushed to v3-6-test.
Closing out bug report.

Thanks!