Bug 8335 - file copy aborts with smb2_validate_message_id: bad message_id
Summary: file copy aborts with smb2_validate_message_id: bad message_id
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: SMB2
Version: 3.6.0rc3
Hardware: All All
Importance: P5 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
Depends on:
Reported: 2011-07-28 16:46 UTC by Christian Ambach
Modified: 2011-07-31 19:12 UTC

Level 10 log (1.57 MB, application/gzip)
2011-07-28 16:47 UTC, Christian Ambach
no flags Details
git-am fix for 3.6.0 (3.73 KB, patch)
2011-07-29 03:29 UTC, Jeremy Allison
no flags Details
git-am fix for 3.6.0 (3.73 KB, patch)
2011-07-29 18:20 UTC, Jeremy Allison
ambi: review+

Description Christian Ambach 2011-07-28 16:46:32 UTC
Same test as in Bug 8334

Windows reports "Network name no longer available", smbd logged
smb2_validate_message_id: bad message_id 7621 (low = 7364, max = 128)

numbers vary from attempt to attempt
Comment 1 Christian Ambach 2011-07-28 16:47:36 UTC
Created attachment 6727 [details]
Level 10 log
Comment 2 Jeremy Allison 2011-07-28 19:12:52 UTC
Ok, I've figured this one out. Give me a little while to create a patch.
Comment 3 Jeremy Allison 2011-07-28 21:28:28 UTC
Unfortunately I spoke too soon. This looks like a client bug.

In SMB2 a client should start numbering messages from zero, and can send out-of-order message ids up to the number of credits granted by the server. We (by default) grant up to 128 credits, which means that once the client has reached its default, the valid message id range runs from <min continuous number sent from client> to <min continuous number sent from client> + 128.

As the client moves the minimum number forward, this range marches forward too, as a sliding window of valid message ids.
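
A minimal sketch of the sliding window described above, written in Python rather than Samba's C. The class name, its fields and the exact boundary handling are assumptions made for illustration; this is not smbd's code:

```python
class MessageIdWindow:
    """Hypothetical sliding window of valid SMB2 message ids."""

    def __init__(self, credits=128):
        self.low = 0            # lowest message id not yet received
        self.credits = credits  # window width, in message ids
        self.pending = set()    # ids received ahead of `low`

    def accept(self, mid):
        # Reject ids below the window, at or past its end, and duplicates.
        if mid < self.low or mid >= self.low + self.credits:
            return False
        if mid in self.pending:
            return False
        self.pending.add(mid)
        # Slide forward over every contiguous id that has now arrived.
        while self.low in self.pending:
            self.pending.remove(self.low)
            self.low += 1
        return True
```

With `credits=4`, accepting ids 0, 2, 1 in that order leaves `low` at 3: the window only moves once the gap at 1 is filled.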

However, the Windows client breaks this.

In the log, search for the line containing:

mid = 7364

The next message should be 7365, which would cause us to move our window forward by one. However, the next message has:

mid = 7368

That skips message ids 7365, 7366 and 7367. Until we receive these mids, we can't move our window forward. So the range stays at:

7365 to 7364 + (128 * 2) = 7620

NB. Because Windows behaves strangely w.r.t. crediting we have this multiplier of 2 on the value of the "smb2 max credits" parameter.
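
For illustration, the arithmetic behind the log line can be sketched as follows. This is Python, not the smbd source; the function name and the constant are hypothetical, and `low` is assumed to be the last contiguous message id seen (7364 in the log):

```python
CREDIT_MULTIPLIER = 2  # assumed name for the factor applied to "smb2 max credits"


def message_id_in_window(mid, low, max_credits, multiplier=CREDIT_MULTIPLIER):
    """Return True if mid lies in low + 1 .. low + multiplier * max_credits."""
    return low < mid <= low + multiplier * max_credits


print(message_id_in_window(7368, low=7364, max_credits=128))  # True: inside 7365..7620
print(message_id_in_window(7621, low=7364, max_credits=128))  # False: one past 7620
```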

Until we get message 7365 we can't move the window forward; the client is simply neglecting to send the next message id.

What version of Win2k8r2 is this ? Does it have SP1 applied ? If not, try applying SP1 and re-testing. If it does, then try doubling the "smb2 max credits" value in the [global] section of your smb.conf to 256 and see if this avoids the problem.

There's some mismatch here between what the client does and what the server is doing, and I need to see a reproducible case on this. I tried to reproduce here, but my 64-bit Win2k8r2 machine locks my Linux kernel up hard under VirtualBox when I'm trying to do file transfers, which is rather annoying (to say the least).

Comment 4 Jeremy Allison 2011-07-28 21:40:49 UTC
Oh ho ! Look what I've discovered in the latest SMB2 doc.... :

Windows Vista SP1, Windows 7, Windows Server 2008, and Windows Server 2008 R2 SMB2 servers support a configurable minimum credit limit below which the client is unconditionally granted all credits it requests, and a configurable maximum credit limit above which credits are never granted, as follows: 

SMB2 server                                     Default minimum   Default maximum
Windows Vista SP1 and Windows 7                 128               2048
Windows Server 2008 and Windows Server 2008 R2  512               8192

<130> Section A Windows–based server does not currently scale credits based on quality of service features.
<131> Section Windows 7 and Windows Server 2008 R2-based SMB2 servers support only the levels described above, and Windows 7 and Windows Server 2008 R2-based SMB2 clients request only those levels.

So for now try setting "smb2 max credits = 8192" and see if you can reproduce. We might need to set this by default in include/locals.h.

Comment 5 Jeremy Allison 2011-07-28 21:49:27 UTC
FYI. Here's the section in the SMB2 doc that explains the factor of 2 "fudge factor" in allowing message id values above the sliding window of credits granted.

"Windows-based servers will limit the maximum range of sequence numbers. If a client has been granted 10 credits, the server will not allow the difference between the smallest available sequence number and the largest available sequence number to exceed 2*10 = 20. Therefore, if the client has sequence number 10 available and does not send it, the server will stop granting credits as the client nears sequence number 30, and eventually will grant no further credits until the client sends sequence number 10."
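
The quoted rule can be written out as a small calculation. This is a sketch in Python, assuming the limit is applied exactly as the text states; the function and parameter names are made up for illustration and come from neither the SMB2 document nor Samba:

```python
def grantable_credits(smallest_available, largest_available, credits_granted):
    """Credits the server could still grant before the sequence number
    range would exceed 2 * credits_granted."""
    limit = smallest_available + 2 * credits_granted
    return max(0, limit - largest_available)


# The example from the quote: 10 credits granted, sequence number 10 unsent.
print(grantable_credits(10, 25, 10))  # 5: five more before reaching 30
print(grantable_credits(10, 30, 10))  # 0: nothing more until 10 is sent
```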

Comment 6 Jeremy Allison 2011-07-28 22:05:10 UTC
Oh, just one thing: This comment "<131> Section Windows 7 and Windows Server 2008 R2-based SMB2 servers support only the levels described above, and Windows 7 and Windows Server 2008 R2-based SMB2 clients request only those levels." doesn't apply to credits, but to leasing.

The correct comment describing client credit requests is :

"<71> Section The Windows-based client will request credits up to a configurable maximum of 128 by default. A Windows-based client sends a CreditRequest value of 0 for an SMB2 NEGOTIATE Request and expects the server to grant at least 1 credit. In subsequent requests, the client will request credits sufficient to maintain its total outstanding limit at the configured maximum."

Comment 7 Jeremy Allison 2011-07-29 03:29:24 UTC
Created attachment 6728 [details]
git-am fix for 3.6.0

This is a 2 part patch.

Part 1 - set the default max credits to 8192 (same as W2K8R2).
Part 2 - modify the crediting algorithm to scale down credit granting in units of 1/16th. Should give smoother credit granting to clients.

Christian - please try these patches in your setup and report back asap (I haven't been able to get a working virtualized w2k8r2 setup yet, but I'll test myself on Win7 tomorrow).

Thanks !

Comment 8 Christian Ambach 2011-07-29 11:01:47 UTC
Comment on attachment 6728 [details]
git-am fix for 3.6.0

First feedback:
applying only the second part that changes the calculation did not help.
Increasing max credits helped for the file sizes (~100M) I was testing with.
Will continue testing with larger file copies
Comment 9 Jeremy Allison 2011-07-29 16:50:25 UTC
Yes, my guess is that this is a client bug due to Microsoft not fully testing with anything but Windows servers. We'll probably need both parts of the patch for 3.6.0 final.

Let me know asap if this allows you to copy multi-gigabyte files around (it should).

In the meantime I'm going to push to master.

Comment 10 Jeremy Allison 2011-07-29 18:15:04 UTC
Ok, with my Win7 client and this patch applied to 3.6.0 I can happily copy 2.9GB files from Samba share to Samba share.

Christian, once you confirm I think we can apply this one.

Comment 11 Jeremy Allison 2011-07-29 18:20:34 UTC
Created attachment 6731 [details]
git-am fix for 3.6.0

This one applies cleanly to my v3-6-test tree (the other one didn't for some reason).

Christian, once you +1 I'll re-assign to Karolin for inclusion in 3.6.0 final.

Comment 12 Christian Ambach 2011-07-30 09:27:42 UTC
Comment on attachment 6731 [details]
git-am fix for 3.6.0

With patch applied, multi-gigabyte files can be copied, so it seems to work.
Comment 13 Christian Ambach 2011-07-30 09:28:35 UTC

Please include the patch in 3.6.0.
Comment 14 Karolin Seeger 2011-07-31 19:12:23 UTC
Pushed to v3-6-test.
Closing out bug report.