The Samba-Bugzilla – Bug 8335
file copy aborts with smb2_validate_message_id: bad message_id
Last modified: 2011-07-31 19:12:23 UTC
Same test as in Bug 8334
Windows reports "Network name no longer available"; smbd logged:
smb2_validate_message_id: bad message_id 7621 (low = 7364, max = 128)
The numbers vary from attempt to attempt.
Created attachment 6727
Level 10 log
Ok, I've figured this one out. Give me a little while to create a patch.
Unfortunately I spoke too soon. This looks like a client bug.
In SMB2 a client should start numbering messages from zero, and can send out-of-order message IDs up to the number of credits granted by the server. We (by default) grant up to 128 credits, which means that once the client has reached its default, the valid message range runs from <min continuous number sent from client> to <min continuous number sent from client> + 128.
As the client moves the minimum number forward, this range marches forward too, as a sliding window of valid message IDs.
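To make the sliding window concrete, here is a minimal Python sketch of the rule just described. It is a hypothetical model, not smbd's code: the class name, the `low` field (lowest ID not yet received) and the set of IDs received out of order are assumptions for illustration only.

```python
class MessageIdWindow:
    """Hypothetical model of a server-side sliding window of valid SMB2 message IDs."""

    def __init__(self, max_credits: int) -> None:
        self.max_credits = max_credits
        self.low = 0                # lowest message ID not yet received
        self.received_above = set() # IDs received out of order, above low

    def accept(self, mid: int) -> bool:
        """Return True if mid is inside the window and not a duplicate."""
        if mid < self.low or mid >= self.low + self.max_credits:
            return False            # outside the window
        if mid in self.received_above:
            return False            # duplicate
        self.received_above.add(mid)
        # Slide the window forward over any contiguous run starting at low.
        while self.low in self.received_above:
            self.received_above.remove(self.low)
            self.low += 1
        return True


window = MessageIdWindow(max_credits=4)
window.accept(0)   # True, window slides to low = 1
window.accept(2)   # True, but low stays 1 because 1 is still missing
window.accept(5)   # False: outside 1..4
window.accept(1)   # True, window slides over 1 and 2 to low = 3
```

The key property is that the window only slides when the lowest outstanding ID arrives; IDs above it are accepted but cannot move it.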
However, the Windows client breaks this.
In the log, search for the line containing:
mid = 7364
The next message should be 7365, which would cause us to move our window forward by one. However, the next message has:
mid = 7368
- skipping message IDs 7365, 7366 and 7367. Until we receive these mids, we can't move our window forward, so the range stays at:
7365 to 7364 + (128 * 2) = 7620
NB: Because Windows behaves strangely with respect to crediting, we apply a multiplier of 2 to the value of the "smb2 max credits" parameter.
Until we get message 7365 we can't move the window forward; the client is simply neglecting to send the next message ID.
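The numbers from the log can be checked with a few lines of Python. This sketch assumes, as described above, that the window is bounded below by the last contiguous ID (the "low" value smbd logged) and above by low + 2 * "smb2 max credits"; the function name `valid_range` is hypothetical.

```python
def valid_range(low: int, max_credits: int, multiplier: int = 2) -> range:
    """Message IDs a server would accept under this model (hypothetical helper).

    low: the last message ID such that it and every earlier ID have arrived,
    i.e. the "low" value in the smbd log line.
    """
    return range(low + 1, low + max_credits * multiplier + 1)


window = valid_range(7364, 128)
print(window[0], window[-1])            # 7365 7620
print(7368 in window)                   # True: out of order, but inside the window
print(7621 in window)                   # False: "bad message_id 7621"
print(7621 in valid_range(7364, 256))   # True once max credits is doubled to 256
```

Raising the parameter only widens the window; it does not make the missing 7365 arrive any sooner.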
What version of Win2k8r2 is this? Does it have SP1 applied? If not, try applying SP1 and re-testing. If it does, try doubling the "smb2 max credits" value in the [global] section of your smb.conf to 256 and see whether this avoids the problem.
There's some mismatch here between what the client does and what the server is doing, and I need to see a reproducible case of this. I tried to reproduce it here, but my 64-bit Win2k8r2 machine locks my Linux kernel up hard under VirtualBox when I try to do file transfers, which is rather annoying (to say the least).
Oh ho! Look what I've discovered in the latest SMB2 doc:
Windows Vista SP1, Windows 7, Windows Server 2008, and Windows Server 2008 R2 SMB2 servers support a configurable minimum credit limit below which the client is unconditionally granted all credits it requests, and a configurable maximum credit limit above which credits are never granted, as follows:
SMB2 server                                      Default minimum   Default maximum
Windows Vista SP1 and Windows 7                  128               2048
Windows Server 2008 and Windows Server 2008 R2   512               8192
<130> Section 184.108.40.206: A Windows–based server does not currently scale credits based on quality of service features.
<131> Section 220.127.116.11: Windows 7 and Windows Server 2008 R2-based SMB2 servers support only the levels described above, and Windows 7 and Windows Server 2008 R2-based SMB2 clients request only those levels.
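The min/max rule quoted above can be sketched as a grant function. This is an illustration under assumptions, not Windows or Samba code: the function name and the policy between the two limits (grant the request, capped at the maximum) are hypothetical, since the quoted text only specifies behaviour below the minimum and above the maximum. The default limits are taken from the table.

```python
# Default (minimum, maximum) credit limits from the table above.
DEFAULT_LIMITS = {
    "Windows Vista SP1 and Windows 7": (128, 2048),
    "Windows Server 2008 and Windows Server 2008 R2": (512, 8192),
}


def credits_to_grant(outstanding: int, requested: int,
                     min_limit: int = 512, max_limit: int = 8192) -> int:
    """Hypothetical grant policy built around a minimum and maximum credit limit.

    outstanding: credits the client currently holds.
    requested:   CreditRequest value from the client.
    """
    if outstanding < min_limit:
        return requested        # below the minimum: grant everything asked for
    if outstanding >= max_limit:
        return 0                # at or above the maximum: never grant
    # Between the limits (assumption): grant the request, capped at the maximum.
    return min(requested, max_limit - outstanding)


lo, hi = DEFAULT_LIMITS["Windows Server 2008 and Windows Server 2008 R2"]
print(credits_to_grant(100, 64, lo, hi))    # 64
print(credits_to_grant(8190, 10, lo, hi))   # 2
print(credits_to_grant(8192, 10, lo, hi))   # 0
```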
So for now, try setting "smb2 max credits = 8192" and see whether you can still reproduce the problem. We might need to make this the default in include/locals.h.
FYI, here's the section in the SMB2 doc that explains the factor-of-2 "fudge factor" in allowing message ID values above the sliding window of credits granted:
"Windows-based servers will limit the maximum range of sequence numbers. If a client has been granted 10 credits, the server will not allow the difference between the smallest available sequence number and the largest available sequence number to exceed 2*10 = 20. Therefore, if the client has sequence number 10 available and does not send it, the server will stop granting credits as the client nears sequence number 30, and eventually will grant no further credits until the client sends sequence number 10."
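The arithmetic in that quotation can be written down directly. A minimal sketch with hypothetical function names; it encodes only the rule as quoted, namely that the span between the smallest and largest available sequence numbers may not exceed twice the credits granted.

```python
def largest_allowed_sequence(smallest_available: int, credits_granted: int) -> int:
    """Highest sequence number the quoted rule permits while smallest_available is unused."""
    return smallest_available + 2 * credits_granted


def may_grant_more(smallest_available: int, largest_available: int,
                   credits_granted: int) -> bool:
    """True while the span of available sequence numbers is below 2 * credits granted."""
    return largest_available - smallest_available < 2 * credits_granted


print(largest_allowed_sequence(10, 10))   # 30, as in the quoted example
print(may_grant_more(10, 29, 10))         # True: span 19 < 20
print(may_grant_more(10, 30, 10))         # False: span reached 20, stop granting
```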
Oh, just one thing: this comment, "<131> Section 18.104.22.168: Windows 7 and Windows Server 2008 R2-based SMB2 servers support only the levels described above, and Windows 7 and Windows Server 2008 R2-based SMB2 clients request only those levels.", doesn't apply to credits but to leasing.
The correct comment describing client credit requests is:
"<71> Section 22.214.171.124.2: The Windows-based client will request credits up to a configurable maximum of 128 by default. A Windows-based client sends a CreditRequest value of 0 for an SMB2 NEGOTIATE Request and expects the server to grant at least 1 credit. In subsequent requests, the client will request credits sufficient to maintain its total outstanding limit at the configured maximum."
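The client side of that note can be sketched the same way: a client following it keeps asking to be topped back up to 128 outstanding credits. This is a hypothetical helper, not Windows code; in particular, treating `credits_remaining` as the credits still held after the request's own charge is deducted is an assumption of the sketch.

```python
CLIENT_MAX_CREDITS = 128   # Windows client default per the quoted <71> note


def credit_request(is_negotiate: bool, credits_remaining: int,
                   max_credits: int = CLIENT_MAX_CREDITS) -> int:
    """CreditRequest a client following the quoted behaviour might send (hypothetical).

    credits_remaining: credits the client still holds after this request's own
    charge has been deducted (an assumption of this sketch).
    """
    if is_negotiate:
        return 0                                    # NEGOTIATE always asks for 0
    return max(0, max_credits - credits_remaining)  # top back up to the maximum


print(credit_request(True, 0))      # 0
print(credit_request(False, 0))     # 128
print(credit_request(False, 127))   # 1
```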
Created attachment 6728
git-am fix for 3.6.0
This is a 2 part patch.
Part 1 - set the default max credits to 8192 (same as W2K8R2).
Part 2 - modify the crediting algorithm to scale down credit granting in units of 1/16th. Should give smoother credit granting to clients.
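The patch itself is in the attachment; what follows is only a hypothetical Python sketch of what "scale down credit granting in units of 1/16th" could mean, not the code in the patch. Assumptions: the server tracks how many credits it has granted so far, divides the maximum into 16 equal parts, scales the client's request down by one sixteenth for each part already in use, grants at least 1 while below the maximum (a client may send a CreditRequest of 0 and still expect a credit), and never exceeds the maximum.

```python
def scaled_grant(requested: int, granted_so_far: int, max_credits: int) -> int:
    """Hypothetical credit grant that shrinks in steps of max_credits / 16."""
    if granted_so_far >= max_credits:
        return 0                                            # budget exhausted
    sixteenths_used = (granted_so_far * 16) // max_credits  # 0..15 while below max
    multiplier = 16 - sixteenths_used                       # 16 down to 1
    scaled = max(1, (requested * multiplier) // 16)         # grant at least 1
    return min(scaled, max_credits - granted_so_far)        # never exceed the maximum


print(scaled_grant(64, 0, 8192))      # 64: nothing in use, full request
print(scaled_grant(64, 4096, 8192))   # 32: half the budget in use
print(scaled_grant(64, 8191, 8192))   # 1: only one credit of headroom left
```

Compared with an all-or-nothing cutoff, scaling by sixteenths shrinks grants gradually as the budget fills, which is the "smoother credit granting" the patch description aims for.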
Christian - please try these patches in your setup and report back asap (I haven't been able to get a working virtualized w2k8r2 setup yet, but I'll test myself on Win7 tomorrow).
Comment on attachment 6728
git-am fix for 3.6.0
Applying only the second part, which changes the calculation, did not help.
Increasing max credits helped for the file sizes (~100M) I was testing with.
Will continue testing with larger file copies.
Yes, my guess is that this is a client bug due to Microsoft not fully testing with anything but Windows servers. We'll probably need both parts of the patch for 3.6.0 final.
Let me know asap whether this allows you to copy multi-gigabyte files around (it should).
In the meantime I'm going to push to master.
Ok, with my Win7 client and this patch applied to 3.6.0 I can happily copy 2.9GB files from Samba share to Samba share.
Christian, once you confirm I think we can apply this one.
Created attachment 6731
git-am fix for 3.6.0
This one applies cleanly to my v3-6-test tree (the other one didn't for some reason).
Christian, once you +1 I'll re-assign to Karolin for inclusion in 3.6.0 final.
Comment on attachment 6731
git-am fix for 3.6.0
With the patch applied, multi-gigabyte files can be copied, so it seems to work.
Please include the patch in 3.6.0.
Pushed to v3-6-test.
Closing out bug report.