Bug 12845 - Related requests with SessionSetup fail with INTERNAL_ERROR
Related requests with SessionSetup fail with INTERNAL_ERROR
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
All All
: P5 normal
: ---
Assigned To: Stefan Metzmacher
Samba QA Contact
Depends on:
  Show dependency treegraph
Reported: 2017-06-15 10:42 UTC by Moritz Bechler
Modified: 2017-06-29 14:03 UTC (History)
1 user (show)

See Also:

Packet capture SessionSetup+TreeConnect (2 bytes, text/plain)
2017-06-15 10:42 UTC, Moritz Bechler
no flags Details
Possible patch for master (as a first step) (1.07 KB, text/plain)
2017-06-15 21:13 UTC, Stefan Metzmacher
no flags Details
git-am fix for 4.6.next, 4.5.next (1.33 KB, patch)
2017-06-19 18:55 UTC, Jeremy Allison
metze: review+

Note You need to log in before you can comment on or make changes to this bug.
Description Moritz Bechler 2017-06-15 10:42:04 UTC
Created attachment 13284 [details]
Packet capture SessionSetup+TreeConnect

Related requests with SessionSetup fail with INTERNAL_ERROR. This is caused by the async checks in smbd_smb2_request_pending_queue (which essentially only allows to compound create) and possibly affects other situtations as well. I'm not so sure why all of
these async checks are applied in such a general way but there already
seems to be a bypass flag for other cases (async_internal).

This can only be triggered when establishing multiple sessions as samba only starts granting extra credits later on.

/usr/sbin/smbd: smbd_smb2_request_error_ex: smbd_smb2_request_error_ex: idx[1] status[NT_STATUS_INTERNAL_ERROR] || at ../source3/smbd/smb2_server.c:1388
/usr/sbin/smbd: smbd_smb2_request_done_ex: idx[1] status[NT_STATUS_INTERNAL_ERROR] body[8] dyn[yes:1] at ../source3/smbd/smb2_server.c:3145

For reproduction I could offer some test cases using the java implementation if you like.
Comment 1 Stefan Metzmacher 2017-06-15 21:13:41 UTC
Created attachment 13287 [details]
Possible patch for master (as a first step)

Compounding SessionSetup and TreeConnect is actually not a useful approach
as that way the TreeConnect can't be signed...
Comment 2 Moritz Bechler 2017-06-16 07:22:07 UTC
I don't think that is entirely true (but yes, requires some special handling in the client),

a) if we don't sign (anonymous session) it's not an issue at all.
b) The GSS context is already half-established and session key available before sending the final SessionSetup so we can properly sign the related requests.
Comment 3 Stefan Metzmacher 2017-06-16 08:47:19 UTC
(In reply to Moritz Bechler from comment #2)

You're probably right for the anonymous case.

It might also work with the NTLMSSP case
and even with Kerberos and arcfour-hmac-md5 (as the initiator and acceptor subkeys are the same).

But modern Kerberos (with aes keys) and any other gss mech in future
will use an acceptor subkey, that is transfered from the server
to the client in the final SessionSetup response.

I'd assume that it's better to keep the logic simple
and have just one logic flow that works for every case.
Comment 4 Moritz Bechler 2017-06-16 09:33:46 UTC
I really don't mind too much if this does not work (can't really enable it anyways when it breaks with some versions - and it's a very minor optimization).

I was under the impression that this failed based on the async check (which was probably hiding the missing session reference) which I guess might also affect other situations. With regards to that, what is the logic there? 

"as we don't allow compound requests after going async." 

It would seem to me that there isn't anything actually going async there (or does this only refer to some internal processing details?).

Even if it were actually going async, I don't think it is correct to return an error - if I haven't missed anything significant a client is never going to know which requests might go async and which don't. Imho it would seem correct to just send interim responses for the outstanding requests as well.
Comment 5 Stefan Metzmacher 2017-06-16 09:42:27 UTC
(In reply to Moritz Bechler from comment #4)

Internally we're always going async for authentication and with trusted
domain support (hopefully coming soon) I'll be really async.

So for now even with this patch it will trigger the same error.

While adding the trust support I'll try to see if we should hide
going async from the client.
Comment 6 Moritz Bechler 2017-06-16 11:12:34 UTC
Out of curiosity I did some general testing of async/compound combinations. So it looks like the difference is really only in the async-ish handling of the session setup. 
Except for CREATE requests the MS implementations bail out with INTERNAL_ERROR as well if they want to go async and the request is not the end of an compound chain. So this seems consistent for at least some other (probably mostly unrealistic) cases I could come up with. Still, seems kind of stupid from a client perspective.
Comment 7 Volker Lendecke 2017-06-16 11:38:07 UTC
(In reply to Moritz Bechler from comment #6)
> end of an compound chain. So this seems consistent for at least some other
> (probably mostly unrealistic) cases I could come up with. Still, seems kind
> of stupid from a client perspective.

From a client perspective you most benefit from compounding if you pass on the file handle from a CREATE to a READ and CLOSE for example. For requests that are independent from each other you could just send them off without waiting for the individual reply. This way you also don't have the head-of-line blocking problem.
Comment 8 Moritz Bechler 2017-06-16 12:07:56 UTC
Sure, that would be the most common usage. But even then, if the READ went async for some reason (not sure whether you'll get an implementation to do that and whether this would trigger the same error) following that logic it could suddenly fail. There are some rules which request should/must go async, but there don't seem to be any rules which don't. So it seems a client has to guess for which requests this might happen and whether the server will handle it correctly or not - or am I missing something here?
Comment 9 Volker Lendecke 2017-06-16 12:49:35 UTC
Not arguing with that. I haven't really deeply looked at compound vs async. To be honest, in practice I see the only real reason for going async is the read or getinfo after a create. A client wants something from a path (!) and has no file handle. So "just" prepend a compound create, and you're done. Large reads are multiple requests anyway, and the close can be shipped asynchronously later.

Apart from that I am genuinely interested in cases where compounding requests gives benefits in real-world workloads.

By the way, this is *exactly* the kind of question you might want to send to dochelp@microsoft.com, CC'ing cifs-protocol@lists.samba.org. This is Microsoft offering assistance on the protocol docs.
Comment 10 Jeremy Allison 2017-06-19 18:55:58 UTC
Created attachment 13294 [details]
git-am fix for 4.6.next, 4.5.next

Cherry-pick from what went into master.
Comment 11 Jeremy Allison 2017-06-20 18:54:48 UTC
Karolin please add to 4.6.next, 4.5.next !
Comment 12 Karolin Seeger 2017-06-23 11:15:12 UTC
(In reply to Jeremy Allison from comment #11)
Pushed to autobuild-v4-{6,5}-test.
Comment 13 Karolin Seeger 2017-06-27 10:47:30 UTC
(In reply to Karolin Seeger from comment #12)
Pushed to both branches.
Closing out bug report.

Comment 14 Stefan Metzmacher 2017-06-29 14:03:29 UTC
Reopen regarding the INTERNAL_ERROR problem.

During the work for trusts I might be able to check how a Windows
member server behaves if the dc is delaying the response.