Bug 11259 - smbd contacts a domain controller for each session
Summary: smbd contacts a domain controller for each session
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.2.1
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-06 11:06 UTC by Uri Simchoni
Modified: 2016-10-25 07:39 UTC (History)
8 users (show)

See Also:


Attachments
Logs (UTC+3 timezone) (1020.00 KB, application/octet-stream)
2015-05-06 11:07 UTC, Uri Simchoni
no flags Details
packet traces (4.23 MB, application/vnd.tcpdump.pcap)
2015-05-06 11:10 UTC, Uri Simchoni
no flags Details
smb.conf (2.00 KB, text/plain)
2015-05-06 11:11 UTC, Uri Simchoni
no flags Details
Keytab for decrypting ldap messages (1.02 KB, application/octet-stream)
2015-05-06 11:12 UTC, Uri Simchoni
no flags Details
git-am fix for master. (20.95 KB, patch)
2015-05-11 21:42 UTC, Jeremy Allison
no flags Details
Packet traces after proposed fix (4.07 MB, application/vnd.tcpdump.pcap)
2015-05-12 10:16 UTC, Uri Simchoni
no flags Details
Logs after proposed fix (UTC+3 timezone) (750.00 KB, application/x-zip-compressed)
2015-05-12 10:17 UTC, Uri Simchoni
no flags Details
git-am fix for master. (21.11 KB, patch)
2015-05-12 17:05 UTC, Jeremy Allison
no flags Details
logs after 2nd proposed patch (UTC+3) (1000.00 KB, application/x-zip-compressed)
2015-05-12 21:27 UTC, Uri Simchoni
no flags Details
packet trace after 2nd proposed fix (2.21 MB, application/vnd.tcpdump.pcap)
2015-05-12 21:28 UTC, Uri Simchoni
no flags Details
git-am fix for master. (21.10 KB, patch)
2015-05-13 19:54 UTC, Jeremy Allison
no flags Details
fix to IPC part of Jeremy's patch (20.50 KB, patch)
2015-05-17 20:39 UTC, Uri Simchoni
no flags Details
packet traces with latest patch - patch4.patch (1.41 MB, application/vnd.tcpdump.pcap)
2015-05-17 20:46 UTC, Uri Simchoni
no flags Details
keytab for latest (patch4-s) capture (1.02 KB, application/octet-stream)
2015-05-17 20:47 UTC, Uri Simchoni
no flags Details
logs for patch4 (UTC+3) (790.00 KB, application/x-zip-compressed)
2015-05-17 20:48 UTC, Uri Simchoni
no flags Details
git-am test patch for master (2.77 KB, patch)
2016-09-27 00:13 UTC, Jeremy Allison
no flags Details
git-am patch for master (128 bytes, patch)
2016-09-28 00:38 UTC, Jeremy Allison
no flags Details
git-am patch for master (5.09 KB, patch)
2016-09-28 00:41 UTC, Jeremy Allison
no flags Details
git-am patch for master (5.38 KB, patch)
2016-09-28 02:26 UTC, Jeremy Allison
cs: review+
gd: review+
Details
git-am fix for 4.5.next, 4.4.next. (14.07 KB, patch)
2016-10-07 22:44 UTC, Jeremy Allison
uri: review+
Details
git-am fix for 4.5.next, 4.4.next. (15.19 KB, patch)
2016-10-18 00:10 UTC, Jeremy Allison
uri: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Uri Simchoni 2015-05-06 11:06:14 UTC
When samba is a domain member, and a client authenticates via kerberos, the client supplies (through the PAC embedded in the AP-REQ ticket) information that should be sufficient for starting the session, without a need to contact a domain controller for more information on the client.

This is not generally true, but true for configurations where the id-mapping does not depend on information from the DC (e.g. rid, autorid). Another caveat is that the the PAC might include "extra-sids" and since smbd doesn't know whether they are group sids or user sids, it has to query the DC.

However, it appears that samba (smbd+winbindd) does contact the DC:
1. There's at least one ldap query to get the user's details by its SID.
2. A new connection is opened to make this query, and this involves full Kerberos authentication (although it should be feasible to keep a connection open or at least have cached service tickets). All-in-all, 40+ packets are exchanged.
3. There is also an LSARPC lsa_LookupSids3, on an already-open connection.

The significance of this is in setups where network connectivity issues, long round-trip delays, or misconfigured site information, causes the action of opening a new ldap connection to take a long time or sometimes fail completely. By comparison, a Windows file server does not need to contact the domain, and hence the network issues do not affect it.
 
see also discussion in samba-technical - https://lists.samba.org/archive/samba-technical/2015-May/107199.html

Notes about the scenario recorded in the packet captures and logs:
1. The daemons are started, device is joined to domain QA (there are some transactions for group mapping, finding trusted domains, etc.)
2. User "QA\uris" connects to "share1" - starting at packet 6237. He then logs off.
3. User "QA\uris2" connects to "share1" - starting at packet 21233
Comment 1 Uri Simchoni 2015-05-06 11:07:39 UTC
Created attachment 11025 [details]
Logs (UTC+3 timezone)
Comment 2 Uri Simchoni 2015-05-06 11:10:09 UTC
Created attachment 11026 [details]
packet traces

samba server is 192.168.91.23
DC is 192.168.90.201
client is 192.168.91.134
Comment 3 Uri Simchoni 2015-05-06 11:11:44 UTC
Created attachment 11027 [details]
smb.conf
Comment 4 Uri Simchoni 2015-05-06 11:12:34 UTC
Created attachment 11028 [details]
Keytab for decrypting ldap messages
Comment 5 Jeremy Allison 2015-05-07 20:37:33 UTC
Oh. You have a very unusual configuration in your smb.conf.

You have:

       passdb backend = smbpasswd

You should remove this. Do you have any local users ? The smbpasswd backend is not capable of storing full user data. This must be removed (or set to the default of tdbsam) and any local users migrated to the proper backend database (using pdbedit).

This shouldn't make a difference (I'm still looking into your logs) but it's a bad sign.

Jeremy.
Comment 6 Jeremy Allison 2015-05-07 23:46:40 UTC
Ok, I think I see the problem here... Are you able to test something for me once I've coded it up ?
Comment 7 Uri Simchoni 2015-05-10 04:09:16 UTC
(In reply to Jeremy Allison from comment #6)
Sure NP
Comment 8 Jeremy Allison 2015-05-11 16:59:36 UTC
Thanks, I should have something for you to test within a few days.
Comment 9 Jeremy Allison 2015-05-11 21:42:30 UTC
Created attachment 11043 [details]
git-am fix for master.

OK, this is a test patch for master only that adds the ability for smbd to save the name -> SID translation into the winbindd cache for a domain, if a successful logon has a PAC associated.

Can you let me know if this fixes the problem for you ? If so, I'll create a back-port for 4.2.next.

Cheers,

Jeremy.
Comment 10 Uri Simchoni 2015-05-12 10:14:16 UTC
(In reply to Jeremy Allison from comment #9)
That got rid of the ldap query. I still get some traffic for the session setup though - on an existing SMB2 connection to the DC. I'm attaching new captures and logs.

For the first session setup I get:
lsa_LookupNames4 (packet 7842)
3 x lsa_LookupSids3 (packets 7854,7859,7867)

For second session setup I get:
lsa_LookupNames4 (packet 14593)
1 x lsa_LookupSids3 (packet 14602)

Thanks!
Uri
Comment 11 Uri Simchoni 2015-05-12 10:16:34 UTC
Created attachment 11044 [details]
Packet traces after proposed fix
Comment 12 Uri Simchoni 2015-05-12 10:17:39 UTC
Created attachment 11045 [details]
Logs after proposed fix (UTC+3 timezone)
Comment 13 Jeremy Allison 2015-05-12 16:42:35 UTC
Hmmm. Patch isn't working for some reason. There's no:

+
+       DEBUG(3, ("winbindd_cache_name_to_sid %s\n", request->data.username));
+

output in the winbindd log. Let me investigate and post an updated patch with more debug so I can follow the logic flow along.
Comment 14 Jeremy Allison 2015-05-12 16:49:14 UTC
Ah - I see the problem. It's not calling the new 'cache name -> sid code'.

The logging on user has account flags in the PAC of : 

                  acct_flags               : 0x00000210 (528)
                         0: ACB_DISABLED             
                         0: ACB_HOMDIRREQ            
                         0: ACB_PWNOTREQ             
                         0: ACB_TEMPDUP              
                         1: ACB_NORMAL               
                         0: ACB_MNS                  
                         0: ACB_DOMTRUST             
                         0: ACB_WSTRUST              
                         0: ACB_SVRTRUST             
                         1: ACB_PWNOEXP              

and my code checks:

               if (info3->base.acct_flags == ACB_NORMAL) {

Give me a few minutes to upload a modified patchset...
Comment 15 Jeremy Allison 2015-05-12 17:05:17 UTC
Created attachment 11047 [details]
git-am fix for master.

OK, here is an updated version with the :

if (info3->base.acct_flags == ACB_NORMAL) {

changed to:

if (info3->base.acct_flags & ACB_NORMAL) {

and a little more debug.

Sorry for the problem, can you retest ?

Thanks,

Jeremy.
Comment 16 Uri Simchoni 2015-05-12 21:27:15 UTC
Created attachment 11049 [details]
logs after 2nd proposed patch (UTC+3)
Comment 17 Uri Simchoni 2015-05-12 21:28:28 UTC
Created attachment 11050 [details]
packet trace after 2nd proposed fix
Comment 18 Uri Simchoni 2015-05-12 21:35:22 UTC
I see the CACHE_NAME_TO_SID now in the logs but the external behavior hasn't changed - I still see the lsa_LookupNames4 and lsa_LookupSids3.
Comment 19 Jeremy Allison 2015-05-13 19:21:03 UTC
I messed up the patch somehow, sorry. In the logs we have:

smbd:

  Caching domain QA name QA\uris to sid S-1-5-21-3738800907-3116475622-1073917613-121441 returned WBC_ERR_DOMAIN_NOT_FOUND

winbindd:

  process_request: Handling async request 862:CACHE_NAME_TO_SID
[2015/05/13 00:10:40.480830,  3, pid=828, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_cache_name_to_sid.c:63(winbindd_cache_name_to_
  winbindd_cache_name_to_sid ^A
[2015/05/13 00:10:40.481132,  5, pid=828, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_cache_name_to_sid.c:83(winbindd_cache_name_to_
  domain name missmatch: QA 
[2015/05/13 00:10:40.481454, 10, pid=828, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd.c:790(wb_request_done)
  wb_request_done[862:CACHE_NAME_TO_SID]: NT_STATUS_INVALID_PARAMETER
[2015/05/13 00:10:40.483870, 10, pid=828, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd.c:851(winbind_client_response_written)
  winbind_client_response_written[862:CACHE_NAME_TO_SID]: delivered response to client

I'll investigate, maybe with better debug...
Comment 20 Jeremy Allison 2015-05-13 19:54:59 UTC
Created attachment 11053 [details]
git-am fix for master.

I've found and fixed the bug. I'm an utter and complete moron, and
cut-and-pasted code for the client-side of the RPC that put the
domain and username in the wrong place :-(.

The changes were in wbcCtxCacheNameToSid(), which now uses:

+       strncpy(request.domain_name, domain,
+               sizeof(request.domain_name)-1);
+
+       strncpy(request.data.username, name,
+               sizeof(request.data.username)-1);

to put the data in the right place.

The attached version might have a better chance of working. Really
sorry to waste your time on that version :-(.
Comment 21 Uri Simchoni 2015-05-17 20:39:44 UTC
Created attachment 11059 [details]
fix to IPC part of Jeremy's patch
Comment 22 Uri Simchoni 2015-05-17 20:46:47 UTC
Created attachment 11060 [details]
packet traces with latest patch - patch4.patch
Comment 23 Uri Simchoni 2015-05-17 20:47:38 UTC
Created attachment 11061 [details]
keytab for latest (patch4-s) capture
Comment 24 Uri Simchoni 2015-05-17 20:48:40 UTC
Created attachment 11062 [details]
logs for patch4 (UTC+3)
Comment 25 Uri Simchoni 2015-05-17 20:53:27 UTC
Tried the latest patch, still failed to cache name->sid. I had a look and hopefully fixed that - see attached patch4.

This indeed eliminated the lookupnames4, but the lookupsids remain - see latest logs and packet traces.

Thanks!
Uri.
Comment 26 Jeremy Allison 2015-05-18 00:03:57 UTC
Thanks for the IPC fix. OK, this is getting closer. We've eliminated one more DC request, only one more case to go.

I'm on a plane to Europe and travelling for the next 24 hours or so, I'll try and catch up once I'm settled in Germany (for the rest of the week).

Cheers,

Jeremy.
Comment 27 Jeremy Allison 2015-05-19 10:07:16 UTC
OK, I know what this is (thanks for fixing my crappy code btw :-). It's in the lookup of user SID -> passwd->pw_uid and group SID -> passwd->pw_gid.

Let me look and see what can be done here.
Comment 28 Jeremy Allison 2016-09-27 00:13:15 UTC
Created attachment 12513 [details]
git-am test patch for master

Uri, Volker had a clever idea here @ the plugfest. We have an existing wbc call:

+               params.level = WBC_AUTH_USER_LEVEL_PAC;
+               params.password.pac.data = pac_blob->data;
+               params.password.pac.length = pac_blob->length;
+
+               become_root();
+               wbc_err = wbcAuthenticateUserEx(&params, &info, &err);
+               unbecome_root();

that can pass the PAC to winbind (and thus have it call netsamlogon_cache_store() on smbd's behalf.

Don't think this will fix the LookupSid call, but should be a cleaner way of implementing the needed functionality.
Comment 29 Uri Simchoni 2016-09-27 11:56:59 UTC
According to my testing, we have prior to any patch:

One LookupNames4 call
One LDAP
A bunch of LookupSids3 - one per SID in the PAC.

After the recent patch, the LDAP is eliminated, and the rest remains. Looking at the history of this bug suggests that Jeremy's patch also eliminated the LookupNames4.

It's possible that recent advances in LookupSids squashed the multiple calls into one (I seem to remember there was such a patch by Volker, or something similar) - I can test against master later if needed.

One possible big win whether or not the RPC is eliminated or not, is that eliminating LDAP possibly removes the need to contact another domain - see bug 11759. I will also test later whether this one is fixed.

Thanks,
Uri
Comment 30 Jeremy Allison 2016-09-27 16:41:44 UTC
So checking against master would be good. One question I have is does winbindd continue to contact the DC on subsequent logins from the same user ?

I can understand this lookup being done on the first login to store the SID -> name mappings in the winbindd cache, but once that first login is done subsequent logins should not need any traffic.

I'm going to try and look into that here up at the Microsoft interop event.

Is that the case ?
Comment 31 Uri Simchoni 2016-09-27 18:15:22 UTC
(In reply to Jeremy Allison from comment #30)
Still in 4.3.x, a second connection from same user does not create any AD traffic (I believe nothing is new here since the netsamlogon and winbindd caches were populated).
Comment 32 Jeremy Allison 2016-09-27 18:42:59 UTC
OK - so long as subsequent logons with the same user don't cause a DC lookup, I think the patch in:

https://bugzilla.samba.org/attachment.cgi?id=12513

is correct for master/4.5.x (and possibly before). The nice thing is that the WBC_AUTH_USER_LEVEL_PAC is already tested in source4/torture/winbind/winbind.c, so we can be sure this won't regress.

I believe this patch obsoletes Volker's patch in:

https://bugzilla.samba.org/show_bug.cgi?id=11759

https://bugzilla.samba.org/attachment.cgi?id=11882

as that patch ensures the netsamlogon_cache.tdb is updated before the user lookup, but doesn't add fix the problem of the missing cache_name2sid() call. The proposed patch here - because it sends the PAC to winbindd, does ensure the cache_name2sid() is called from inside winbindd as part of the WBC_AUTH_USER_LEVEL_PAC codepath.

I think this might be the best we can do so far. Inside the PAC we have the sid list in the token, but we don't have the sid -> name mappings - so don't we have to ask the DC at least the first time we see these sids ?

Can you +1 this and I'll get it into master (and create the back-ports for all supported versions).
Comment 33 Jeremy Allison 2016-09-27 19:07:43 UTC
Arghh. Dammit, we're not doing the required post-processing here (just went through it with Guenther).

Updated patch to follow :-(.
Comment 34 Uri Simchoni 2016-09-27 20:23:58 UTC
(In reply to Uri Simchoni from comment #29)
OK, made a test whether patch prevents connection to foreign domain. It doesn't. It seems like although the patch prevents the ldap query about the user, and although the remaining RPC queries are made to our domain (even if the names and SIDs are foreign), we still bring the foreign domain online "for good measure" (to obtain the USN), and I suppose (though haven't verified) that a failure there would fail the logon, which is a :(.
Comment 35 Uri Simchoni 2016-09-27 20:54:32 UTC
(In reply to Uri Simchoni from comment #34)
Correction - all the extra churn I was talking about in comment #34, which brings the foreign domain online, does not fail the whole operation if it fails - it's part of "refresh_sequence_number()". So even if we fail to refresh the sequence number, we go on and the actual operations that we do only contact our domain.

Therefore I believe that the new patch allows smbd to accept connections from users of foreign domains, where the member server cannot retrieve information from the foreign domain (due to permission issues), and that's its main value (it kind of fixes bug 11759). Haven't encountered such cases in the field but Volker mentioned them.
Comment 36 Jeremy Allison 2016-09-28 00:38:21 UTC
Created attachment 12514 [details]
git-am patch for master

OK Uri, here is what we tested at the Microsoft plugfest.

If you can confirm it works in your env I'll push and back-port. Sorry for all the back-and forths - this one does "the right thing (tm)" :-).

Jeremy.
Comment 37 Jeremy Allison 2016-09-28 00:39:15 UTC
Comment on attachment 12514 [details]
git-am patch for master

Christof, if you could review this it would help also !

Thanks,

Jeremy.
Comment 38 Jeremy Allison 2016-09-28 00:41:14 UTC
Created attachment 12515 [details]
git-am patch for master

Sigh - remembered --stdout this time :-).
Comment 39 Jeremy Allison 2016-09-28 02:26:31 UTC
Created attachment 12516 [details]
git-am patch for master

Update after comment from metze :-). Now don't treat winbindd not running as a fatal error when priming the caches.
Comment 40 Guenther Deschner 2016-09-28 02:41:23 UTC
Comment on attachment 12516 [details]
git-am patch for master

LGTM, I'll push that to master if people don't mind
Comment 41 Uri Simchoni 2016-09-28 05:34:36 UTC
(In reply to Guenther Deschner from comment #40)
Please consider that the cache priming causes an ldap query every 5 minutes from the main winbindd process - to check the sequence number.

From my testing, it does eliminate the name->sid lookup.
Comment 42 Jeremy Allison 2016-10-07 22:44:42 UTC
Created attachment 12553 [details]
git-am fix for 4.5.next, 4.4.next.

Here's what went into master cherry-picked and bug number added.
Comment 43 Jeremy Allison 2016-10-13 00:18:12 UTC
Ping Uri - do you have time to look at this ?
Comment 44 Uri Simchoni 2016-10-13 17:04:31 UTC
(In reply to Jeremy Allison from comment #43)
Oops, must slipped my attention with the holidays here. Will get right on to it.
Comment 45 Uri Simchoni 2016-10-13 20:34:52 UTC
Comment on attachment 12553 [details]
git-am fix for 4.5.next, 4.4.next.

I do have some questions, but they are only about improving this further, so I'll discuss them on the list.
Comment 46 Jeremy Allison 2016-10-18 00:10:14 UTC
Created attachment 12586 [details]
git-am fix for 4.5.next, 4.4.next.

Updated patch that adds the latest change from master.
Comment 47 Uri Simchoni 2016-10-18 04:02:20 UTC
Assigning to Karolin for inclusion in 4.5.next and 4.4.next.

As to whether or not to close the bug after applying the patches - we still have the lsa_LookupSids3 packets, which verify that the SIDs listed in the PAC are groups. I just verified that Windows makes no such lookups (at least in the case I tested, a user not previously known to the machine connects to a share). Therefore at the very least this requires some discussion.

Thanks,
Uri.
Comment 48 Jeremy Allison 2016-10-18 17:04:21 UTC
We should be able to short-circuit that lookup for anything that came from the PAC by pre-populating the cache. Let me look into that.
Comment 49 Karolin Seeger 2016-10-19 06:44:22 UTC
Pushed to autobuild-v4-{5,4}-test.
Comment 50 Karolin Seeger 2016-10-25 07:39:15 UTC
(In reply to Karolin Seeger from comment #49)
Pushed to both branches.
Closing out bug report.

Thanks!