When samba is a domain member, and a client authenticates via kerberos, the client supplies (through the PAC embedded in the AP-REQ ticket) information that should be sufficient for starting the session, without a need to contact a domain controller for more information on the client. This is not generally true, but true for configurations where the id-mapping does not depend on information from the DC (e.g. rid, autorid). Another caveat is that the the PAC might include "extra-sids" and since smbd doesn't know whether they are group sids or user sids, it has to query the DC. However, it appears that samba (smbd+winbindd) does contact the DC: 1. There's at least one ldap query to get the user's details by its SID. 2. A new connection is opened to make this query, and this involves full Kerberos authentication (although it should be feasible to keep a connection open or at least have cached service tickets). All-in-all, 40+ packets are exchanged. 3. There is also an LSARPC lsa_LookupSids3, on an already-open connection. The significance of this is in setups where network connectivity issues, long round-trip delays, or misconfigured site information, causes the action of opening a new ldap connection to take a long time or sometimes fail completely. By comparison, a Windows file server does not need to contact the domain, and hence the network issues do not affect it. see also discussion in samba-technical - https://lists.samba.org/archive/samba-technical/2015-May/107199.html Notes about the scenario recorded in the packet captures and logs: 1. The daemons are started, device is joined to domain QA (there are some transactions for group mapping, finding trusted domains, etc.) 2. User "QA\uris" connects to "share1" - starting at packet 6237. He then logs off. 3. User "QA\uris2" connects to "share1" - starting at packet 21233
Created attachment 11025 [details] Logs (UTC+3 timezone)
Created attachment 11026 [details] packet traces samba server is 192.168.91.23 DC is 192.168.90.201 client is 192.168.91.134
Created attachment 11027 [details] smb.conf
Created attachment 11028 [details] Keytab for decrypting ldap messages
Oh. You have a very unusual configuration in your smb.conf. You have: passdb backend = smbpasswd You should remove this. Do you have any local users ? The smbpasswd backend is not capable of storing full user data. This must be removed (or set to the default of tdbsam) and any local users migrated to the proper backend database (using pdbedit). This shouldn't make a difference (I'm still looking into your logs) but it's a bad sign. Jeremy.
Ok, I think I see the problem here... Are you able to test something for me once I've coded it up ?
(In reply to Jeremy Allison from comment #6) Sure NP
Thanks, I should have something for you to test within a few days.
Created attachment 11043 [details] git-am fix for master. OK, this is a test patch for master only that adds the ability for smbd to save the name -> SID translation into the winbindd cache for a domain, if a successful logon has a PAC associated. Can you let me know if this fixes the problem for you ? If so, I'll create a back-port for 4.2.next. Cheers, Jeremy.
(In reply to Jeremy Allison from comment #9) That got rid of the ldap query. I still get some traffic for the session setup though - on an existing SMB2 connection to the DC. I'm attaching new captures and logs. For the first session setup I get: lsa_LookupNames4 (packet 7842) 3 x lsa_LookupSids3 (packets 7854,7859,7867) For second session setup I get: lsa_LookupNames4 (packet 14593) 1 x lsa_LookupSids3 (packet 14602) Thanks! Uri
Created attachment 11044 [details] Packet traces after proposed fix
Created attachment 11045 [details] Logs after proposed fix (UTC+3 timezone)
Hmmm. Patch isn't working for some reason. There's no: + + DEBUG(3, ("winbindd_cache_name_to_sid %s\n", request->data.username)); + output in the winbindd log. Let me investigate and post an updated patch with more debug so I can follow the logic flow along.
Ah - I see the problem. It's not calling the new 'cache name -> sid code'. The logging on user has account flags in the PAC of : acct_flags : 0x00000210 (528) 0: ACB_DISABLED 0: ACB_HOMDIRREQ 0: ACB_PWNOTREQ 0: ACB_TEMPDUP 1: ACB_NORMAL 0: ACB_MNS 0: ACB_DOMTRUST 0: ACB_WSTRUST 0: ACB_SVRTRUST 1: ACB_PWNOEXP and my code checks: if (info3->base.acct_flags == ACB_NORMAL) { Give me a few minutes to upload a modified patchset...
Created attachment 11047 [details] git-am fix for master. OK, here is an updated version with the : if (info3->base.acct_flags == ACB_NORMAL) { changed to: if (info3->base.acct_flags & ACB_NORMAL) { and a little more debug. Sorry for the problem, can you retest ? Thanks, Jeremy.
Created attachment 11049 [details] logs after 2nd proposed patch (UTC+3)
Created attachment 11050 [details] packet trace after 2nd proposed fix
I see the CACHE_NAME_TO_SID now in the logs but the external behavior hasn't changed - I still see the lsa_LookupNames4 and lsa_LookupSids3.
I messed up the patch somehow, sorry. In the logs we have: smbd: Caching domain QA name QA\uris to sid S-1-5-21-3738800907-3116475622-1073917613-121441 returned WBC_ERR_DOMAIN_NOT_FOUND winbindd: process_request: Handling async request 862:CACHE_NAME_TO_SID [2015/05/13 00:10:40.480830, 3, pid=828, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_cache_name_to_sid.c:63(winbindd_cache_name_to_ winbindd_cache_name_to_sid ^A [2015/05/13 00:10:40.481132, 5, pid=828, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_cache_name_to_sid.c:83(winbindd_cache_name_to_ domain name missmatch: QA [2015/05/13 00:10:40.481454, 10, pid=828, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd.c:790(wb_request_done) wb_request_done[862:CACHE_NAME_TO_SID]: NT_STATUS_INVALID_PARAMETER [2015/05/13 00:10:40.483870, 10, pid=828, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd.c:851(winbind_client_response_written) winbind_client_response_written[862:CACHE_NAME_TO_SID]: delivered response to client I'll investigate, maybe with better debug...
Created attachment 11053 [details] git-am fix for master. I've found and fixed the bug. I'm an utter and complete moron, and cut-and-pasted code for the client-side of the RPC that put the domain and username in the wrong place :-(. The changes were in wbcCtxCacheNameToSid(), which now uses: + strncpy(request.domain_name, domain, + sizeof(request.domain_name)-1); + + strncpy(request.data.username, name, + sizeof(request.data.username)-1); to put the data in the right place. The attached version might have a better chance of working. Really sorry to waste your time on that version :-(.
Created attachment 11059 [details] fix to IPC part of Jeremy's patch
Created attachment 11060 [details] packet traces with latest patch - patch4.patch
Created attachment 11061 [details] keytab for latest (patch4-s) capture
Created attachment 11062 [details] logs for patch4 (UTC+3)
Tried the latest patch, still failed to cache name->sid. I had a look and hopefully fixed that - see attached patch4. This indeed eliminated the lookupnames4, but the lookupsids remain - see latest logs and packet traces. Thanks! Uri.
Thanks for the IPC fix. OK, this is getting closer. We've eliminated one more DC request, only one more case to go. I'm on a plane to Europe and travelling for the next 24 hours or so, I'll try and catch up once I'm settled in Germany (for the rest of the week). Cheers, Jeremy.
OK, I know what this is (thanks for fixing my crappy code btw :-). It's in the lookup of user SID -> passwd->pw_uid and group SID -> passwd->pw_gid. Let me look and see what can be done here.
Created attachment 12513 [details] git-am test patch for master Uri, Volker had a clever idea here @ the plugfest. We have an existing wbc call: + params.level = WBC_AUTH_USER_LEVEL_PAC; + params.password.pac.data = pac_blob->data; + params.password.pac.length = pac_blob->length; + + become_root(); + wbc_err = wbcAuthenticateUserEx(¶ms, &info, &err); + unbecome_root(); that can pass the PAC to winbind (and thus have it call netsamlogon_cache_store() on smbd's behalf. Don't think this will fix the LookupSid call, but should be a cleaner way of implementing the needed functionality.
According to my testing, we have prior to any patch: One LookupNames4 call One LDAP A bunch of LookupSids3 - one per SID in the PAC. After the recent patch, the LDAP is eliminated, and the rest remains. Looking at the history of this bug suggests that Jeremy's patch also eliminated the LookupNames4. It's possible that recent advances in LookupSids squashed the multiple calls into one (I seem to remember there was such a patch by Volker, or something similar) - I can test against master later if needed. One possible big win whether or not the RPC is eliminated or not, is that eliminating LDAP possibly removes the need to contact another domain - see bug 11759. I will also test later whether this one is fixed. Thanks, Uri
So checking against master would be good. One question I have is does winbindd continue to contact the DC on subsequent logins from the same user ? I can understand this lookup being done on the first login to store the SID -> name mappings in the winbindd cache, but once that first login is done subsequent logins should not need any traffic. I'm going to try and look into that here up at the Microsoft interop event. Is that the case ?
(In reply to Jeremy Allison from comment #30) Still in 4.3.x, a second connection from same user does not create any AD traffic (I believe nothing is new here since the netsamlogon and winbindd caches were populated).
OK - so long as subsequent logons with the same user don't cause a DC lookup, I think the patch in: https://bugzilla.samba.org/attachment.cgi?id=12513 is correct for master/4.5.x (and possibly before). The nice thing is that the WBC_AUTH_USER_LEVEL_PAC is already tested in source4/torture/winbind/winbind.c, so we can be sure this won't regress. I believe this patch obsoletes Volker's patch in: https://bugzilla.samba.org/show_bug.cgi?id=11759 https://bugzilla.samba.org/attachment.cgi?id=11882 as that patch ensures the netsamlogon_cache.tdb is updated before the user lookup, but doesn't add fix the problem of the missing cache_name2sid() call. The proposed patch here - because it sends the PAC to winbindd, does ensure the cache_name2sid() is called from inside winbindd as part of the WBC_AUTH_USER_LEVEL_PAC codepath. I think this might be the best we can do so far. Inside the PAC we have the sid list in the token, but we don't have the sid -> name mappings - so don't we have to ask the DC at least the first time we see these sids ? Can you +1 this and I'll get it into master (and create the back-ports for all supported versions).
Arghh. Dammit, we're not doing the required post-processing here (just went through it with Guenther). Updated patch to follow :-(.
(In reply to Uri Simchoni from comment #29) OK, made a test whether patch prevents connection to foreign domain. It doesn't. It seems like although the patch prevents the ldap query about the user, and although the remaining RPC queries are made to our domain (even if the names and SIDs are foreign), we still bring the foreign domain online "for good measure" (to obtain the USN), and I suppose (though haven't verified) that a failure there would fail the logon, which is a :(.
(In reply to Uri Simchoni from comment #34) Correction - all the extra churn I was talking about in comment #34, which brings the foreign domain online, does not fail the whole operation if it fails - it's part of "refresh_sequence_number()". So even if we fail to refresh the sequence number, we go on and the actual operations that we do only contact our domain. Therefore I believe that the new patch allows smbd to accept connections from users of foreign domains, where the member server cannot retrieve information from the foreign domain (due to permission issues), and that's its main value (it kind of fixes bug 11759). Haven't encountered such cases in the field but Volker mentioned them.
Created attachment 12514 [details] git-am patch for master OK Uri, here is what we tested at the Microsoft plugfest. If you can confirm it works in your env I'll push and back-port. Sorry for all the back-and forths - this one does "the right thing (tm)" :-). Jeremy.
Comment on attachment 12514 [details] git-am patch for master Christof, if you could review this it would help also ! Thanks, Jeremy.
Created attachment 12515 [details] git-am patch for master Sigh - remembered --stdout this time :-).
Created attachment 12516 [details] git-am patch for master Update after comment from metze :-). Now don't treat winbindd not running as a fatal error when priming the caches.
Comment on attachment 12516 [details] git-am patch for master LGTM, I'll push that to master if people don't mind
(In reply to Guenther Deschner from comment #40) Please consider that the cache priming causes an ldap query every 5 minutes from the main winbindd process - to check the sequence number. From my testing, it does eliminate the name->sid lookup.
Created attachment 12553 [details] git-am fix for 4.5.next, 4.4.next. Here's what went into master cherry-picked and bug number added.
Ping Uri - do you have time to look at this ?
(In reply to Jeremy Allison from comment #43) Oops, must slipped my attention with the holidays here. Will get right on to it.
Comment on attachment 12553 [details] git-am fix for 4.5.next, 4.4.next. I do have some questions, but they are only about improving this further, so I'll discuss them on the list.
Created attachment 12586 [details] git-am fix for 4.5.next, 4.4.next. Updated patch that adds the latest change from master.
Assigning to Karolin for inclusion in 4.5.next and 4.4.next. As to whether or not to close the bug after applying the patches - we still have the lsa_LookupSids3 packets, which verify that the SIDs listed in the PAC are groups. I just verified that Windows makes no such lookups (at least in the case I tested, a user not previously known to the machine connects to a share). Therefore at the very least this requires some discussion. Thanks, Uri.
We should be able to short-circuit that lookup for anything that came from the PAC by pre-populating the cache. Let me look into that.
Pushed to autobuild-v4-{5,4}-test.
(In reply to Karolin Seeger from comment #49) Pushed to both branches. Closing out bug report. Thanks!