Bug 8599 - WINBINDD_PAM_AUTH_CRAP returns invalid user session key
WINBINDD_PAM_AUTH_CRAP returns invalid user session key
Status: RESOLVED FIXED
Product: Samba 3.5
Classification: Unclassified
Component: Winbind
3.5.11
All All
: P5 regression
: ---
Assigned To: Karolin Seeger
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-10 21:05 UTC by Jeff Venable
Modified: 2012-03-20 20:36 UTC (History)
8 users (show)

See Also:


Attachments
Successful auth; includes both level 3 and 6 validation examples (88.88 KB, application/octet-stream)
2011-11-10 21:05 UTC, Jeff Venable
no flags Details
Failed child auth; level 3 validation, pass through parent domain (21.61 KB, text/plain)
2011-11-10 21:06 UTC, Jeff Venable
no flags Details
Configuration used (620 bytes, text/plain)
2011-11-10 21:07 UTC, Jeff Venable
no flags Details
Proposed (untested) patch for 3.6 and master (1.49 KB, patch)
2011-12-14 23:37 UTC, Andrew Bartlett
no flags Details
Proposed (untested) patch for 3.5 (1.24 KB, patch)
2011-12-14 23:39 UTC, Andrew Bartlett
no flags Details
Archive of winbind logs when problem occurs (52.20 KB, application/x-compressed-tar)
2012-02-02 06:51 UTC, Matthieu Patou
no flags Details
Archive of winbind logs when problem didn't occurs (32.09 KB, application/x-compressed-tar)
2012-02-02 07:54 UTC, Matthieu Patou
no flags Details
tcpdump of the traffic when corruption occurs (157.68 KB, application/octet-stream)
2012-02-02 07:59 UTC, Matthieu Patou
no flags Details
keytab for packet decryption (23.18 KB, application/octet-stream)
2012-02-02 07:59 UTC, Matthieu Patou
no flags Details
tcpdump of the traffic when corruption doesn't occurs (157.78 KB, application/octet-stream)
2012-02-02 08:00 UTC, Matthieu Patou
no flags Details
patch to add some debug and trigger a level 3 response on SamLogonEx (2.29 KB, patch)
2012-02-02 08:03 UTC, Matthieu Patou
no flags Details
git-am fix for 3.6.next. (1.19 KB, patch)
2012-02-15 19:41 UTC, Jeremy Allison
jra: review+
Details
Rework on Andrew's proposal (4.80 KB, patch)
2012-03-10 00:14 UTC, Matthieu Patou
no flags Details
Rework on Andrew's proposal (1.71 KB, patch)
2012-03-12 05:18 UTC, Matthieu Patou
jra: review+
ab: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeff Venable 2011-11-10 21:05:36 UTC
Created attachment 7081 [details]
Successful auth; includes both level 3 and 6 validation examples

Possibly related to bug #6563, or that solution was incomplete.

When dealing with a Win2008R2 simple forest (parent + child domains), NTLMv2 (MSCHAPv2) works in most cases:

   * [Validation level 6]
     netr_LogonSamLogonEx: struct netr_LogonSamLogonEx
        out: struct netr_LogonSamLogonEx
            validation               : *
                validation               : union netr_Validation(case 6)
                sam6                     : *
                    sam6: struct netr_SamInfo6

     This works in all cases because schannel seals the transport and the user
     session key is unencrypted.  Here is the CORRECT session key (only changes
     when machine account password changes):

         key: struct netr_UserSessionKey
             key                      : cfc88bee670bac999f12311e06219a41

   * [Validation level 3]
     netr_LogonSamLogonEx: struct netr_LogonSamLogonEx
        out: struct netr_LogonSamLogonEx
            validation               : *
                validation               : union netr_Validation(case 3)
                sam3                     : *
                    sam3: struct netr_SamInfo3

     This works when winbindd is connected directly to the child domain.
     We see that the session key is RC4 sealed:

         key: struct netr_UserSessionKey
             key                      : 5e63ddf5c0ed9d1fcd24f11fb45a235f

     But when stored and decrypted is CORRECT:

     netsamlogon_cache_store:
     SID [S-1-5-21-298864289-1109787924-2511658638-1104]
         &r: struct netsamlogoncache_entry
            timestamp                : Thu Nov 10 20:34:54 2011 UTC
            info3: struct netr_SamInfo3
                base: struct netr_SamBaseInfo
     ...
                key: struct netr_UserSessionKey
                    key                      : cfc88bee670bac999f12311e06219a41

     Now, it's not always reproducible because I cannot always control the
     order of connection logic performed by winbindd, but if I kill winbindd
     and it restarts so that it has no connection cache state, then perform an
     operation to make it open a connection to the parent domain, and finally
     perform an authentiction for a machine account in the child domain, we
     see the following:

     netr_LogonSamLogonEx: struct netr_LogonSamLogonEx
        out: struct netr_LogonSamLogonEx
            validation               : *
                validation               : union netr_Validation(case 3)
                sam3                     : *
                    sam3: struct netr_SamInfo3
                        base: struct netr_SamBaseInfo
     ...
         key: struct netr_UserSessionKey
             key                      : cd7cc953864a906a7251ed966915d21f

     But when stored and decrypted is CORRUPTED:

     netsamlogon_cache_store:
     SID [S-1-5-21-298864289-1109787924-2511658638-1104]
         &r: struct netsamlogoncache_entry
            timestamp                : Thu Nov 10 20:34:54 2011 UTC
            info3: struct netr_SamInfo3
                base: struct netr_SamBaseInfo
     ...
                key: struct netr_UserSessionKey
                    key                      : a0b133d62901d36caf329bf5ce3572bf

I'm assuming winbindd is improperly handling session credential chaining when NTLMv2 authentication is passing through an intermediate DC?

Excised logs and configuration are attached; I can obtain more logs if required.
Comment 1 Jeff Venable 2011-11-10 21:06:38 UTC
Created attachment 7082 [details]
Failed child auth; level 3 validation, pass through parent domain
Comment 2 Jeff Venable 2011-11-10 21:07:03 UTC
Created attachment 7083 [details]
Configuration used
Comment 3 Michael Adam 2011-12-13 11:17:54 UTC
One preliminary question:
How do you trigger the use of validation level 3?

Assigning to Günther: GD, please have a look!

Cheers - Michael
Comment 4 Andrew Bartlett 2011-12-14 23:37:44 UTC
Created attachment 7183 [details]
Proposed (untested) patch for 3.6 and master

This patch ensures that we do not use the SamLogonEx call unless we can also use level 6.  That way, the whole calls should fail rather than getting the session key decryption wrong.  (the failure should then be recovered and a new connection established)
Comment 5 Andrew Bartlett 2011-12-14 23:39:15 UTC
Created attachment 7184 [details]
Proposed (untested) patch for 3.5

This patch is the same idea for 3.5, do not use SamLogonEx unless we can do level 6.  Tested in make test, but needs real world testing.
Comment 6 Stefan Metzmacher 2011-12-15 00:03:07 UTC
I think the question is more why is can_do_validation6 == false?

In what mode are you running 'winbindd'?
Could you run it in deamon mode,
with 'debug pid = yes' 'debug hires timestamp = yes'
and 'log level = 10' in smb.conf.

And then upload all log.w* files as a tar.gz?
It's also important to see what happens between success and failure.

What are the DNS/Netbios names of the domains and dc's
and what is the configuration in smb.conf?
Comment 7 Jeff Venable 2011-12-15 03:24:37 UTC
I will adjust the log settings to the requested configuration.

Winbindd is running in '--interactive' mode with '--stdout' redirected to a socket.  I don't think that would influence it will it?  Here is what it's running with:

[environment]
        LD_LIBRARY_PATH=/mnt/src/latest/out/install/runtime/jails/samba.vc0/lib
        KRB5_CONFIG=
        KRB5_KTNAME=WRFILE:/mnt/src/latest/out/install/runtime/jails/samba.vc0/etc/3.keytab
[arguments]
        /mnt/src/latest/out/install/runtime/jails/samba.vc0/sbin/winbindd
        --interactive
        --stdout
        --configfile=/mnt/src/latest/out/install/runtime/jails/samba.vc0/etc/samba/3.conf

[global]
    workgroup = NIN
    realm = NIN.ASGLAB.JUNIPER.NET
    netbios name = 0161M100HU1050
    security = ads
    client ldap sasl wrapping = sign
    winbind use default domain = yes
    name resolve order = hosts bcast
    allow trusted domains = yes
    winbind expand groups = 1
    client ntlmv2 auth = yes
    winbindd:socket dir = /mnt/src/latest/out/install/runtime/jails/samba.vc0/tmp/.3.winbindd
    private dir = /mnt/src/latest/out/install/runtime/jails/samba.vc0/var/lib/.3.winbindd
    lock directory = /mnt/src/latest/out/install/runtime/jails/samba.vc0/var/locks/.3.winbindd
    pid directory = /mnt/src/latest/out/install/runtime/jails/samba.vc0/var/locks/.3.winbindd
    kerberos method = dedicated keytab
    dedicated keytab file = /mnt/src/latest/out/install/runtime/jails/samba.vc0/etc/3.keytab
    log level = 10
    debug timestamp = no

It might take me a day or so to reproduce and upload the requested information.

Did you want me to attempt to apply and test the provided patch?
Comment 8 Stefan Metzmacher 2011-12-15 07:11:19 UTC
At least the --stdout makes the debug output unuseable to debug this problem.

Don't you want the --foreground instead of --interactive?
Comment 9 Karolin Seeger 2011-12-16 12:47:42 UTC
Making this one a blocker for 3.6.2 (proposed by Matthieu).
Comment 10 Jeff Venable 2012-01-04 01:03:02 UTC
Sorry for not following up faster, I was out on holiday.

I tried reproducing this but haven't been successful, but it was difficult to reproduce originally so I'm not convinced.  I've asked our QA organization to prioritize a reproduction.  My lack of reproduction might be related to another problem I fixed in our implementation where for machine authentication via MSCHAP, username 'host/machine.realm.net' would be translated to 'realm.net\machine$'.  Now in such cases I query winbindd for the NetBIOS short name and always make requests with 'DOMAIN\machine$'.

Matthieu Patou is on-board with us locally now.  If QA can reproduce this issue, Matthieu has indicated an interest in generating a perl script using ntlm_auth to reproduce the issue for the Samba Team to examine.  Failing that, we can update this bug again to remove the critical status.

Thanks for being patient!
Comment 11 Karolin Seeger 2012-01-30 19:52:11 UTC
Any news on this one?
We do really need to schedule 3.5.13 and this is the only blocker right now.
Comment 12 Matthieu Patou 2012-01-30 20:40:30 UTC
Karolin,

I'm currently working on this bug, I'm trying to get something reproducible first. Then I hope to get something for the analysis.
Comment 13 Matthieu Patou 2012-02-01 01:25:33 UTC
I have a kind of reproduction.

In order to reproduce I did:

* force samba to always use level 3
* ntlm_auth --domain=CHILD --password=fooBAR999 --username=mat
* copy the nt_response, challenge and lm_response in the winbind log (when running at level 10)
* ntlm_auth --domain=CHILD --username=mat --challenge=xxx --nt-response=xx --lm-response=xx --request-nt-key

At this moment the nt-key is ok

* kill winbind
* restart winbind
* wbinfo -g 
* ntlm_auth --domain=CHILD --username=mat --challenge=xxx --nt-response=xx --lm-response=xx --request-nt-key


We can see the nt-key is not ok
Comment 14 Matthieu Patou 2012-02-02 06:51:34 UTC
Created attachment 7280 [details]
Archive of winbind logs when problem occurs
Comment 15 Stefan Metzmacher 2012-02-02 07:42:52 UTC
I only see(In reply to comment #14)
> Created attachment 7280 [details]
> Archive of winbind logs when problem occurs

I only see 2 references:

bug-8599/bad$ grep netr_SamInfo *
log.wb-NIN:                      sam3: struct netr_SamInfo3
log.wb-NIN:          info3: struct netr_SamInfo3

which meams the log files are incomplete.

On IRC you said you found the root cause of the problem, what is it?
Comment 16 Matthieu Patou 2012-02-02 07:54:25 UTC
Created attachment 7281 [details]
Archive of winbind logs when problem didn't occurs
Comment 17 Matthieu Patou 2012-02-02 07:59:07 UTC
Created attachment 7282 [details]
tcpdump of the traffic when corruption occurs
Comment 18 Matthieu Patou 2012-02-02 07:59:33 UTC
Created attachment 7283 [details]
keytab for packet decryption
Comment 19 Matthieu Patou 2012-02-02 08:00:09 UTC
Created attachment 7284 [details]
tcpdump of the traffic when corruption doesn't occurs
Comment 20 Matthieu Patou 2012-02-02 08:03:17 UTC
Created attachment 7285 [details]
patch to add some debug and trigger a level 3 response on SamLogonEx
Comment 21 Stefan Metzmacher 2012-02-02 08:06:22 UTC
(In reply to comment #14)
> Created attachment 7280 [details]
> Archive of winbind logs when problem occurs

Ok, the problem that it fails with forced level 3,
is clear. Two winbindd child processes connect to the dc.

log.wb-NIN:

[2012/02/01 15:36:09.303246,  1] ../librpc/ndr/ndr.c:251(ndr_print_function_debug)
       netr_ServerAuthenticate2: struct netr_ServerAuthenticate2
          in: struct netr_ServerAuthenticate2
              server_name              : *
                  server_name              : '\\ROOT.NIN.ASGLAB.JUNIPER.NET'

log.wb-WRETCHED:

[2012/02/01 15:36:26.486571,  1] ../librpc/ndr/ndr.c:251(ndr_print_function_debug)
       netr_ServerAuthenticate2: struct netr_ServerAuthenticate2
          in: struct netr_ServerAuthenticate2
              server_name              : *
                  server_name              : '\\ROOT.NIN.ASGLAB.JUNIPER.NET'

log.wb-NIN:

[2012/02/01 15:36:29.820778,  1] ../librpc/ndr/ndr.c:251(ndr_print_function_debug)
       netr_LogonSamLogonEx: struct netr_LogonSamLogonEx
          in: struct netr_LogonSamLogonEx
              server_name              : *
                  server_name              : '\\root.nin.asglab.juniper.net'
Comment 22 Stefan Metzmacher 2012-02-02 08:08:15 UTC
(In reply to comment #16)
> Created attachment 7281 [details]
> Archive of winbind logs when problem didn't occurs

Here only one process connects to \\root.nin.asglab.juniper.net

good$ grep netr_ServerAuthenticate2 *
log.wb-NIN:       netr_ServerAuthenticate2: struct netr_ServerAuthenticate2
log.wb-NIN:          in: struct netr_ServerAuthenticate2
log.wb-NIN:       netr_ServerAuthenticate2: struct netr_ServerAuthenticate2
log.wb-NIN:          out: struct netr_ServerAuthenticate2
Comment 23 Matthieu Patou 2012-02-02 08:32:48 UTC
I made more investigation on this, I was suspecting a corruption of the session key used for encrypt/decrypt nt session key in the reply of a SamLogonEx when response level is 3.

After putting some debug it seems that the session key used both for schannel encryption and encryption of the user's nt session key is the same (as it is expected to be ...).

I'm starting to think that MS AD only keeps the latest session key for sensitive user's information encryption (when doing passthrough). In a way it seems not too stupid, if a workstation recreate a new schannel session then the DC can legitimately thinks that the previous one won't be used any more as a workstation or a server is not supposed to have more than 1 active schannel with a given DC (does it ?).

Here in our case we can have 1 connection per domain and all to one DC, I'll to make a tttrace and ask dochelp at this subject just to confirm my hypothesis.

In the mean time I think the way to avoid to go in level3 is to enforce client schannel = true so that we try always to do with level 6.

Also there is those reasons that could cause level 3 to be used instead of level 6

        } else if (auth->auth_level != DCERPC_AUTH_LEVEL_PRIVACY) {             
            domain->can_do_validation6 = false;                                 
        } else if (!(neg_flags & NETLOGON_NEG_CROSS_FOREST_TRUSTS)) {           
            domain->can_do_validation6 = false;                                 
        } else if (!(neg_flags & NETLOGON_NEG_AUTHENTICATED_RPC)) {             
            domain->can_do_validation6 = false;                                 
        }    

What about a small patch that forbids level3 if client schannel = true ?
Comment 24 Matthieu Patou 2012-02-02 08:33:23 UTC
(In reply to comment #22)
> (In reply to comment #16)
> > Created attachment 7281 [details] [details]
> > Archive of winbind logs when problem didn't occurs
> 
> Here only one process connects to \\root.nin.asglab.juniper.net
> 
> good$ grep netr_ServerAuthenticate2 *
> log.wb-NIN:       netr_ServerAuthenticate2: struct netr_ServerAuthenticate2
> log.wb-NIN:          in: struct netr_ServerAuthenticate2
> log.wb-NIN:       netr_ServerAuthenticate2: struct netr_ServerAuthenticate2
> log.wb-NIN:          out: struct netr_ServerAuthenticate2

Yeah see my comment 23.
Comment 25 Matthieu Patou 2012-02-02 08:42:15 UTC
One additional comment is that, if even if you force level 3 and use wbinfo -g to increase your chances of having a second netlogon session (from the child).

But it can happen so that the GetGroupList call won't need a GetDC call and so a second session won't be created and the session key won't be corrupted.
Comment 26 Stefan Metzmacher 2012-02-02 09:48:06 UTC
(In reply to comment #23)
> I made more investigation on this, I was suspecting a corruption of the session
> key used for encrypt/decrypt nt session key in the reply of a SamLogonEx when
> response level is 3.
> 
> After putting some debug it seems that the session key used both for schannel
> encryption and encryption of the user's nt session key is the same (as it is
> expected to be ...).
> 
> I'm starting to think that MS AD only keeps the latest session key for
> sensitive user's information encryption (when doing passthrough). In a way it
> seems not too stupid, if a workstation recreate a new schannel session then the
> DC can legitimately thinks that the previous one won't be used any more as a
> workstation or a server is not supposed to have more than 1 active schannel
> with a given DC (does it ?).
> 
> Here in our case we can have 1 connection per domain and all to one DC, I'll to
> make a tttrace and ask dochelp at this subject just to confirm my hypothesis.

You don't need to the session key is a known to be global state between
do machines. The correct fix is to maintain the state in a tdb in the
client case too (our server already does that).

All tricks with using level 6 are just to minimize the risk.
Comment 27 Matthieu Patou 2012-02-02 17:04:24 UTC
> 
> You don't need to the session key is a known to be global state between
> do machines. The correct fix is to maintain the state in a tdb in the
> client case too (our server already does that).
> 
Well for the client we need to keep in common the session key that is used to decrypt user's session key, but the session key for the schannel part must be kept separate as when we create a new schannel with MS AD we are still able to use the old "session key" for schannel encryption/decryption.
> All tricks with using level 6 are just to minimize the risk.
Yeah that's true, I didn't think  that we had the infrastructure to cope with it.
Comment 28 Matthieu Patou 2012-02-07 23:59:56 UTC
We are able to reproduce the problem even with client schannel = yes.

It means that winbindd is using the level 3 validation instead of level 6

Given the fact that this choice is driven by the domain->can_do_validation6 or
by certain flags on the connection or by the security level and the presence of
schannel.

I put some debug and the schannel is here with a negiotiated level of 6
(privacy) and the flags are 0x6000ffff. So it boils down to can_do_validation6
being set to false before this tests:

                if (auth == NULL) {                                             
                        domain->can_do_validation6 = false;                     
                } else if (auth->auth_type != DCERPC_AUTH_TYPE_SCHANNEL) {      
                        domain->can_do_validation6 = false;                     
                } else if (auth->auth_level != DCERPC_AUTH_LEVEL_PRIVACY) {     
                        domain->can_do_validation6 = false;                     
                } else if (!(neg_flags & NETLOGON_NEG_CROSS_FOREST_TRUSTS)) {   
                        domain->can_do_validation6 = false;                     
                } else if (!(neg_flags & NETLOGON_NEG_AUTHENTICATED_RPC)) {     
                        domain->can_do_validation6 = false;                     
                }          


The place where can_do_validation6 is set in set_dc_type_and_flags_connect.

There is a couple of conditions when this function can do an early exit, but it
should leave a trace in the log. So I'm wondering if in some case it didn't
skips this step.
Comment 29 Jeremy Allison 2012-02-15 19:41:12 UTC
Created attachment 7325 [details]
git-am fix for 3.6.next.

Matthieu, please review for 3.6.next.
Jeremy.
Comment 30 Matthieu Patou 2012-02-16 00:01:50 UTC
Comment on attachment 7325 [details]
git-am fix for 3.6.next.

+
Comment 31 Matthieu Patou 2012-02-16 01:08:05 UTC
So I've been able to reproduce the change from level 6 to level 3 in order to do so you need at least 2 domain in the forest.

I used a ntlm_auth to authenticate a user that is not the same domain and patched it so that is has the following winbind flag: WBFLAG_PAM_CONTACT_TRUSTDOM.

By doing so when winbind tries to contact the child winbind process related to the domain of the user (it's not the common case, if WBFLAG_PAM_CONTACT_TRUSTDOM is not specified the child related to the domain of samba will always be used it seems), for trusted domain the can_do_validation6 was is not set to it's always false so we use level3.

As said in a previous comment having a level validation doesn't always imply session corruption. In order to occur you need also to have more than 1 schannel connection active.

In practice I found that you need to have at least 3 domains in your forest and use the WBFLAG_PAM_CONTACT_TRUSTDOM and force more than 1 schannel to be established (by using certain winbind calls).

Note if you are using use_kerberos in the pam configuration the flag WBFLAG_PAM_CONTACT_TRUSTDOM is implied.

With the posted patch we have a way to avoid winbindd to use level 3 but still remains the problem with multiple schannel.
Comment 32 Jeremy Allison 2012-02-16 21:31:04 UTC
Comment on attachment 7325 [details]
git-am fix for 3.6.next.

Matthieu +1'ed the patch.
Comment 33 Jeremy Allison 2012-02-16 21:31:57 UTC
Re-assigning to Karolin for inclusion in 3.5.next.
Jeremy.
Comment 34 Andrew Bartlett 2012-02-18 06:56:00 UTC
I still think my original patch should be seriously considered.  It (essentially) provides a checksum on the session key decryption, as otherwise we have no way to know if the netlogon key in use for encryption (netlogon creds and session key) on this packet is correct.
Comment 35 Stefan Metzmacher 2012-02-18 08:55:53 UTC
Matthieu, can you try andrews patch (without your patch) and check if it's
able to detect and recover the situation, so that the caller doesn't notice the problem?

If so, then I think we should apply both patches until we fixed it correctly,
by storing the session key (and related stuff) in a tdb.
Comment 36 Matthieu Patou 2012-02-19 07:22:11 UTC
Stephan, Andrew,
Patches from andrew won't fix the problem.
Basically the patch of andrew is to fallback to NetrLogonSamLogon if can_do_validation6 is false and can_do_samlogon_ex is false.
In master at least when we use NetrLogonSamLogon we still have the test on can_do_validation6 to decide if we have to use level 3 or level 6.

In NetrLogonSamLogon level 3 (and a couple of other levels) mandates the encryption of sensitive data as well, it also mandate the use of schannel according to the spec:

"3.4.5.3.4 Calling NetrLogonSamLogon
The client MUST do the following:Have a secure channel established with a domain controller in the domain identified by domain-
name, and pass its name as the LogonServer parameter.
"
So the logic in source3's winbind is wrong as is basically set can_do_samlogon_ex to false (and so force the use of NetrLogonSamLogon) if the target DC doesn't support schannel.

First I must say that I don't think we need another patch here apart from the one to have 1 shared schannel key per DC (stored in tdb).

Then if a fix has to be made we were thinking with jeff of having can_do_validation6 as true by default because the winbind_pam.c code has a try/fail/retry construction that will allow it to try another call to NetrLogonSamLogon with a level3 if the NetrLogonSamLogonEx fails.
Given the fact that Level 6 is implemented by Windows 2000, I think it means that this will be an issue with NT domains, in this case we will try once the Logon with level6 it will fail, then we will try Logon with level3 and it should work. But with this kind of setup the corruption shouldn't occur I think as we shouldn't have more than 1 schannel connection per DC (even if the NT domain trusts some others).
Comment 37 Andrew Bartlett 2012-02-19 10:37:16 UTC
I strongly feel that we should never use a method that relies on an encrypted session key without some form of active check that the session key is correct. 

Using a TDB is still not safe - what if a request was 'in flight' when the session key was changed?  We just go from easy to reproduce bugs to difficult to reproduce bugs. 

The idea of using SamLogon is indeed to (double) encrypt the session keys.  That is expected, but because the call also has the netlogon credentials chain (the netr_Authenticator elements), and because we know that the netlogon credentials chain is tied to the effective session key, we essentially gain a checkum on the keys, and therefore some safety. 

Andrew Bartlett
Comment 38 Volker Lendecke 2012-02-19 10:43:44 UTC
(In reply to comment #37)
> I strongly feel that we should never use a method that relies on an encrypted
> session key without some form of active check that the session key is correct. 
> 
> Using a TDB is still not safe - what if a request was 'in flight' when the
> session key was changed?  We just go from easy to reproduce bugs to difficult
> to reproduce bugs. 

Isn't that what locks are for? Yes, we have to sequentialize certain operations, but that is what the protocol gives you.

Volker
Comment 39 Matthieu Patou 2012-02-20 23:45:06 UTC
(In reply to comment #37)
> I strongly feel that we should never use a method that relies on an encrypted
> session key without some form of active check that the session key is correct. 
> 
> Using a TDB is still not safe - what if a request was 'in flight' when the
> session key was changed?  We just go from easy to reproduce bugs to difficult
> to reproduce bugs. 
> 
The idea of the key in TDB is to never have more than 1 key as I understood.
So when the first schannel is requested a key is negotiated for a given DC, then all other winbindd process (and smbd ?) will use the same key.


> The idea of using SamLogon is indeed to (double) encrypt the session keys. 
> That is expected, but because the call also has the netlogon credentials chain
> (the netr_Authenticator elements), and because we know that the netlogon
> credentials chain is tied to the effective session key, we essentially gain a
> checkum on the keys, and therefore some safety. 
In theory the key should be always the same, I checked that from winbind point of view it's the case: we use the same key to do the schannel decryption and the user's session decryption. Winbind manage to decrypt the response and the content is good but the decryption of the secret doesn't yield the expected result.

I did a quick test by forcing the LogonSamLogon call with a requested validation of level 3. As the session key is not correct the verification seems to fail and winbind invalidate the session key and recreate a new one.
Basically in our use case it means that everytime we authenticate users from another domain we try and fail, we recreate a connection and then retry, it works but it's rather suboptimal.


 > 
> Andrew Bartlett
Comment 40 Andrew Bartlett 2012-02-20 23:56:23 UTC
First, there is an issue with names for things here that is quite unfortunate:

Secure Channel in the microsoft docs referrers to the agreed session key between a member server and DC, based on the netbios names involved.  It is set up with a ServerAuthenticate{,2,3}

This can then lead to an netlogon secure channel SSPI (what we call schannel) connection over DCE/RPC.  Once set up, such a connection continues to be encrypted with the same key for the life of the TCP or named pipe connection.

The user session key in the SamLogon response is decrypted with the last agreed secure channel key between the client and server.  It need not be the same as the SSPI session, if a new ServerAuthenticate call has been made. 

For SamLogon variants with authenticator elements, the netlogon credentials chain (a poor attempt at ensuring security of the RPC packets before the SSPI layer was done) has state, which along with the session key is shared between the two netbios names.  For SamLogon varients with an encrypted session key, the same session key is used. 

Finally, if we are using the wrong session key consistently, and so therefore fail the SamLogon each time (rather than just once after a ServerAuthenticate has been issued), then this is a different bug, and is is this bug that should be addressed. 

Is the issue that we have multiple winbindd processes which connect to our primary DC?

Andrew Bartlett
Comment 41 Karolin Seeger 2012-02-26 20:08:55 UTC
ping
Comment 42 Matthieu Patou 2012-02-27 00:31:44 UTC
(In reply to comment #40)

> For SamLogon variants with authenticator elements, the netlogon credentials
> chain (a poor attempt at ensuring security of the RPC packets before the SSPI
> layer was done) has state, which along with the session key is shared between
> the two netbios names.  For SamLogon varients with an encrypted session key,
> the same session key is used. 

Ok. 
 
> Finally, if we are using the wrong session key consistently, and so therefore
> fail the SamLogon each time (rather than just once after a ServerAuthenticate
> has been issued), then this is a different bug, and is is this bug that should
> be addressed. 

No we have it always once the ServerAuthenticate has been issued

> 
> Is the issue that we have multiple winbindd processes which connect to our
> primary DC?
Right, and more exactly the fact that we have more than 1 ServerAuthenticate call to the same DC.

I think that metze proposal to share the key in the TDB is the good solution for a very long term solution. I can't afford for the moment to implement this solution.

The attached patch is quite ok workaround.
Comment 43 Matthieu Patou 2012-02-27 00:41:29 UTC
(In reply to comment #41)
> ping

Karolin,

you can pick the patch of Jeremy for 3.6.x and it's applicable to 3.5 as well.
Comment 44 Karolin Seeger 2012-02-28 19:54:01 UTC
(In reply to comment #43)
> (In reply to comment #41)
> > ping
> 
> Karolin,
> 
> you can pick the patch of Jeremy for 3.6.x and it's applicable to 3.5 as well.

Pushed to v3-6-test and v3-5-test.
Can we close the bug report or is another patch/more discussion needed?

Thanks for commenting, Matthieu!

Karolin
Comment 45 Karolin Seeger 2012-03-02 19:52:17 UTC
Please clarify if I can go ahead with the release or an additional patch is needed.

Thanks!
Comment 46 Stefan Metzmacher 2012-03-02 20:33:35 UTC
Matthieu, your comments regarding Andrews patch are a bit confusing.

1. Your patch is good in order use level 6 if possible,
   that should be our first choice.

2. But if for some reason the server doesn't support level 6,
   we have to use level 3. In that case I think it's good
   to have some extra protection, even if that means
   we'll have to recreate the schannel from scratch each time an
   other winbind child does a connection, but that shouldn't happen
   to often.

So, would Andrews patch make anything worse? If not we should also
include it.
Comment 47 Karolin Seeger 2012-03-09 20:26:22 UTC
Can we ship 3.5.13 without Andrew's patch or should I delay the release?
Comment 48 Andrew Bartlett 2012-03-09 21:15:22 UTC
I still do not see why we should *not* include my patch.  If it were to cause a secure channel reset or similar, that should only happen if we are already in the dangerous state where we rely on a session key that is not certain. 

What am I missing?  Why do we expose our users to this risk?
Comment 49 Andrew Bartlett 2012-03-09 21:22:47 UTC
I should note that we are still better than we were before, so on that basis we do not need to block the release, but we in my view call this fixed without the additional protection.
Comment 50 Matthieu Patou 2012-03-10 00:14:58 UTC
Created attachment 7367 [details]
Rework on Andrew's proposal

Andrew's patch for 3.5 wasn't addressing the PAM_AUTH_CRAP path.

Adapted his patch.
Comment 51 Matthieu Patou 2012-03-10 00:21:27 UTC
Karoline,

Thanks for comming back to this, I had a look last week but got distracted.

So yes the adapted patch of Andrew really help just in case we end up misdetecting the can_do_validation6 (which should be now pretty rare but I'm sure we can found pretty weird case where this can occur).

I didn't completely checked the master/3.6 patch because I try my testcase I run into another bug.
But this shouldn't be a blocker for 3.5.13.
Comment 52 Karolin Seeger 2012-03-11 19:25:46 UTC
Ok, then I will go ahead without the patch for 3.5.13.
The new patch will be in 3.5.14 (most likely).

Thanks!
Comment 53 Alexander Bokovoy 2012-03-11 20:05:53 UTC
Comment on attachment 7367 [details]
Rework on Andrew's proposal

Attached patch seems have nothing to do with PAM winbindd handling, it looks like python code change in upgradeprovision.

Maybe you have uploaded wrong patch, Matthieu?
Comment 54 Matthieu Patou 2012-03-12 05:18:07 UTC
Created attachment 7377 [details]
Rework on Andrew's proposal
Comment 55 Matthieu Patou 2012-03-12 05:19:37 UTC
Reuploaded the correct patch my fingers has slipped on the wrong one.
Comment 56 Alexander Bokovoy 2012-03-12 07:19:25 UTC
Comment on attachment 7377 [details]
Rework on Andrew's proposal

Thanks. Checked it now and it follows the logic of master/3.6 patches.
Comment 57 Matthieu Patou 2012-03-17 07:32:52 UTC
Comment on attachment 7377 [details]
Rework on Andrew's proposal

Jeremy can you review this patch
Comment 58 Matthieu Patou 2012-03-17 07:33:49 UTC
Comment on attachment 7183 [details]
Proposed (untested) patch for 3.6 and master

Jeremy can you review this patch so that it can integrate master + 3.6.x branches ?
Comment 59 Matthieu Patou 2012-03-17 07:34:55 UTC
Karolin,

Once Jeremy has reviewed the two patches of Andrew for 3.6/master and for 3.5 I think this bug can be resolved.

Jeff can you confirm ?
Comment 60 Jeremy Allison 2012-03-19 18:58:10 UTC
Comment on attachment 7183 [details]
Proposed (untested) patch for 3.6 and master

Looks good to me for v3-6-test.
Jeremy
Comment 61 Jeremy Allison 2012-03-19 18:59:58 UTC
Comment on attachment 7377 [details]
Rework on Andrew's proposal

Looks good to me for 3.5.x.
Jeremy.
Comment 62 Karolin Seeger 2012-03-20 20:30:00 UTC
(In reply to comment #61)
> Comment on attachment 7377 [details]
> Rework on Andrew's proposal
> 
> Looks good to me for 3.5.x.
> Jeremy.

Pushed to v3-5-test.
Comment 63 Karolin Seeger 2012-03-20 20:31:37 UTC
(In reply to comment #61)
> Comment on attachment 7377 [details]
> Rework on Andrew's proposal
> 
> Looks good to me for 3.5.x.
> Jeremy.

Does not apply to current v3-6-test (28150366a95):

--- snip ---
user@host:/data/git/samba/v3-6-test> git am 0001-s3-winbindd-set-the-can_do_validation6-also-for-trus.patch
Applying: s3-winbindd: set the can_do_validation6 also for trusted domain
error: patch failed: source3/winbindd/winbindd_cm.c:1926
error: source3/winbindd/winbindd_cm.c: patch does not apply
Patch failed at 0001 s3-winbindd: set the can_do_validation6 also for trusted domain
--- snap ---
Comment 64 Karolin Seeger 2012-03-20 20:35:19 UTC
(In reply to comment #63)
> (In reply to comment #61)
> > Comment on attachment 7377 [details] [details]
> > Rework on Andrew's proposal
> > 
> > Looks good to me for 3.5.x.
> > Jeremy.
> 
> Does not apply to current v3-6-test (28150366a95):
> 
> --- snip ---
> user@host:/data/git/samba/v3-6-test> git am
> 0001-s3-winbindd-set-the-can_do_validation6-also-for-trus.patch
> Applying: s3-winbindd: set the can_do_validation6 also for trusted domain
> error: patch failed: source3/winbindd/winbindd_cm.c:1926
> error: source3/winbindd/winbindd_cm.c: patch does not apply
> Patch failed at 0001 s3-winbindd: set the can_do_validation6 also for trusted
> domain
> --- snap ---

My fault, please ignore this comment.
Comment 65 Karolin Seeger 2012-03-20 20:36:20 UTC
(In reply to comment #60)
> Comment on attachment 7183 [details]
> Proposed (untested) patch for 3.6 and master
> 
> Looks good to me for v3-6-test.
> Jeremy

Pushed to v3-6-test.
Closing out bug report.

Thanks!