A customer first noticed this on Samba 4.6.9.
The KVNO is changed correctly on the DC and increments by one; however, the KVNO in the local keytab is not incremented.
1. In smb.conf, set 'kerberos method = system keytab'.
2. klist -k shows KVNO 3.
3. Run net ads changetrustpw; it reports success.
4. Querying LDAP shows the KVNO incremented to 4 (good).
5. klist -k still shows KVNO 3 (bad).
If I run net ads changetrustpw again, it increments the KVNO on the KDC to 5, and only now does the KVNO on the host increment, to 4. From this point on, every run of changetrustpw leaves the KVNOs mismatched.
The problem is in net_ads_changetrustpw(), between the call to ads_change_trust_account_password() and the call to ads_keytab_create_default(). ads_keytab_create_default() calls ads_get_machine_kvno() to request the kvno from the DC via LDAP. In this customer's setup, the Kerberos DC is *not* the same as the LDAP DC, and the LDAP replication delay is long enough that the LDAP DC returns the *old* kvno, so we store the wrong kvno in the keytab. This happens fairly consistently for this customer.
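The race described above can be reproduced with a toy model. This is a minimal sketch, not Samba code: the struct and function names are hypothetical, and it assumes the Kerberos DC increments the kvno by exactly one per password change while the LDAP DC that is queried afterwards still holds the previous value.

```c
/* Toy model of the replication race; all names are hypothetical, not Samba APIs. */
struct toy_domain {
    int kdc_kvno;   /* kvno on the DC that processed the password change */
    int ldap_kvno;  /* kvno as returned by the (different) LDAP DC */
};

struct toy_host {
    int keytab_kvno; /* kvno that klist -k would show */
};

/* One 'net ads changetrustpw' run in which the kvno is read from LDAP
 * after the password change, before replication has caught up. */
static void toy_changetrustpw_lagging(struct toy_domain *d, struct toy_host *h)
{
    d->kdc_kvno += 1;              /* password change lands on the Kerberos DC */
    h->keytab_kvno = d->ldap_kvno; /* LDAP DC still returns the old kvno */
    d->ldap_kvno = d->kdc_kvno;    /* replication catches up some time later */
}
```

Starting from kvno 3 everywhere, the first run leaves the KDC at 4 and the keytab at 3, and the second leaves them at 5 and 4, which matches the behavior in the report.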
Created attachment 14193
potential fix that calculates the kvno
Here is a potential fix that calculates the new kvno from the kvno read *before* the password change.
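The ordering this fix is described as using (read the kvno first, change the password, then store that value plus one) can be sketched with the same kind of toy model. This is a hypothetical illustration, not the code in the attachment; it assumes the value read before the change is already fully replicated and that a successful change increments the kvno by exactly one.

```c
/* Hypothetical toy model; not the code from attachment 14193. */
struct toy_dc {
    int kdc_kvno;   /* authoritative kvno on the Kerberos DC */
    int ldap_kvno;  /* kvno as returned by the LDAP DC */
};

/* Models replication finally reaching the LDAP DC. */
static void toy_replicate(struct toy_dc *dc)
{
    dc->ldap_kvno = dc->kdc_kvno;
}

/* Returns the kvno to write into the keytab, or -1 if the change failed. */
static int toy_changetrustpw_calc(struct toy_dc *dc, int change_succeeds)
{
    int kvno_before = dc->ldap_kvno;  /* queried BEFORE changing the password */

    if (!change_succeeds)
        return -1;                    /* nothing changed; keep the keytab as-is */

    dc->kdc_kvno += 1;                /* the change lands on the Kerberos DC */
    /* dc->ldap_kvno is left stale on purpose: it is no longer consulted. */
    return kvno_before + 1;
}
```

With this ordering the stored kvno no longer depends on how quickly the LDAP DC catches up after the change; it still depends on the pre-change read being current.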
Another potential fix would be to make the 'ldap replication sleep' smb.conf parameter apply here, so that increasing it would cause ads_get_kvno() to sleep briefly before making the LDAP query.
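The sleep-based alternative amounts to waiting for a configured delay before the kvno lookup. In this minimal sketch the lookup is a callback standing in for ads_get_kvno(), and toy_replica models a lagging LDAP DC; both are hypothetical, and the delay is taken to be in milliseconds as an assumption about how 'ldap replication sleep' would be wired in.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep at least ms milliseconds, resuming if a signal interrupts us. */
static void sleep_ms(unsigned ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (long)(ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Hypothetical stand-in for the LDAP kvno lookup. */
typedef int (*kvno_query_fn)(void *ctx);

/* Sleep for the configured replication delay, then query the kvno. */
static int get_kvno_after_sleep(kvno_query_fn query, void *ctx, unsigned delay_ms)
{
    sleep_ms(delay_ms);
    return query(ctx);
}

/* Toy LDAP replica that only sees the new kvno lag_ms after the change. */
struct toy_replica {
    struct timespec changed_at;
    long lag_ms;
    int old_kvno;
    int new_kvno;
};

/* Milliseconds elapsed since t on the monotonic clock (approximate). */
static long ms_since(const struct timespec *t)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - t->tv_sec) * 1000L
           + (now.tv_nsec - t->tv_nsec) / 1000000L;
}

static int toy_replica_query(void *ctx)
{
    const struct toy_replica *r = ctx;
    return ms_since(&r->changed_at) >= r->lag_ms ? r->new_kvno : r->old_kvno;
}
```

A fixed sleep only helps when the configured delay exceeds the actual replication latency, so it is a tuning knob rather than a guarantee.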
I think this needs careful testing, and it needs to handle a few more cases.
In particular, it needs to handle the case where the server side had a password reset between the last modification of the keytab and now. That would change the server-side kvno but not the keytab kvno, putting them out of sync again.
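The reset case can be checked with a small toy model that compares two ways of choosing the new kvno: incrementing the kvno found in the local keytab, versus incrementing the kvno read from LDAP before the change (as the reply that follows describes the attached patch). The names are hypothetical, and the model assumes the out-of-band reset has already replicated by the time of the next run.

```c
/* Hypothetical toy model of an out-of-band password reset. */
struct toy_state {
    int server_kvno;  /* kvno on the DCs (assumed fully replicated) */
    int keytab_kvno;  /* kvno stored in the local keytab */
};

enum toy_kvno_source {
    TOY_FROM_KEYTAB,      /* next kvno = keytab kvno + 1 */
    TOY_FROM_LDAP_BEFORE  /* next kvno = server kvno read before the change + 1 */
};

/* E.g. an administrator resets the machine password on the DC:
 * the server kvno moves, the local keytab does not. */
static void toy_server_side_reset(struct toy_state *s)
{
    s->server_kvno += 1;
}

/* One changetrustpw run; returns 1 if keytab and server agree afterwards. */
static int toy_changetrustpw_agrees(struct toy_state *s, enum toy_kvno_source src)
{
    int before = (src == TOY_FROM_KEYTAB) ? s->keytab_kvno : s->server_kvno;

    s->server_kvno += 1;          /* the change itself bumps the server kvno */
    s->keytab_kvno = before + 1;  /* value written to the keytab */
    return s->keytab_kvno == s->server_kvno;
}
```

After a reset from kvno 3, deriving from the keytab yields keytab 4 against server 5, while deriving from the pre-change LDAP value yields 5 on both sides.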
(In reply to Andrew Bartlett from comment #3)
I think my current patch already handles that situation, since it queries ldap for the kvno, resets the password, then sets the local kvno to the retrieved kvno+1.
I'm not sure how we'd test this, since it's caused by a race condition on the server side.
The kvno in our keytab should be completely ignored!
We need to check all keytab entries; no interaction with any KDC is needed.
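One possible reading of this suggestion, sketched here under that assumption: when verifying a ticket, try every keytab entry for the principal and encryption type and ignore the kvno, so a keytab kvno that lags the DC does not cause a failure. The entry layout, the string "keys", the principal name and toy_decrypt_ok() are hypothetical stand-ins; a real implementation would go through the Kerberos library's keytab and decryption routines.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical keytab entry; real entries hold key material, not strings. */
struct toy_keytab_entry {
    const char *principal;
    int enctype;       /* e.g. 18 = aes256-cts-hmac-sha1-96 */
    int kvno;          /* stored, but deliberately not used for matching */
    const char *key;
};

/* Stand-in for "does this key decrypt the ticket?" */
static int toy_decrypt_ok(const char *key, const char *ticket_key)
{
    return strcmp(key, ticket_key) == 0;
}

/* Returns the index of the first entry that decrypts the ticket, or -1.
 * ticket_kvno is ignored on purpose: every entry with a matching
 * principal and enctype is tried. */
static int toy_find_key_any_kvno(const struct toy_keytab_entry *kt, size_t n,
                                 const char *principal, int enctype,
                                 int ticket_kvno, const char *ticket_key)
{
    (void)ticket_kvno;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(kt[i].principal, principal) != 0 || kt[i].enctype != enctype)
            continue;
        if (toy_decrypt_ok(kt[i].key, ticket_key))
            return (int)i;
    }
    return -1;
}
```

With entries for kvno 2 (old key) and kvno 3 (new key), a ticket labelled kvno 4 but encrypted with the new key is still accepted via the kvno 3 entry, which is the skew from the original report.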
We have another customer with the same problem.
Metze, could you please elaborate on your response? Is David's patch approach wrong?