Scenario: using winbindd to manage a machine account against a Windows AD server.
In busy AD deployments, it can take over 20 seconds for an AD server to respond to the NetrServerPasswordSet2 netlogon message for changing the machine account password.
In winbindd, NT_STATUS_IO_TIMEOUT fires after exactly 10 seconds.
This leaves an inconsistent state: winbindd does not update its local password, but the AD server has updated its copy, so subsequent preauthentication attempts fail.
The simplest solution to this issue is to have a longer timeout (30 seconds?).
However, there is also a more complex and potentially more robust solution. Windows clients do not have this issue against the same AD server, despite the long NetrServerPasswordSet2 response. One reason for this is (I believe) that they update the local machine account password immediately, rather than waiting for the AD server's response. If the new password fails on the next preauthentication attempt, the client reverts to using the previous password.
Given that winbindd stores both the current and past password, could it behave in this way instead? i.e. update its local password on sending the request, and try both when it is next required (falling back to the previous password if necessary).
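The proposed behaviour can be sketched roughly as follows. This is only an illustration of the idea, not winbindd code: `MachineSecrets`, `begin_password_change`, `authenticate` and the `try_auth` callback are all hypothetical names, and the real secrets storage and authentication path will look different.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MachineSecrets:
    """Hypothetical local store holding the current and previous password."""
    current: str
    previous: Optional[str] = None


def begin_password_change(secrets: MachineSecrets, new_password: str) -> None:
    # Rotate locally when the request is sent, before the server replies,
    # so a timed-out request cannot leave the client without the password
    # the server may already have applied.
    secrets.previous = secrets.current
    secrets.current = new_password


def authenticate(secrets: MachineSecrets,
                 try_auth: Callable[[str], bool]) -> Optional[str]:
    # Try the newest password first.
    if try_auth(secrets.current):
        return secrets.current
    # Fall back to the previous password if the new one is rejected.
    if secrets.previous is not None and try_auth(secrets.previous):
        # The server never applied the change: revert to the old password.
        secrets.current, secrets.previous = secrets.previous, None
        return secrets.current
    return None
```

Here `try_auth` stands in for whatever performs the actual preauthentication attempt. If the server applied the change, the first attempt succeeds; if it did not, the second attempt succeeds and the local state is rolled back.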
Windows also uses the old password for an hour or so, and
I think winbindd should do the same. That way it works
without having to care about replication latency.
In other places we use a timeout of 35 seconds;
I think we should also use that here.
This is fixed with commit b9a15f1bfad30a824f9ec87bc9f7c65adf50dae0
in 4.0 and master.
A similar problem exists during net ads join. The timeout there also needs to be increased. I already have a patch for this, but I am not sure whether it has landed in master yet.
See 9755541ed156d71df98607375ee3b925266c3c74 for the net ads join piece.
Created attachment 9205
Patches for v3-6-test
Pushed to v3-6-test.
Closing out bug report.