When using the ntlm_auth authenticator, I'm seeing very slow responses when the first DC listed in smb.conf is unavailable, even though the winbind cache time is high enough to eliminate most of the wait for wbinfo requests. It's hard to quantify exactly what I'm seeing. Some requests are fast, but enough are slow that IE appears to hang for about 20-25 seconds on almost every page load: one or more objects take 20-25 seconds to load, and IE often doesn't display the page while an object is still loading.

Perhaps ntlm_auth is using a different cache time, such as winbind's default of 15 seconds. Either way, I would expect the winbind cache time to apply to all winbindd clients, including ntlm_auth. I may be mistaken in this assumption, but I would very much like to be able to use a backup AD server in my Squid deployments with ntlm_auth, and response times this long make that infeasible. I've been unable to find any documentation on setting up multiple backing AD servers, so perhaps I'm making a configuration mistake.

wbinfo is also slow to respond to each request the first time, but after that it respects the cache time: for several minutes after the first request it is as fast as when querying the primary. Perhaps quicker failover, plus a negative cache that remembers when the first server is down, would be a better solution to this problem.

Some details: the AD servers are Windows 2003 and Windows 2000. Samba is 3.0.7, built from the Fedora Core 1 SRPM with the winbind options enabled. Compared with 3.0.2, the version I tried previously, this release seems to have reduced the response time with the primary down from more than 2 minutes. Squid is 2.5, and I'm using the squid-2.5-ntlmssp mode of ntlm_auth.
It is not possible to cache authentication requests, due to the challenge-response nature of the protocol. It sounds like the real complaint here is that winbind is slow to respond when a DC times out (rather than quickly returning an error).
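To illustrate why caching does not help, here is a minimal sketch of a generic HMAC-based challenge-response exchange. It is not the actual NTLM message format, and the function names are hypothetical. Because the server issues a fresh random challenge for every attempt, a response recorded for one challenge is useless for the next, so there is nothing reusable to cache.

```python
import hashlib
import hmac
import os


def make_challenge() -> bytes:
    """Server side (hypothetical): a fresh random 8-byte nonce per authentication attempt."""
    return os.urandom(8)


def compute_response(secret: bytes, challenge: bytes) -> bytes:
    """Client side (hypothetical): prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.md5).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side (hypothetical): recompute the expected response and compare in constant time."""
    expected = compute_response(secret, challenge)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    secret = b"user-password-hash"
    c1 = make_challenge()
    r1 = compute_response(secret, c1)
    print(verify(secret, c1, r1))                 # valid for the challenge it answered
    print(verify(secret, make_challenge(), r1))   # a "cached" response fails a new challenge
```

Every verification needs the secret (in the real deployment, held by the DC), which is why each new login still has to reach a DC.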
Yes, if we can't cache, then it does seem that the real issue is the long hang time when a DC is not responding. So the solution would be to cache the status of a failed DC, rather than trying it over and over on seemingly every request, I suppose?
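A negative cache of failed DCs could look roughly like the sketch below. This is a hypothetical illustration of the idea, not winbindd's actual code; the class name, `negative_ttl`, and the `try_dc` callback are assumptions. A DC that fails is skipped for `negative_ttl` seconds, so only the first request after an outage pays the timeout, and the DC is retried once the window expires.

```python
import time
from typing import Callable, Dict, List, Optional


class DCFailoverCache:
    """Hypothetical negative cache: skip recently failed DCs until a retry window expires."""

    def __init__(self, dcs: List[str], negative_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.dcs = list(dcs)              # DCs in preference order (e.g. as listed in smb.conf)
        self.negative_ttl = negative_ttl  # seconds to skip a DC after it fails
        self.clock = clock                # injectable clock, handy for testing
        self._failed_until: Dict[str, float] = {}

    def mark_failed(self, dc: str) -> None:
        """Record a failure so the DC is skipped until now + negative_ttl."""
        self._failed_until[dc] = self.clock() + self.negative_ttl

    def is_usable(self, dc: str) -> bool:
        """True if the DC has no recent failure, or its skip window has expired."""
        until = self._failed_until.get(dc)
        if until is None:
            return True
        if self.clock() >= until:
            del self._failed_until[dc]    # window expired: allow one retry
            return True
        return False

    def connect(self, try_dc: Callable[[str], bool]) -> Optional[str]:
        """Return the first DC that answers, skipping DCs in the negative cache."""
        for dc in self.dcs:
            if not self.is_usable(dc):
                continue                  # known-bad DC: don't wait for another timeout
            if try_dc(dc):
                return dc
            self.mark_failed(dc)          # this attempt timed out or errored
        return None                       # no DC currently reachable
```

One design choice worth noting: when every DC is in the negative cache, this sketch returns `None` immediately instead of retrying, trading availability for fast failure; a real implementation might instead retry the least recently failed DC.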
There have been several winbind fixes since 3.0.7. Please retest (more are still to come in >=3.0.12).
Sorry for the spam; cleaning up the database to prevent unnecessary reopens of bugs.