1866 – ntlm_auth doesn't use "winbind cache time", secondary DC very slow

Status:	CLOSED FIXED

Alias:	None

Product:	Samba 3.0
Classification:	Unclassified
Component:	ntlm_auth tool
Version:	3.0.7
Hardware:	All Linux

Importance:	P3 normal
Target Milestone:	none
Assignee:	Andrew Bartlett
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-10-02 00:24 UTC by Joe Cooper
Modified:	2005-08-24 10:17 UTC
CC List:	0 users

See Also:

Description Joe Cooper 2004-10-02 00:24:22 UTC

When using the ntlm_auth authenticator, I'm seeing very slow responses when the
first DC listed in smb.conf is unavailable, even when cache time is high enough
to eliminate most of the average wait for wbinfo requests.

It's hard to quantify exactly what I'm seeing, as some requests are fast, but
enough requests are slow that IE appears to hang for about 20-25 seconds on just
about every page load (because one or more objects takes 20-25 seconds to load,
and IE often does not display while an object is loading).  Perhaps ntlm_auth is
using some other cache time, like the default 15 seconds of winbind.

Anyway, I would expect winbind cache time to apply to all winbindd clients,
including ntlm_auth.  Perhaps I am mistaken in this assumption, but I would very
much like to be able to use a backup AD server in my Squid deployments with
ntlm_auth, and such high response times make it unfeasible.  I've been unable to
find any documentation on setting up multiple backing AD servers, so perhaps I'm
making some configuration mistake.

wbinfo is also slow to respond to each request the first time, but then respects
the cache time, and for several minutes after the first request is as fast as
when querying the primary.  Perhaps quicker failover, and a negative cache to
keep track of when the first server is down, would be a better solution to this
problem.

Some details:

AD servers are 2k3 and 2k.
Samba is 3.0.7, built from Fedora Core 1 SRPM, with winbind options enabled. 
This version seems to have reduced the response time with the primary down from
more than 2 minutes in the prior attempted version 3.0.2.
Squid is 2.5, and I'm using the squid-2.5-ntlmssp mode of ntlm_auth.

Comment 1 Andrew Bartlett 2004-10-02 02:30:17 UTC

It is not possible to cache authentication requests, due to the
challenge-response nature of the protocol.

It sounds like the complaint here is that winbind is slow to respond when a DC
times out (rather than returning an error).
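
To illustrate why a challenge-response exchange cannot be answered from a
cache, here is a minimal sketch. It is not the actual NTLM computation: the
HMAC-MD5 construction, the function names and the example secret are all
assumptions made for illustration. The point is only that the server issues a
fresh random challenge each time, so a response recorded for one challenge
does not verify against the next.

```python
import hashlib
import hmac
import os


def make_challenge() -> bytes:
    """Return a fresh random 8-byte challenge (hypothetical helper)."""
    return os.urandom(8)


def compute_response(secret: bytes, challenge: bytes) -> bytes:
    """Client side: derive a response from the shared secret and the challenge.

    Simplified stand-in for the real protocol, for illustration only.
    """
    return hmac.new(secret, challenge, hashlib.md5).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = compute_response(secret, challenge)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    secret = b"example-shared-secret"  # assumed value, illustration only
    first = make_challenge()
    cached_response = compute_response(secret, first)
    print(verify(secret, first, cached_response))   # matches its own challenge
    second = make_challenge()
    print(verify(secret, second, cached_response))  # replay against a new challenge
```

Because the response depends on a challenge that changes on every exchange,
each authentication has to be checked against the secret again, which is
consistent with the statement above that these requests cannot be cached.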

Comment 2 Joe Cooper 2004-10-02 10:55:07 UTC

Yes, if we can't cache then it does seem that the real issue is long hang time
when a DC is not responding.  So the solution would then be to cache the status
of a failed DC, rather than trying it over and over on seemingly every request,
I suppose?
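
The idea of caching the status of a failed DC could be sketched roughly as
follows. This is not winbindd code: the class name, the `try_connect`
callback, the DC names and the 30-second TTL are all hypothetical, chosen only
to show the technique. A DC that fails is skipped until its entry expires, so
only the first request after a failure pays the timeout.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class NegativeDCCache:
    """Remember which DCs recently failed so they are not retried on every request."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._failed_at: Dict[str, float] = {}

    def mark_failed(self, dc: str) -> None:
        self._failed_at[dc] = self._clock()

    def mark_ok(self, dc: str) -> None:
        self._failed_at.pop(dc, None)

    def is_usable(self, dc: str) -> bool:
        failed = self._failed_at.get(dc)
        if failed is None:
            return True
        if self._clock() - failed >= self._ttl:
            # Entry expired: allow one new attempt against this DC.
            del self._failed_at[dc]
            return True
        return False


def pick_dc(dcs: Iterable[str], cache: NegativeDCCache,
            try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first DC that answers, skipping DCs cached as failed."""
    for dc in dcs:
        if not cache.is_usable(dc):
            continue
        if try_connect(dc):
            cache.mark_ok(dc)
            return dc
        cache.mark_failed(dc)
    return None
```

With a cache like this, requests arriving while the first DC is marked as
failed would go straight to the second one. The TTL trades off how quickly a
recovered primary is picked up again against how often the timeout is paid.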

Comment 3 Gerald (Jerry) Carter (dead mail address) 2005-02-09 08:43:08 UTC

Several winbind fixes have gone in since 3.0.7.  Please retest (more still
to come in >= 3.0.12).

Comment 4 Gerald (Jerry) Carter (dead mail address) 2005-08-24 10:17:01 UTC

Sorry for the spam, cleaning up the database to prevent unnecessary reopens of bugs.