Bug 7207 - flush_cache must not touch uninitialized domains.
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.4
Classification: Unclassified
Component: Winbind
Version: 3.4.3
Hardware: Other Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Jim McDonough
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-03 23:14 UTC by Bo Yang
Modified: 2010-08-18 14:37 UTC

See Also:


Attachments
v3-4-test (3.34 KB, patch)
2010-03-03 23:15 UTC, Bo Yang
jmcd: review+
v3-5-test (3.34 KB, patch)
2010-03-03 23:15 UTC, Bo Yang
jmcd: review+

Description Bo Yang 2010-03-03 23:14:45 UTC
winbindd must not flush the cache for uninitialized domains. Doing so causes a lot of unnecessary traffic, because each uninitialized domain tries to contact the primary domain to initialize itself.
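
A minimal sketch of the requested behaviour, not the attached patch: the flush loop walks the domain list and skips every domain that has not been initialized yet. The struct `sketch_domain`, its fields, and `sketch_flush_caches()` are hypothetical stand-ins for winbindd's own domain list, and the actual cache flush is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a winbindd domain entry; not the real struct. */
struct sketch_domain {
	const char *name;
	bool initialized;   /* true once the domain has been set up */
	int flush_count;    /* how many times this domain's cache was flushed */
	struct sketch_domain *next;
};

/*
 * Flush the cache of every initialized domain and skip the rest, so that
 * a flush never forces an uninitialized domain to contact the primary
 * domain. Returns the number of domains that were flushed.
 */
static size_t sketch_flush_caches(struct sketch_domain *list)
{
	size_t flushed = 0;
	struct sketch_domain *d;

	for (d = list; d != NULL; d = d->next) {
		assert(d->name != NULL);
		if (!d->initialized) {
			continue; /* do not touch uninitialized domains */
		}
		d->flush_count++; /* placeholder for the real cache flush */
		flushed++;
	}
	return flushed;
}
```

With dozens of trusted domains of which only a few are initialized, a loop like this would leave the others alone until they are actually needed.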
Comment 1 Bo Yang 2010-03-03 23:15:20 UTC
Created attachment 5454
v3-4-test
Comment 2 Bo Yang 2010-03-03 23:15:46 UTC
Created attachment 5455
v3-5-test
Comment 3 Volker Lendecke 2010-03-03 23:31:11 UTC
Two questions: Why does winbind get SIGHUP so often that it is a problem? Second: Can we change the cache flush so that we only do one traverse instead of one per domain?
Comment 4 Bo Yang 2010-03-04 00:54:09 UTC
(In reply to comment #3)
> Two questions: Why does winbind get SIGHUP so often that it is a problem?

Heavy traffic between the primary domain and winbindd might cause an unresponsive domain controller. It appeared when there were dozens of domains: init_dc_connection() connects to the primary domain for each uninitialized domain to get trusted domain information.

> Second: Can we change the cache flush so that we only do one traverse instead
> of one per domain?

I am not sure about this. Maybe it is doable. Delete all "UL/"/"GL/" records in one traverse for each process. Then return when one domain successfully flushes the cache.

Comment 5 Volker Lendecke 2010-03-04 02:43:22 UTC
(In reply to comment #4)

> Heavy traffic between the primary domain and winbindd might cause an
> unresponsive domain controller. It appeared when there were dozens of domains:
> init_dc_connection() connects to the primary domain for each uninitialized
> domain to get trusted domain information.

Sure, but why is this more than a one-time thing? How can I reproduce this?

> I am not sure about this. Maybe it is doable. Delete all "UL/"/"GL/" records in
> one traverse for each process. Then return when one domain successfully flushes
> the cache.

I'm asking because usually there are FAR more entries in winbindd_cache.tdb than in the domain list. We might prepare an array of domains to be referenced in the traverse function so that it needs to do its work only once. Haven't looked at it more closely, it just struck me that we traverse a potentially very large database more than once.

Volker
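
A rough sketch of the single-traverse idea, under loud assumptions: the records live in a plain array instead of winbindd_cache.tdb, the key layout "<DOMAIN>/<rest>" is invented for illustration, and all `sketch_*` names are hypothetical. The list of domains is handed to the traverse function as private data, so the database is walked once no matter how many domains are flushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory record; the real cache is a tdb file. */
struct sketch_record {
	const char *key;   /* assumed layout: "<DOMAIN>/<rest>" */
	bool deleted;
};

/* Private data handed to the traverse callback: the domains to flush. */
struct sketch_flush_state {
	const char *const *domains;
	size_t num_domains;
	size_t visited;    /* records examined, to show the pass count */
};

/* Return true if key starts with "<domain>/". */
static bool sketch_key_in_domain(const char *key, const char *domain)
{
	size_t len = strlen(domain);

	return strncmp(key, domain, len) == 0 && key[len] == '/';
}

/* Called once per record; checks the record against every listed domain. */
static void sketch_traverse_fn(struct sketch_record *rec,
			       struct sketch_flush_state *state)
{
	size_t i;

	state->visited++;
	for (i = 0; i < state->num_domains; i++) {
		if (sketch_key_in_domain(rec->key, state->domains[i])) {
			rec->deleted = true; /* placeholder for a real delete */
			return;
		}
	}
}

/* Walk the whole database exactly once, whatever the number of domains. */
static void sketch_flush_once(struct sketch_record *recs, size_t num_recs,
			      struct sketch_flush_state *state)
{
	size_t i;

	assert(recs != NULL || num_recs == 0);
	for (i = 0; i < num_recs; i++) {
		sketch_traverse_fn(&recs[i], state);
	}
}
```

The cost becomes one pass over the records with a short inner loop over the domain array, instead of one full pass per domain.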
Comment 6 Volker Lendecke 2010-03-04 06:49:17 UTC
Jim, you did ack the patches. Do you have an opinion on my questions?

Thanks,

Volker
Comment 7 Jim McDonough 2010-03-04 07:10:54 UTC
>> Heavy traffic between the primary domain and winbindd might cause an
>> unresponsive domain controller. It appeared when there were dozens of domains:
>> init_dc_connection() connects to the primary domain for each uninitialized
>> domain to get trusted domain information.
>
>Sure, but why is this more than a one-time thing? How can I reproduce this?
It is a one-time thing, but it happens on a large scale, on every DHCP client renewal. It's our scripts, but it happens every time a laptop goes home and comes back in, changes networks, etc., which is not uncommon at a large customer.
Comment 8 Jim McDonough 2010-03-04 07:12:14 UTC
I'll answer the other question later today, meetings atm...
Comment 9 Bo Yang 2010-03-04 07:23:26 UTC
(In reply to comment #5)
> (In reply to comment #4)
> 
> > Heavy traffic between the primary domain and winbindd might cause an
> > unresponsive domain controller. It appeared when there were dozens of domains:
> > init_dc_connection() connects to the primary domain for each uninitialized
> > domain to get trusted domain information.
> 
> Sure, but why is this more than a one-time thing? How can I reproduce this?

As Jim explained. :-)

> 
> > I am not sure about this. Maybe it is doable. Delete all "UL/"/"GL/" records in
> > one traverse for each process. Then return when one domain successfully flushes
> > the cache.
> 
> I'm asking because usually there are FAR more entries in winbindd_cache.tdb
> than in the domain list. We might prepare an array of domains to be referenced
> in the traverse function so that it needs to do its work only once. Haven't
> looked at it more closely, it just struck me that we traverse a potentially
> very large database more than once.

I'll look at it. It is feasible.

> 
> Volker
> 

Comment 10 Jim McDonough 2010-03-04 19:35:10 UTC
> I'm asking because usually there are FAR more entries in winbindd_cache.tdb
> than in the domain list. We might prepare an array of domains to be referenced
> in the traverse function so that it needs to do its work only once. Haven't
> looked at it more closely, it just struck me that we traverse a potentially
> very large database more than once.
Actually, on closer inspection, I think we can just do it once anyway. The traverse fn currently only looks for UL/ and GL/ records. Nothing specific to any domain...
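
To illustrate why one pass would be enough, here is a small sketch of a prefix-only check. It is an assumption-laden stand-in, not the real traverse fn: `sketch_is_list_key()` and `sketch_count_list_keys()` are hypothetical names, and the keys are plain strings in an array. The decision depends only on the key prefix, so no domain needs to be known when a record is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the key starts with "UL/" or "GL/"; nothing else is inspected. */
static bool sketch_is_list_key(const char *key, size_t key_len)
{
	assert(key != NULL);
	if (key_len < 3) {
		return false;
	}
	return memcmp(key, "UL/", 3) == 0 || memcmp(key, "GL/", 3) == 0;
}

/* One pass over all keys; counts the records a flush would delete. */
static size_t sketch_count_list_keys(const char *const *keys, size_t num_keys)
{
	size_t i;
	size_t matches = 0;

	for (i = 0; i < num_keys; i++) {
		if (sketch_is_list_key(keys[i], strlen(keys[i]))) {
			matches++;
		}
	}
	return matches;
}
```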
Comment 11 Michael Adam 2010-03-31 16:46:42 UTC
When you want reviewed patches to be applied to the release branches,
you should assign the corresponding bugs to Karolin. Otherwise she won't notice.

But what is the status of this bug? There are two ACKed patches, which
have also gone to master, but the discussion does not seem to have
come to a final conclusion.

Should we reassign to Karolin for inclusion?
Or does this need more work?
Comment 12 Michael Adam 2010-03-31 18:29:38 UTC
Jim, I am reassigning the bug to you for a decision on what to do with the patches regarding 3.4 and 3.5.

Thanks - Michael
Comment 13 Jim McDonough 2010-08-18 14:37:44 UTC
As it seems only to be hit by our customers, I'm happy to leave it to the newer releases and not disrupt existing ones.