I ran into the same problem as discussed here: http://samba.2283325.n4.nabble.com/Winbind-using-100-CPU-td4646572.html In my case the problem seems to be that the 'customer' has two domains with the same DNS name: XCHANGE xchange.some.dom (the currently active domain), and EXCHANGE xchange.some.dom (seems to be inactive.) For some reason, in this customer's case, they have a domain called EXCHANGE and one called XCHANGE, but both seem to have the same DNS name (xchange.some.dom). One of them seems permanently offline as well, but that does not matter here. When we get the list of trusted domains, some times, we already have one of them, EXCHANGE, and we receive an entry for XCHANGE (I think it happens in that order.) We search for the domain in rescan_forest_trusts, but the search routine doesn't find it. However, add_trusted_domain does find the existing one because it also compares the alt_name (dns_name passed in) and returns the other entry. We then call setup_domain_child on that domain, which calls setup_child. In setup_child we do: child->sock = -1; child->domain = domain; which then causes us to call fork_domain_child in wb_child_request_trigger and bang, we insert the same entry again and corrupt the list. I am going to prevent the call to setup_domain_child if the name passed in does not match the name we found to see if I can prevent this crash.
This patch seems to fix the problem: samba-3.6.12/source3/winbindd/winbindd_util.c @@ -498,12 +500,19 @@ dom_list[i].dns_name, &cache_methods, &dom_list[i].sid ); - if (d != NULL) { + /* Only add it if it has the correct name */ + if (d != NULL && (dom_list[i].domain_name[0] && + !strcmp(dom_list[i].domain_name, + d->name))) { setup_domain_child(d); } } - if (d == NULL) { + if (d == NULL || (dom_list[i].domain_name[0] && + strcmp(dom_list[i].domain_name, + d->name))) { + DEBUG(0, ("skipping duplicate/bad domain %s\n", + d->name)); continue; I would have modified add_trusted_domain to do the correct thing, but was not sure if that would cause other problems, so the above, more ugly patch was used. It seems to work well, and has been running on the affected site for more than an hour without problems, whereas before we ran into the problem within about 6 minutes.
Still waiting on info from the customer on how to repro their setup.
Created attachment 9555 [details] git-am fix for master. Ok, I chatted with Richard and the underlying problem is a disconnect with setup_domain_child() being called on domain structs that have already been added into the list, instead of only being called when a struct is allocated and added into the list. The solution to that is to move the call to setup_domain_child() inside of the add_trusted_domain() function, so it is only called when a new domain struct is added. Attached is a small patchset that does this, and also keeps the logic identical by also moving the setting of 'domain->primary = true' inside add_trusted_domain() to ensure the current ordering of variable changes is the same. That could also be done by adding a 'bool primary' parameter to the add_trusted_domain() parameter list, but I know how much Volker hates new arbitrary bool parameters, so I though it was cleaner to do it this way :-). I'm pretty sure this is the correct way to fix the logic, but getting a reproducer (or a site that can reproduce and test the patches) will be needed.
Created attachment 9557 [details] Updated fix for master. Updated with comment additions requested by Richard on samba-technical.
Comment on attachment 9557 [details] Updated fix for master. The patch (slightly modified for Samba 3.6.12) has survived in the customer environment without hitting the problem it previously was.
(In reply to comment #5) > Comment on attachment 9557 [details] > Updated fix for master. > > The patch (slightly modified for Samba 3.6.12) has survived in the customer > environment without hitting the problem it previously was. And what I failed to say was that it has survived for over 12 hours now when it would hit within 6 minutes before the patch.
Created attachment 9564 [details] git-am fix for 4.1.next Fix cherry-picked from what went into master.
Created attachment 9565 [details] git-am fix for 4.0.next Fix cherry-picked and back-ported from what went into master. Richard, please +1 these two and I'll re-assign to Karolin to get them into 4.1.next and 4.0.next. Jeremy.
This seems to work on Samba 3.6.x samba-3.6.12/source3/winbindd/winbindd_util.c @@ -89,7 +89,10 @@ } -/* Add a trusted domain to our list of domains */ +/* Add a trusted domain to our list of domains. + If the domain already exists in the list, + return it and don't re-initialize. */ + static struct winbindd_domain *add_trusted_domain(const char *domain_name, const char *alt_name, struct winbindd_methods *methods, const struct dom_sid *sid) @@ -99,6 +102,7 @@ char *idmap_config_option; const char *param; const char **ignored_domains, **dom; + int role = lp_server_role(); ignored_domains = lp_parm_string_list(-1, "winbind", "ignore domains", NULL); for (dom=ignored_domains; dom && *dom; dom++) { @@ -146,7 +150,10 @@ if (domain != NULL) { /* - * We found a match. Possibly update the SID + * We found a match on domain->name or + * domain->alt_name. Possibly update the SID + * if the stored SID was the NULL SID + * and return the matching entry. */ if ((sid != NULL) && dom_sid_equal(&domain->sid, &global_sid_NULL)) { @@ -192,6 +199,15 @@ sid_copy(&domain->sid, sid); } + /* Is this our primary domain ? */ + if (strequal(domain_name, get_global_sam_name()) && + (role != ROLE_DOMAIN_MEMBER)) { + domain->primary = true; + } else if (strequal(domain_name, lp_workgroup()) && + (role == ROLE_DOMAIN_MEMBER)) { + domain->primary = true; + } + /* Link to domain list */ DLIST_ADD_END(_domain_list, domain, struct winbindd_domain *); @@ -228,6 +244,8 @@ done: + setup_domain_child(domain); + DEBUG(2,("Added domain %s %s %s\n", domain->name, domain->alt_name, &domain->sid?sid_string_dbg(&domain->sid):"")); @@ -341,18 +359,10 @@ necessary. This is important because we need the SID for sibling domains */ - if ( find_domain_from_name_noinit(p) != NULL ) { - domain = add_trusted_domain(p, alternate_name, - &cache_methods, - &sid); - } else { - domain = add_trusted_domain(p, alternate_name, - &cache_methods, - &sid); - if (domain) { - setup_domain_child(domain); - } - } + (void)add_trusted_domain(p, alternate_name, + &cache_methods, + &sid); + p=q; if (p != NULL) p += 1; @@ -413,6 +423,12 @@ the domain_list() as our primary domain may not have been initialized. */ + DEBUG(10, ("Entry[%02d]: name: %s, DNS: %s, SID: %s," + " trust_flags: %x, trust_type: %u\n", + i, dom_list[i].domain_name, dom_list[i].dns_name, + &dom_list[i].sid?sid_string_dbg(&dom_list[i].sid):"", + dom_list[i].trust_flags, dom_list[i].trust_type)); + if ( !(dom_list[i].trust_flags & NETR_TRUST_FLAG_TREEROOT) ) { continue; } @@ -422,13 +438,10 @@ d = find_domain_from_name_noinit( dom_list[i].domain_name ); if ( !d ) { - d = add_trusted_domain( dom_list[i].domain_name, + (void)add_trusted_domain( dom_list[i].domain_name, dom_list[i].dns_name, &cache_methods, &dom_list[i].sid ); - if (d != NULL) { - setup_domain_child(d); - } } if (d == NULL) { @@ -479,6 +492,12 @@ uint32 type = dom_list[i].trust_type; uint32 attribs = dom_list[i].trust_attribs; + DEBUG(10, ("Entry[%02d]: name: %s, DNS: %s, SID: %s," + " trust_flags: %x, trust_type: %u\n", + i, dom_list[i].domain_name, dom_list[i].dns_name, + &dom_list[i].sid?sid_string_dbg(&dom_list[i].sid):"", + dom_list[i].trust_flags, dom_list[i].trust_type)); + d = find_domain_from_name_noinit( dom_list[i].domain_name ); /* ignore our primary and internal domains */ @@ -494,13 +513,10 @@ about it */ if ( !d ) { - d = add_trusted_domain( dom_list[i].domain_name, + (void)add_trusted_domain( dom_list[i].domain_name, dom_list[i].dns_name, &cache_methods, &dom_list[i].sid ); - if (d != NULL) { - setup_domain_child(d); - } } if (d == NULL) { @@ -601,7 +617,6 @@ /* Look up global info for the winbind daemon */ bool init_domain_list(void) { - struct winbindd_domain *domain; int role = lp_server_role(); /* Free existing list */ @@ -609,26 +624,18 @@ /* BUILTIN domain */ - domain = add_trusted_domain("BUILTIN", NULL, &cache_methods, + (void)add_trusted_domain("BUILTIN", NULL, &cache_methods, &global_sid_Builtin); - if (domain) { - setup_domain_child(domain); - } /* Local SAM */ - domain = add_trusted_domain(get_global_sam_name(), NULL, + (void)add_trusted_domain(get_global_sam_name(), NULL, &cache_methods, get_global_sam_sid()); - if (domain) { - if ( role != ROLE_DOMAIN_MEMBER ) { - domain->primary = True; - } - setup_domain_child(domain); - } /* Add ourselves as the first entry. */ if ( role == ROLE_DOMAIN_MEMBER ) { + struct winbindd_domain *domain; struct dom_sid our_sid; if (!secrets_fetch_domain_sid(lp_workgroup(), &our_sid)) { @@ -639,9 +646,6 @@ domain = add_trusted_domain( lp_workgroup(), lp_realm(), &cache_methods, &our_sid); if (domain) { - domain->primary = True; - setup_domain_child(domain); - /* Even in the parent winbindd we'll need to talk to the DC, so try and see if we can contact it. Theoretically this isn't neccessary
(In reply to comment #9) > This seems to work on Samba 3.6.x I'm not sure if we're doing more 3.6.x releases, but if so you should probably add this as an attachment, not as inline text :-). Thanks ! Jeremy.
Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next. Jeremy.
(In reply to comment #10) > (In reply to comment #9) > > This seems to work on Samba 3.6.x > > I'm not sure if we're doing more 3.6.x releases, but if so you should probably > add this as an attachment, not as inline text :-). > > Thanks ! > > Jeremy. Samba 3.6 has been turned into the security fixes only mode on 2013-11-29. So there won't be further bugfix releases. Please see https://wiki.samba.org/index.php/Samba_Release_Planning for details.
(In reply to comment #11) > Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next. > > Jeremy. Pushed to autobuild-v4-1-test and autobuild-v4-0-test.
Richard, could you please add it as an attachment? :)
Created attachment 9567 [details] The patch for Samba 3.6.12+ This should work for any version of Samba 3.6.x and maybe even for 3.5.x with some changes. You will have to manually apply it, and you might want to remove the extra debug statements there ...
Pushed to v4-1-test and v4-0-test. Closing out bug report. Thanks!