Bug 10358 - 100% CPU utilization in winbindd when trying to free memory in winbindd_reinit_after_fork
Summary: 100% CPU utilization in winbindd when trying to free memory in winbindd_reini...
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: Winbind (show other bugs)
Version: unspecified
Hardware: All All
: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-04 20:21 UTC by Richard Sharpe
Modified: 2017-05-16 14:20 UTC (History)
1 user (show)

See Also:


Attachments
git-am fix for master. (6.09 KB, patch)
2014-01-06 23:43 UTC, Jeremy Allison
no flags Details
Updated fix for master. (6.97 KB, patch)
2014-01-07 18:32 UTC, Jeremy Allison
rsharpe: review+
Details
git-am fix for 4.1.next (7.31 KB, patch)
2014-01-09 19:51 UTC, Jeremy Allison
rsharpe: review+
Details
git-am fix for 4.0.next (7.41 KB, patch)
2014-01-09 19:52 UTC, Jeremy Allison
rsharpe: review+
Details
The patch for Samba 3.6.12+ (5.12 KB, patch)
2014-01-10 17:50 UTC, Richard Sharpe
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Sharpe 2014-01-04 20:21:03 UTC
I ran into the same problem as discussed here:

http://samba.2283325.n4.nabble.com/Winbind-using-100-CPU-td4646572.html

In my case the problem seems to be that the 'customer' has two domains with the same DNS name:

XCHANGE xchange.some.dom (the currently active domain), and

EXCHANGE xchange.some.dom (seems to be inactive.)

For some reason, in this customer's case, they have a domain called
EXCHANGE and one called XCHANGE, but both seem to have the same DNS
name (xchange.some.dom). One of them seems permanently offline as
well, but that does not matter here.

When we get the list of trusted domains, some times, we already have
one of them, EXCHANGE, and we receive an entry for XCHANGE (I think it
happens in that order.) We search for the domain in
rescan_forest_trusts, but the search routine doesn't find it. However,
add_trusted_domain does find the existing one because it also compares
the alt_name (dns_name passed in) and returns the other entry. We then
call setup_domain_child on that domain, which calls setup_child.

In setup_child we do:

        child->sock = -1;
        child->domain = domain;

which then causes us to call fork_domain_child in
wb_child_request_trigger and bang, we insert the same entry again and
corrupt the list.

I am going to prevent the call to setup_domain_child if the name
passed in does not match the name we found to see if I can prevent
this crash.
Comment 1 Richard Sharpe 2014-01-04 20:23:08 UTC
This patch seems to fix the problem:

samba-3.6.12/source3/winbindd/winbindd_util.c
@@ -498,12 +500,19 @@
                                                        dom_list[i].dns_name,
                                                        &cache_methods,
                                                        &dom_list[i].sid );
-                               if (d != NULL) {
+                               /* Only add it if it has the correct name */
+                               if (d != NULL && (dom_list[i].domain_name[0] &&
+                                               !strcmp(dom_list[i].domain_name,
+                                                       d->name))) {
                                        setup_domain_child(d);
                                }
                        }

-                       if (d == NULL) {
+                       if (d == NULL || (dom_list[i].domain_name[0] &&
+                                       strcmp(dom_list[i].domain_name,
+                                       d->name))) {
+                               DEBUG(0, ("skipping duplicate/bad domain %s\n",
+                                       d->name));
                                continue;

I would have modified add_trusted_domain to do the correct thing, but was not sure if that would cause other problems, so the above, more ugly patch was used.

It seems to work well, and has been running on the affected site for more than an hour without problems, whereas before we ran into the problem within about 6 minutes.
Comment 2 Richard Sharpe 2014-01-04 21:00:35 UTC
Still waiting on info from the customer on how to repro their setup.
Comment 3 Jeremy Allison 2014-01-06 23:43:43 UTC
Created attachment 9555 [details]
git-am fix for master.

Ok, I chatted with Richard and the underlying problem
is a disconnect with setup_domain_child() being called on
domain structs that have already been added into the list,
instead of only being called when a struct is allocated
and added into the list.

The solution to that is to move the call to setup_domain_child()
inside of the add_trusted_domain() function, so it is
only called when a new domain struct is added.

Attached is a small patchset that does this, and
also keeps the logic identical by also moving the
setting of 'domain->primary = true' inside
add_trusted_domain() to ensure the current
ordering of variable changes is the same.

That could also be done by adding a 'bool primary'
parameter to the add_trusted_domain() parameter
list, but I know how much Volker hates new arbitrary
bool parameters, so I though it was cleaner to
do it this way :-).

I'm pretty sure this is the correct way to
fix the logic, but getting a reproducer (or
a site that can reproduce and test the patches)
will be needed.
Comment 4 Jeremy Allison 2014-01-07 18:32:53 UTC
Created attachment 9557 [details]
Updated fix for master.

Updated with comment additions requested by Richard on samba-technical.
Comment 5 Richard Sharpe 2014-01-08 17:59:05 UTC
Comment on attachment 9557 [details]
Updated fix for master.

The patch (slightly modified for Samba 3.6.12) has survived in the customer environment without hitting the problem it previously was.
Comment 6 Richard Sharpe 2014-01-08 18:04:43 UTC
(In reply to comment #5)
> Comment on attachment 9557 [details]
> Updated fix for master.
> 
> The patch (slightly modified for Samba 3.6.12) has survived in the customer
> environment without hitting the problem it previously was.

And what I failed to say was that it has survived for over 12 hours now when it would hit within 6 minutes before the patch.
Comment 7 Jeremy Allison 2014-01-09 19:51:22 UTC
Created attachment 9564 [details]
git-am fix for 4.1.next

Fix cherry-picked from what went into master.
Comment 8 Jeremy Allison 2014-01-09 19:52:24 UTC
Created attachment 9565 [details]
git-am fix for 4.0.next

Fix cherry-picked and back-ported from what went into master.

Richard, please +1 these two and I'll re-assign to Karolin to get them into 4.1.next and 4.0.next.

Jeremy.
Comment 9 Richard Sharpe 2014-01-09 20:40:32 UTC
This seems to work on Samba 3.6.x

samba-3.6.12/source3/winbindd/winbindd_util.c 
@@ -89,7 +89,10 @@
 }


-/* Add a trusted domain to our list of domains */
+/* Add a trusted domain to our list of domains.
+   If the domain already exists in the list,
+   return it and don't re-initialize.  */
+
 static struct winbindd_domain *add_trusted_domain(const char *domain_name, const char *alt_name,
                                                  struct winbindd_methods *methods,
                                                  const struct dom_sid *sid)
@@ -99,6 +102,7 @@
        char *idmap_config_option;
        const char *param;
        const char **ignored_domains, **dom;
+       int role = lp_server_role();

        ignored_domains = lp_parm_string_list(-1, "winbind", "ignore domains", NULL);
        for (dom=ignored_domains; dom && *dom; dom++) {
@@ -146,7 +150,10 @@

        if (domain != NULL) {
                /*
-                * We found a match. Possibly update the SID
+                * We found a match on domain->name or
+                * domain->alt_name. Possibly update the SID
+                * if the stored SID was the NULL SID
+                * and return the matching entry.
                 */
                if ((sid != NULL)
                    && dom_sid_equal(&domain->sid, &global_sid_NULL)) {
@@ -192,6 +199,15 @@
                sid_copy(&domain->sid, sid);
        }

+       /* Is this our primary domain ? */
+       if (strequal(domain_name, get_global_sam_name()) &&
+                       (role != ROLE_DOMAIN_MEMBER)) {
+               domain->primary = true;
+       } else if (strequal(domain_name, lp_workgroup()) &&
+                       (role == ROLE_DOMAIN_MEMBER)) {
+               domain->primary = true;
+       }
+
        /* Link to domain list */
        DLIST_ADD_END(_domain_list, domain, struct winbindd_domain *);

@@ -228,6 +244,8 @@

 done:

+       setup_domain_child(domain);
+
        DEBUG(2,("Added domain %s %s %s\n",
                 domain->name, domain->alt_name,
                 &domain->sid?sid_string_dbg(&domain->sid):""));
@@ -341,18 +359,10 @@
                   necessary.  This is important because we need the
                   SID for sibling domains */

-               if ( find_domain_from_name_noinit(p) != NULL ) {
-                       domain = add_trusted_domain(p, alternate_name,
-                                                   &cache_methods,
-                                                   &sid);
-               } else {
-                       domain = add_trusted_domain(p, alternate_name,
-                                                   &cache_methods,
-                                                   &sid);
-                       if (domain) {
-                               setup_domain_child(domain);
-                       }
-               }
+               (void)add_trusted_domain(p, alternate_name,
+                                           &cache_methods,
+                                           &sid);
+
                p=q;
                if (p != NULL)
                        p += 1;
@@ -413,6 +423,12 @@
                   the domain_list() as our primary domain may not
                   have been initialized. */

+               DEBUG(10, ("Entry[%02d]: name: %s, DNS: %s, SID: %s,"
+                       " trust_flags: %x, trust_type: %u\n",
+                       i, dom_list[i].domain_name, dom_list[i].dns_name,
+                       &dom_list[i].sid?sid_string_dbg(&dom_list[i].sid):"",
+                       dom_list[i].trust_flags, dom_list[i].trust_type));
+
                if ( !(dom_list[i].trust_flags & NETR_TRUST_FLAG_TREEROOT) ) {
                        continue;
                }
@@ -422,13 +438,10 @@
                d = find_domain_from_name_noinit( dom_list[i].domain_name );

                if ( !d ) {
-                       d = add_trusted_domain( dom_list[i].domain_name,
+                       (void)add_trusted_domain( dom_list[i].domain_name,
                                                dom_list[i].dns_name,
                                                &cache_methods,
                                                &dom_list[i].sid );
-                       if (d != NULL) {
-                               setup_domain_child(d);
-                       }
                }

                if (d == NULL) {
@@ -479,6 +492,12 @@
                uint32 type    = dom_list[i].trust_type;
                uint32 attribs = dom_list[i].trust_attribs;

+               DEBUG(10, ("Entry[%02d]: name: %s, DNS: %s, SID: %s,"
+                       " trust_flags: %x, trust_type: %u\n",
+                       i, dom_list[i].domain_name, dom_list[i].dns_name,
+                       &dom_list[i].sid?sid_string_dbg(&dom_list[i].sid):"",
+                       dom_list[i].trust_flags, dom_list[i].trust_type));
+
                d = find_domain_from_name_noinit( dom_list[i].domain_name );

                /* ignore our primary and internal domains */
@@ -494,13 +513,10 @@
                           about it */

                        if ( !d ) {
-                               d = add_trusted_domain( dom_list[i].domain_name,
+                               (void)add_trusted_domain( dom_list[i].domain_name,
                                                        dom_list[i].dns_name,
                                                        &cache_methods,
                                                        &dom_list[i].sid );
-                               if (d != NULL) {
-                                       setup_domain_child(d);
-                               }
                        }

                        if (d == NULL) {
@@ -601,7 +617,6 @@
 /* Look up global info for the winbind daemon */
 bool init_domain_list(void)
 {
-       struct winbindd_domain *domain;
        int role = lp_server_role();

        /* Free existing list */
@@ -609,26 +624,18 @@

        /* BUILTIN domain */

-       domain = add_trusted_domain("BUILTIN", NULL, &cache_methods,
+       (void)add_trusted_domain("BUILTIN", NULL, &cache_methods,
                                    &global_sid_Builtin);
-       if (domain) {
-               setup_domain_child(domain);
-       }

        /* Local SAM */

-       domain = add_trusted_domain(get_global_sam_name(), NULL,
+       (void)add_trusted_domain(get_global_sam_name(), NULL,
                                    &cache_methods, get_global_sam_sid());
-       if (domain) {
-               if ( role != ROLE_DOMAIN_MEMBER ) {
-                       domain->primary = True;
-               }
-               setup_domain_child(domain);
-       }

        /* Add ourselves as the first entry. */

        if ( role == ROLE_DOMAIN_MEMBER ) {
+               struct winbindd_domain *domain;
                struct dom_sid our_sid;

                if (!secrets_fetch_domain_sid(lp_workgroup(), &our_sid)) {
@@ -639,9 +646,6 @@
                domain = add_trusted_domain( lp_workgroup(), lp_realm(),
                                             &cache_methods, &our_sid);
                if (domain) {
-                       domain->primary = True;
-                       setup_domain_child(domain);
-
                        /* Even in the parent winbindd we'll need to
                           talk to the DC, so try and see if we can
                           contact it. Theoretically this isn't neccessary
Comment 10 Jeremy Allison 2014-01-09 22:15:06 UTC
(In reply to comment #9)
> This seems to work on Samba 3.6.x

I'm not sure if we're doing more 3.6.x releases, but if so you should probably
add this as an attachment, not as inline text :-).

Thanks !

Jeremy.
Comment 11 Jeremy Allison 2014-01-09 22:15:27 UTC
Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next.

Jeremy.
Comment 12 Karolin Seeger 2014-01-10 08:11:14 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > This seems to work on Samba 3.6.x
> 
> I'm not sure if we're doing more 3.6.x releases, but if so you should probably
> add this as an attachment, not as inline text :-).
> 
> Thanks !
> 
> Jeremy.

Samba 3.6 has been turned into the security fixes only mode on  	2013-11-29. So there won't be further bugfix releases.
Please see https://wiki.samba.org/index.php/Samba_Release_Planning for details.
Comment 13 Karolin Seeger 2014-01-10 08:50:41 UTC
(In reply to comment #11)
> Re-assigning to Karolin for inclusion in 4.1.next, 4.0.next.
> 
> Jeremy.

Pushed to autobuild-v4-1-test and autobuild-v4-0-test.
Comment 14 Andreas Schneider 2014-01-10 09:41:22 UTC
Richard, could you please add it as an attachment? :)
Comment 15 Richard Sharpe 2014-01-10 17:50:54 UTC
Created attachment 9567 [details]
The patch for Samba 3.6.12+

This should work for any version of Samba 3.6.x and maybe even for 3.5.x with some changes.

You will have to manually apply it, and you might want to remove the extra debug statements there ...
Comment 16 Karolin Seeger 2014-01-13 09:13:09 UTC
Pushed to v4-1-test and v4-0-test.
Closing out bug report.

Thanks!