Bug 8409 - f7d97868 created a new race condition
Summary: f7d97868 created a new race condition
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: Winbind (show other bugs)
Version: 3.6.0
Hardware: All All
: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
: 8667 (view as bug list)
Depends on:
Blocks: 8399
  Show dependency treegraph
Reported: 2011-08-26 15:33 UTC by Volker Lendecke
Modified: 2011-12-16 17:57 UTC (History)
1 user (show)

See Also:

Patch (2.27 KB, patch)
2011-08-26 15:33 UTC, Volker Lendecke
jra: review+

Note You need to log in before you can comment on or make changes to this bug.
Description Volker Lendecke 2011-08-26 15:33:00 UTC
Created attachment 6813 [details]

Commit message on its way to autobuild:

Fix a winbind race leading to 100% CPU

This fixes a race condition that leads to the winbindd_children list becoming
corrupted. It happens when on a busy winbind SIGCHLD is a bit late.

Imagine a winbind with multiple requests in the queue for a single child. Child
dies, and before the SIGCHLD handler is called we find the socket to be dead.
wb_child_request_done is called, receiving an error from wb_simple_trans_recv.
It closes the socket. Then immediately the wb_child_request_trigger will do
another fork_domain_child before the signal handler is called. This means that
we do another fork_domain_child, we have child->sock==-1 at this point.
fork_domain_child will do a DLIST_ADD(winbindd_children, child) a second time
where the child is already part of that list. This corrupts the list. Then the
signal handler kicks in, spinning in

for (child = winbindd_children; child != NULL; child = child->next) {

forever. Not good. This patch makes sure that both conditions (sock==-1 and not
part of the list) for a winbindd_child struct match up.

This is not in 3.5
Comment 1 Jeremy Allison 2011-08-26 21:16:49 UTC
Comment on attachment 6813 [details]

Ok - went through this very carefully - this essential for 3.6.1. Thanks Volker !

Comment 2 Christian Ambach 2011-08-27 09:22:56 UTC
should we add 
e0e3d21 vl@samba.org s3: Use sys_write in fork_domain_child
964e809 vl@samba.org s3: Use sys_read in fork_domain_child

as well?
Comment 3 Volker Lendecke 2011-08-28 09:08:48 UTC
Yes, please. Christian, ack?

Comment 4 Christian Ambach 2011-08-28 12:30:37 UTC
yes, to be on on a safer side in regards to signals
Comment 5 Karolin Seeger 2011-08-30 19:06:04 UTC
Pushed these three patches to v3-6-test.
Closing out bug report.

Comment 6 Herb Lewis 2011-12-16 17:57:41 UTC
*** Bug 8667 has been marked as a duplicate of this bug. ***