Created attachment 6813
Patch

Commit message on its way to autobuild:

Fix a winbind race leading to 100% CPU

This fixes a race condition that leads to the winbindd_children list becoming corrupted. It happens when, on a busy winbind, SIGCHLD is a bit late.

Imagine a winbind with multiple requests in the queue for a single child. The child dies, and before the SIGCHLD handler is called we find the socket to be dead. wb_child_request_done is called, receiving an error from wb_simple_trans_recv. It closes the socket. Then wb_child_request_trigger immediately does another fork_domain_child before the signal handler is called, and we have child->sock==-1 at this point. fork_domain_child will do a DLIST_ADD(winbindd_children, child) a second time, although the child is already part of that list. This corrupts the list. Then the signal handler kicks in, spinning in

for (child = winbindd_children; child != NULL; child = child->next) {

forever. Not good.

This patch makes sure that both conditions (sock==-1 and not part of the list) for a winbindd_child struct match up.

This is not in 3.5.
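For readers who want to see why the second DLIST_ADD is fatal, here is a minimal sketch of the sequence described above. It is not the winbind code: `struct child`, `list_add`, `list_remove`, `list_terminates` and `simulate` are hypothetical stand-ins, and `list_add` is a simplified head insertion rather than Samba's actual DLIST_ADD macro. The `apply_fix` flag models the invariant the patch describes (sock==-1 exactly when the child is not on the list); how the real patch enforces it is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct winbindd_child; only the fields needed here. */
struct child {
    struct child *prev, *next;
    int sock;                       /* -1 means "no live child process" */
};

/* Simplified head insertion (an assumption, not Samba's DLIST_ADD). */
static void list_add(struct child **head, struct child *p)
{
    p->prev = NULL;
    p->next = *head;
    if (*head != NULL) {
        (*head)->prev = p;
    }
    *head = p;
}

static void list_remove(struct child **head, struct child *p)
{
    if (p->prev != NULL) {
        p->prev->next = p->next;
    } else {
        *head = p->next;
    }
    if (p->next != NULL) {
        p->next->prev = p->prev;
    }
    p->prev = p->next = NULL;
}

/* Bounded walk: true if the list reaches NULL within `limit` steps. */
static bool list_terminates(const struct child *head, int limit)
{
    for (const struct child *c = head; c != NULL; c = c->next) {
        if (limit-- == 0) {
            return false;           /* the real handler would spin here forever */
        }
    }
    return true;
}

/* Replays the race; returns true if the list is still walkable afterwards. */
static bool simulate(bool apply_fix)
{
    struct child *head = NULL;
    struct child other = { NULL, NULL, 5 };
    struct child c = { NULL, NULL, -1 };

    list_add(&head, &other);
    c.sock = 7;
    list_add(&head, &c);            /* first fork: child joins the list */

    c.sock = -1;                    /* request fails, socket closed before SIGCHLD */
    if (apply_fix) {
        list_remove(&head, &c);     /* keep "sock == -1" and "not on list" in sync */
    }

    c.sock = 8;
    list_add(&head, &c);            /* next queued request forks again */

    return list_terminates(head, 1000);
}
```

Without the removal, the second insertion makes the node its own `next`, so any `for (...; child != NULL; child = child->next)` walk never ends. With the removal, the list stays a normal two-element chain.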
Comment on attachment 6813
Patch

Ok - went through this very carefully - this is essential for 3.6.1.

Thanks Volker!

Jeremy.
Should we add

e0e3d21 vl@samba.org s3: Use sys_write in fork_domain_child
964e809 vl@samba.org s3: Use sys_read in fork_domain_child

as well?
Yes, please. Christian, ack?

Volker
Yes, to be on the safer side with regard to signals.
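As background on the signal concern: a plain write() or read() can fail with EINTR if a signal such as SIGCHLD arrives mid-call, and a wrapper that retries avoids treating that as a real error. The sketch below illustrates the general pattern only. The names `write_retry_eintr`, `read_retry_eintr` and `pipe_roundtrip` are hypothetical, and this is an assumption about what such wrappers typically do, not a copy of Samba's sys_write or sys_read.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: retry write() when a signal interrupts it. */
static ssize_t write_retry_eintr(int fd, const void *buf, size_t count)
{
    ssize_t ret;

    do {
        ret = write(fd, buf, count);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Hypothetical wrapper: retry read() when a signal interrupts it. */
static ssize_t read_retry_eintr(int fd, void *buf, size_t count)
{
    ssize_t ret;

    do {
        ret = read(fd, buf, count);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Round-trips four bytes through a pipe; returns 0 on success, -1 on failure. */
static int pipe_roundtrip(void)
{
    int fds[2];
    char in[4] = { 0 };
    int ok;

    if (pipe(fds) != 0) {
        return -1;
    }

    ok = write_retry_eintr(fds[1], "ping", 4) == 4 &&
         read_retry_eintr(fds[0], in, sizeof(in)) == 4 &&
         memcmp(in, "ping", 4) == 0;

    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```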
Pushed these three patches to v3-6-test. Closing out bug report. Thanks!
*** Bug 8667 has been marked as a duplicate of this bug. ***