Bug 11175 - Lots of winbindd zombie processes on Solaris platform
Summary: Lots of winbindd zombie processes on Solaris platform
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Winbind
Version: 4.2.0rc4
Hardware: All Solaris
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-20 03:38 UTC by YOUZHONG YANG
Modified: 2015-06-08 11:19 UTC
CC List: 4 users

See Also:


Attachments
git-am proposed patch for master. (2.29 KB, patch)
2015-03-20 18:00 UTC, Jeremy Allison
no flags
*Working* git-am patch for master. (2.41 KB, patch)
2015-03-25 20:37 UTC, Jeremy Allison
no flags
git-am fix for 4.2.next (2.58 KB, patch)
2015-03-26 16:36 UTC, Jeremy Allison
obnox: review+
asn: review+

Description YOUZHONG YANG 2015-03-20 03:38:03 UTC
We observed lots of winbindd zombie processes after Samba 4.2 had been running for a while.

After spending a few days chasing this issue, I found its root cause: tdb_runtime_check_for_robust_mutexes() in lib/tdb/common/mutex.c uses signal() to set the SIGCHLD handler. On Solaris, signal(SIGCHLD, ...) sets the SA_RESETHAND flag internally, so the SIGCHLD handler is reset to the default the first time it runs. Once the handler has been reset to the default, any child process forked by the main winbindd daemon is left as a zombie when it exits.

The following C code shows that signal() does set the SA_RESETHAND flag.

#include <stdio.h>
#include <string.h>
#include <signal.h>

static void (*old_handler)(int) = SIG_ERR;

static void sig_chld_handler(int sig)
{
        printf("Caught signal %d\n", sig);
}

void (*CatchSignal(int signum,void (*handler)(int )))(int)
{
        struct sigaction act;
        struct sigaction oldact;

        memset((char*)&act, 0, sizeof(act));

        act.sa_handler = handler;
#ifdef SA_RESTART
        if(signum != SIGALRM)
                act.sa_flags = SA_RESTART;
#endif
        sigemptyset(&act.sa_mask);
        sigaddset(&act.sa_mask,signum);
        sigaction(signum,&act,&oldact);
        return oldact.sa_handler;
}

void print_sa_flags(int sa_flags)
{
        if(sa_flags & SA_NOCLDSTOP)
                printf("NOCLDSTOP ");
        if(sa_flags & SA_ONSTACK)
                printf("ONSTACK ");
        if(sa_flags & SA_RESETHAND)
                printf("RESETHAND ");
        if(sa_flags & SA_RESTART)
                printf("RESTART ");
        if(sa_flags & SA_SIGINFO)
                printf("SIGINFO ");
        if(sa_flags & SA_NODEFER)
                printf("NODEFER ");
        if(sa_flags & SA_NOCLDWAIT)
                printf("NOCLDWAIT ");
        printf("\n");
}

void print_signal(int signum)
{
        struct sigaction oldact;

        memset((char*)&oldact, 0, sizeof(oldact));
        sigaction(signum, NULL, &oldact);
        printf("Signal # %d handler %p flags 0x%X\n", signum, (void *)oldact.sa_handler, (unsigned int)oldact.sa_flags);
        print_sa_flags(oldact.sa_flags);
}

int main(int argc, char **argv)
{
        print_signal(SIGCHLD);
        printf("Set SIGCHLD handler using sigaction().\n");
        old_handler = CatchSignal(SIGCHLD, sig_chld_handler);
        printf("Old handler = %p\n", (void *)old_handler);
        print_signal(SIGCHLD);

        printf("Now set SIGCHLD handler using signal() function.\n");
        old_handler = signal(SIGCHLD, sig_chld_handler);
        printf("Old handler = %p\n", (void *)old_handler);
        print_signal(SIGCHLD);

        return 0;
}
Comment 1 YOUZHONG YANG 2015-03-20 12:31:19 UTC
stack traces:

  3  21297                  setsigact:entry
              libc.so.1`__sigaction+0xa
              libc.so.1`signal+0x71
              libtdb.so.1.3.4`tdb_runtime_check_for_robust_mutexes+0x197
              libtdb-wrap-samba4.so`tdb_wrap_open+0x120
              libsmbconf.so.0`gencache_init+0x317
              libsmbconf.so.0`gencache_parse+0x64
              libsmbconf.so.0`idmap_cache_find_uid2sid+0x91
              winbindd`wb_uid2sid_send+0x71
              winbindd`winbindd_uid_to_sid_send+0xd9
              winbindd`process_request+0x1d8
              winbindd`winbind_client_request_read+0x192
              libtevent.so.0.9.22`_tevent_req_notify_callback+0x6a
              libtevent.so.0.9.22`tevent_req_finish+0x78
              libtevent.so.0.9.22`_tevent_req_done+0x25
              winbindd`wb_req_read_done+0x122
              libtevent.so.0.9.22`_tevent_req_notify_callback+0x6a
              libtevent.so.0.9.22`tevent_req_finish+0x78
              libtevent.so.0.9.22`_tevent_req_done+0x25
              libsmb-transport-samba4.so`read_packet_handler+0x21d
              libtevent.so.0.9.22`epoll_event_loop+0x3a5

  3  21297                  setsigact:entry
              libc.so.1`__sigaction+0xa
              libc.so.1`signal+0x71
              libtdb.so.1.3.4`tdb_runtime_check_for_robust_mutexes+0x39d
              libtdb-wrap-samba4.so`tdb_wrap_open+0x120
              libsmbconf.so.0`gencache_init+0x317
              libsmbconf.so.0`gencache_parse+0x64
              libsmbconf.so.0`idmap_cache_find_uid2sid+0x91
              winbindd`wb_uid2sid_send+0x71
              winbindd`winbindd_uid_to_sid_send+0xd9
              winbindd`process_request+0x1d8
              winbindd`winbind_client_request_read+0x192
              libtevent.so.0.9.22`_tevent_req_notify_callback+0x6a
              libtevent.so.0.9.22`tevent_req_finish+0x78
              libtevent.so.0.9.22`_tevent_req_done+0x25
              winbindd`wb_req_read_done+0x122
              libtevent.so.0.9.22`_tevent_req_notify_callback+0x6a
              libtevent.so.0.9.22`tevent_req_finish+0x78
              libtevent.so.0.9.22`_tevent_req_done+0x25
              libsmb-transport-samba4.so`read_packet_handler+0x21d
              libtevent.so.0.9.22`epoll_event_loop+0x3a5
Comment 2 Jeremy Allison 2015-03-20 18:00:47 UTC
Created attachment 10899 [details]
git-am proposed patch for master.

Can you test this fix and see if it solves the problem?

Thanks,

Jeremy.
Comment 3 YOUZHONG YANG 2015-03-22 02:15:27 UTC
(In reply to Jeremy Allison from comment #2)

Yes, the patch works. Thanks!
Comment 4 Jeremy Allison 2015-03-25 20:37:07 UTC
Created attachment 10910 [details]
*Working* git-am patch for master.

Sorry for the problem. Here is a working patch.
Comment 5 Jeremy Allison 2015-03-25 21:01:19 UTC
Comment on attachment 10910 [details]
*Working* git-am patch for master.

The problem was if a handler hadn't been installed already,
then oldact.sa_handler == NULL (#define SIG_DFL ((__sighandler_t)0))
which was returned and confused with the #else clause of
#ifdef HAVE_SIGACTION (which returned NULL as guaranteed
failure).

So we thought we should have working mutexes because
tdb_mutex_locking_supported() would return true, but
tdb_runtime_check_for_robust_mutexes() would always
return false :-(.

New code returns a bool, and is given a pointer to
fill with the returned handler.
Comment 6 Jeremy Allison 2015-03-26 16:36:49 UTC
Created attachment 10914 [details]
git-am fix for 4.2.next

Cherry-pick of patch that went into master.
Comment 7 Andreas Schneider 2015-03-31 15:10:48 UTC
Comment on attachment 10914 [details]
git-am fix for 4.2.next

LGTM
Comment 8 Andreas Schneider 2015-03-31 15:11:34 UTC
Karolin, please add the patch to the next 4.2 release. Thanks!
Comment 9 Karolin Seeger 2015-04-08 19:13:18 UTC
(In reply to Andreas Schneider from comment #8)
Pushed to autobuild-v4-2-test.
Comment 10 Karolin Seeger 2015-04-09 19:21:04 UTC
(In reply to Karolin Seeger from comment #9)
Pushed to v4-2-test.
Closing out bug report.

Thanks!