From Volker's patch: When winbind is restarted, there is a potential crash in tdb. The situation is as follows. We are in a cluster with ctdb.

1. A winbind child hangs in a request to the DC.
2. Cluster monitoring decides the node has a problem and kills ctdbd.
3. The winbind child still hangs in an RPC request.
4. The winbind parent figures that ctdb is dead and immediately commits suicide.
5. The winbind parent is restarted by cluster management, overwriting gencache.tdb with CLEAR_IF_FIRST.

The CLEAR_IF_FIRST logic as currently implemented will not see that a child still has the tdb open, because for performance reasons only the parent holds the ACTIVE_LOCK. While the CLEAR_IF_FIRST logic runs, there is a very small window in which we ftruncate(tfd, 0) the file and rewrite a proper header without holding a lock. If the winbind child comes back during this window and wants to store something into gencache.tdb, that winbind child will crash with a SIGBUS.
Created attachment 8009: git-am fix for 4.0.0rc3. Same fix that went into master. Jeremy.
Created attachment 8010: git-am for 3.6.next. Back-port from master. Jeremy.
Comment on attachment 8009 (git-am fix for 4.0.0rc3): Are you sure you want the last patch in the patchset under this bug report?
Comment on attachment 8009 (git-am fix for 4.0.0rc3): Oh, I missed that. Will resubmit.
Created attachment 8011: Correct fix for 4.0.0rc3. Now without the extraneous patch :-).
Pushed to v3-6-test and autobuild-v4-0-test. Closing out bug report. Thanks!