When running "ctdb ban" on a node where the "ctdb timeout" parameter is also set, cleanupd may exit because of the "ctdb timeout" when attempting ctdb database operations. smbd will then restart cleanupd (since the patch for bug 11855), but cleanupd will fail to reinitialize messaging with ctdb (remember: the ctdb node is banned, which causes a failure in ctdb_working()) and immediately exit again.

To fix this we need three things:

- keep retrying to start cleanupd at intervals
- queue cleanup events in the parent smbd as long as cleanupd is down
- once cleanupd comes back, send it the queued cleanup events

Have patch, need bug number.
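
To illustrate the three points above, here is a rough model of the parent-side logic in Python. This is not the actual patch and not Samba code: CleanupdSupervisor, start_cleanupd, send_event and retry_interval are all made-up names, and time is passed in by hand instead of coming from a timer event.

```python
from collections import deque


class CleanupdSupervisor:
    """Toy model of the parent smbd side (hypothetical, not Samba code)."""

    def __init__(self, start_cleanupd, send_event, retry_interval=1.0):
        # start_cleanupd: assumed callable, returns True if cleanupd came up
        # send_event: assumed callable that delivers one event to cleanupd
        self.start_cleanupd = start_cleanupd
        self.send_event = send_event
        self.retry_interval = retry_interval
        self.running = False
        self.pending = deque()   # events queued while cleanupd is down
        self.next_retry = 0.0    # earliest time of the next start attempt

    def on_cleanupd_exit(self, now):
        # cleanupd died: mark it down and schedule a restart attempt
        self.running = False
        self.next_retry = now + self.retry_interval

    def notify(self, event):
        # deliver directly if cleanupd is up, otherwise queue in the parent
        if self.running:
            self.send_event(event)
        else:
            self.pending.append(event)

    def tick(self, now):
        # called periodically; retries the start only once the interval elapsed
        if self.running or now < self.next_retry:
            return
        if self.start_cleanupd():
            self.running = True
            # cleanupd is back: flush queued events in arrival order
            while self.pending:
                self.send_event(self.pending.popleft())
        else:
            self.next_retry = now + self.retry_interval
```

In this model, events that arrive while the node is banned stay in the parent's queue, and the first successful restart after the unban drains them in order.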
The fix is in master and 4.5. It addresses a special corner case (ctdb node banned -> cluster messaging down -> smbd kept running -> ctdb node unbanned -> smbd still fully functional ... imo it would be better to just stop managed services in the banned state). Not backporting to older releases.