Bug 12022 - Restarting cleanupd when ctdb-messaging is down
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assigned To: Ralph Böhme
QA Contact: Samba QA Contact
Reported: 2016-07-14 14:27 UTC by Ralph Böhme
Modified: 2016-09-05 06:58 UTC (History)
Description Ralph Böhme 2016-07-14 14:27:39 UTC
When "ctdb ban" is run on a node where the "ctdb timeout" parameter is also set, cleanupd may exit because the timeout expires while it is attempting ctdb database operations.

smbd will then restart cleanupd (since the patch for bug 11855), but cleanupd will fail to reinitialize messaging with ctdb (remember: the ctdb node is banned, which causes a failure in ctdb_working()) and immediately exit again.

To fix this we need three things:
- keep retrying to start cleanupd at intervals
- queue cleanup events in the parent smbd as long as cleanupd is down
- once cleanupd comes back, send it the queued cleanup events
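
The three points above can be illustrated with a small model. This is only a sketch of the idea in Python, not the actual Samba patch (which is C inside smbd); the class name, method names, and the `start_cleanupd` callable are all hypothetical, and the periodic timer that would call `retry_tick()` is left to the caller.

```python
from collections import deque


class CleanupdSupervisor:
    """Toy model of the parent smbd supervising cleanupd (hypothetical names)."""

    def __init__(self, start_cleanupd, retry_interval=1.0):
        # start_cleanupd: callable that returns a "send" function on success,
        # or None if cleanupd could not start (e.g. ctdb messaging is down).
        self._start_cleanupd = start_cleanupd
        self.retry_interval = retry_interval  # seconds between restart attempts
        self._send = None                     # None while cleanupd is down
        self._pending = deque()               # cleanup events held in the parent

    def notify_child_exit(self, event):
        """Record a cleanup event; deliver it now or keep it queued."""
        self._pending.append(event)
        self._flush()

    def on_cleanupd_exit(self):
        """cleanupd died; the caller should now schedule retry_tick()."""
        self._send = None

    def retry_tick(self):
        """Call once per retry_interval while cleanupd is down.

        Returns True once cleanupd is running again.
        """
        if self._send is None:
            self._send = self._start_cleanupd()
        self._flush()
        return self._send is not None

    def _flush(self):
        # Drain queued events in arrival order, but only if cleanupd is up.
        while self._send is not None and self._pending:
            self._send(self._pending.popleft())


if __name__ == "__main__":
    delivered = []
    # Simulated start attempts: fail twice (node banned), then succeed.
    attempts = iter([None, None, delivered.append])
    sup = CleanupdSupervisor(lambda: next(attempts))
    sup.notify_child_exit("pid 101")   # queued, cleanupd is down
    while not sup.retry_tick():
        pass                           # a real caller would wait retry_interval
    print(delivered)
```

The design point is that events are never dropped while cleanupd is down: they stay in the parent's queue and are flushed in order as soon as a restart attempt succeeds.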

Have patch, need bug number.
Comment 1 Ralph Böhme 2016-09-05 06:58:27 UTC
The fix is in master and 4.5. It addresses a special corner case (ctdb node banned -> cluster messaging down -> smbd kept running -> ctdb node unbanned -> smbd still fully functional ... imo it would be better to just stop managed services in the banned state). Not backporting to older releases.