Bug 12022 - Restarting cleanupd when ctdb-messaging is down
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Ralph Böhme
QA Contact: Samba QA Contact
Reported: 2016-07-14 14:27 UTC by Ralph Böhme
Modified: 2016-09-05 06:58 UTC
Description Ralph Böhme 2016-07-14 14:27:39 UTC
When running "ctdb ban" on a node where the "ctdb timeout" parameter is set as well, cleanupd may exit because the "ctdb timeout" expires while it is attempting ctdb database operations.

smbd will then restart cleanupd (since the patch for bug 11855), but cleanupd will fail to reinitialize messaging with ctdb (remember: the ctdb node is banned, which causes a failure in ctdb_working()) and immediately exit once again.

To fix this we need three things:
- keep retrying to start cleanupd at intervals
- queue cleanup events in the parent smbd as long as cleanupd is down
- once cleanupd comes back, send it the queued cleanup events
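
The three points above can be sketched as a small model. This is only an illustration in Python, not the actual Samba patch (which is C code in smbd): the class CleanupdSupervisor, its methods, the start_child callback and the retry_interval value are all hypothetical names chosen for this sketch.

```python
from collections import deque
from typing import Callable, Deque, Optional

# Hypothetical type: a function that delivers one cleanup event (a pid)
# to a running cleanupd.
SendFn = Callable[[int], None]


class CleanupdSupervisor:
    """Toy model of the parent smbd supervising cleanupd (names are assumptions)."""

    def __init__(self, start_child: Callable[[], Optional[SendFn]],
                 retry_interval: float = 1.0) -> None:
        # start_child() returns a send function on success, or None when
        # cleanupd could not be started (e.g. ctdb messaging is down).
        self._start_child = start_child
        self.retry_interval = retry_interval
        self._send: Optional[SendFn] = None
        self._pending: Deque[int] = deque()
        self.next_retry_at = 0.0

    def child_exited(self, now: float) -> None:
        # cleanupd went away: forget the channel and allow a restart attempt.
        self._send = None
        self.next_retry_at = now

    def notify_cleanup(self, pid: int) -> None:
        # Queue the event while cleanupd is down, otherwise deliver directly.
        if self._send is None:
            self._pending.append(pid)
        else:
            self._send(pid)

    def tick(self, now: float) -> None:
        # Called periodically: retry the start at intervals, then flush the queue.
        if self._send is not None or now < self.next_retry_at:
            return
        send = self._start_child()
        if send is None:
            self.next_retry_at = now + self.retry_interval
            return
        self._send = send
        while self._pending:
            self._send(self._pending.popleft())
```

In this model, events reported through notify_cleanup() while the start keeps failing stay in the parent's queue and are delivered in order by the first tick() that manages to start the child again.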

Have patch, need bug number.
Comment 1 Ralph Böhme 2016-09-05 06:58:27 UTC
Fix is in master and 4.5. This fix is for a special corner case (ctdb node banned -> cluster messaging down -> smbd kept running -> ctdb node unbanned -> smbd still fully functional ... imo it is better to just stop managed services in banned state). Not backporting to older releases.