Bug 13617 - CTDB recovery lock has some race conditions
Summary: CTDB recovery lock has some race conditions
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.9.0
Hardware: All, OS: All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
Depends on:
Reported: 2018-09-14 02:20 UTC by Martin Schwenke
Modified: 2018-09-24 07:24 UTC
CC List: 1 user

See Also:

Patch for 4.8 and 4.9 (18.16 KB, patch)
2018-09-19 05:17 UTC, Martin Schwenke
amitay: review+

Description Martin Schwenke 2018-09-14 02:20:57 UTC
The main issue here is that if a node starts taking the recovery lock and then loses an election, it can go on to hold the recovery lock. The new master is then unable to take the lock.

Another issue is that the cluster mutex child effectively drops SIGTERM until after the desired helper is exec()ed.
Comment 1 Martin Schwenke 2018-09-19 05:17:34 UTC
Created attachment 14492
Patch for 4.8 and 4.9

These commits cherry-picked cleanly into 4.9.  The resulting patch applies cleanly to 4.8 using "git am".

Smoke tested both branches using relevant simple tests. This code hasn't changed in a long time, so I would not expect anything unusual when backporting to these recent releases.

This patch also applies to 4.7, but that branch is now security-only. Still, it is worth mentioning that it applies and should go there if there is another bug fix release...  ;-)
Comment 2 Amitay Isaacs 2018-09-20 04:04:30 UTC
Hi Karolin,

This is ready for v4-9 and v4-8.
Comment 3 Karolin Seeger 2018-09-20 07:13:00 UTC
Pushed to autobuild-v4-{8,9}-test.
Comment 4 Karolin Seeger 2018-09-24 07:24:43 UTC
Pushed to both branches.
Closing out bug report.