Bug 14513 - ctdb disable/enable can still fail due to race condition
Status: ASSIGNED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.11.13
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
Reported: 2020-09-26 07:19 UTC by Martin Schwenke
Modified: 2020-10-13 08:13 UTC



Attachments
Patch for 4.13 (6.54 KB, patch)
2020-10-12 05:37 UTC, Martin Schwenke
amitay: review+
Patch for 4.12 (6.54 KB, patch)
2020-10-12 05:38 UTC, Martin Schwenke
amitay: review+
Patch for 4.11 (4.79 KB, patch)
2020-10-12 05:42 UTC, Martin Schwenke
amitay: review+

Description Martin Schwenke 2020-09-26 07:19:35 UTC
Bug 14466 fixed a ctdb disable/enable race but left another behind. 

The recovery daemon (node A) updates its flags from remote node B and pushes changed flags for B out to all nodes.  If a third node (C) had different flags for node B, the recovery daemon's copy of C's flags for B is not updated.  The recovery daemon's main loop nevertheless sanity checks C's potentially stale copy of the flags for B against its own flags for B.  If these differ then it can overwrite flags in an undesirable way, cancelling a disable/enable.

The recovery daemon is better off not doing this sanity check and should just rely on update_flags() working.
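
To make the sequence above concrete, here is a toy model of the race in Python.  It is only a sketch and is not CTDB code: the function name, the DISABLED bit value and the dictionary layout are all hypothetical.  In particular, how the real sanity check resolves a mismatch is not described in this report; the model simply assumes the stale value wins, which is one way a disable could be cancelled.

```python
# Toy model of the flag race; NOT CTDB code. All names are hypothetical.
DISABLED = 1  # placeholder flag bit, chosen for illustration only


def run(with_sanity_check):
    """Return every node's view of node B's flags after B is disabled."""
    # Each node's view of node B's flags before the disable.
    view_of_b = {"A": 0, "B": 0, "C": 0}

    # A (recovery daemon) holds a copy of C's view, fetched earlier.
    cached_c_view_of_b = view_of_b["C"]

    # "ctdb disable" on B: B updates its own flags.
    view_of_b["B"] = DISABLED

    # update_flags()-like step: A pulls B's flags from B and pushes
    # the changed flags out to all nodes.
    view_of_b["A"] = view_of_b["B"]
    for node in view_of_b:
        view_of_b[node] = view_of_b["A"]
    # A's cached copy of C's view is not refreshed, so it is now stale.

    if with_sanity_check and cached_c_view_of_b != view_of_b["A"]:
        # ASSUMPTION: the mismatch is resolved by pushing the stale value.
        for node in view_of_b:
            view_of_b[node] = cached_c_view_of_b

    return view_of_b


print("with sanity check:   ", run(True))
print("without sanity check:", run(False))
```

In this model the disable survives only when the sanity check is skipped, which matches the proposal to drop the check and rely on update_flags().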
Comment 1 Samba QA Contact 2020-10-06 04:33:11 UTC
This bug was referenced in samba master:

3ab52b528673e08caa66f00e963528c591a84fe1
4b01f54041dee469971f244e64064eed46de2ed5
b68105b8f7c20692d23d457f2777edcf44f12bb8
Comment 2 Martin Schwenke 2020-10-12 05:37:53 UTC
Created attachment 16278
Patch for 4.13

Patch from master cherry-picks cleanly into v4-13-test.  Successfully regression tested, no surprises.
Comment 3 Martin Schwenke 2020-10-12 05:38:57 UTC
Created attachment 16279
Patch for 4.12

Patch from master cherry-picks cleanly into v4-12-test - this is actually the same patch as for v4-13-test.  Successfully regression tested, no surprises.
Comment 4 Martin Schwenke 2020-10-12 05:42:33 UTC
Created attachment 16280
Patch for 4.11

I'm not sure if there will be another 4.11 bug fix release.  If not, please ignore this.

The final commit from master that strengthens the relevant testcase does not apply cleanly to v4-11-test so I have dropped it from this "backport", since it mostly exists to protect us from potential breakage from future changes.

The other 2 commits for the code changes apply cleanly.  I did a smoke test (ran the relevant testcase against local daemons) and it passes as expected.
Comment 5 Amitay Isaacs 2020-10-13 08:13:29 UTC
Hi Karolin,

This is ready for v4-11 (if we are taking bug fixes), v4-12 and v4-13.

Thanks.