Bug 14958 - CTDB can get stuck in election and recovery
Summary: CTDB can get stuck in election and recovery
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.16.0rc1
Hardware: All All
Importance: P5 regression
Target Milestone: ---
Assignee: Jule Anger
QA Contact: Samba QA Contact
Reported: 2022-01-24 23:33 UTC by Martin Schwenke
Modified: 2022-03-01 08:55 UTC
CC List: 2 users

Attachments
Patch for v4-16-test (11.33 KB, patch)
2022-02-14 07:18 UTC, Martin Schwenke
amitay: review+

Description Martin Schwenke 2022-01-24 23:33:31 UTC
The election-in-progress flag is set when an unknown-leader broadcast is received, so it needs to be cleared in all cases when an election completes.  The early returns in cluster_lock_election() do not do this.

This was seen in a case where the leader node stalled (due to the clock being set backwards by ~4 minutes), so it didn't send leader broadcasts for some time.  The node continued to hold the cluster lock, so no other node could not become leader.  However, after the node returned to normal, it did not send leader broadcasts because election-in-progress was never cleared.
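The failure mode follows a common C pattern: a state flag is set in one place but cleared on only some exit paths. The following is a minimal, simplified sketch of the fix pattern. It is not the actual CTDB code or the committed patch. The struct, fields, and functions (struct rec_state, unknown_leader_broadcast(), cluster_lock_election_sketch(), should_send_leader_broadcast()) are hypothetical. The sketch assumes that leader broadcasts are suppressed while the flag is set, and it routes every exit through a single cleanup label so that the flag is always cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified recovery state; not CTDB's real structure. */
struct rec_state {
	bool election_in_progress; /* set when an unknown-leader broadcast arrives */
	bool cluster_lock_held;    /* this node currently holds the cluster lock */
	int  leader;               /* current leader's node number, -1 if unknown */
	int  pnn;                  /* this node's number */
};

/* Hypothetical handler: an "unknown leader" broadcast starts an election. */
void unknown_leader_broadcast(struct rec_state *rec)
{
	rec->election_in_progress = true;
	rec->leader = -1;
}

/*
 * Hypothetical election via the cluster lock. lock_free models whether
 * the lock could be taken. Every exit path goes through "done", so the
 * early-return cases also clear election_in_progress.
 */
void cluster_lock_election_sketch(struct rec_state *rec, bool lock_free)
{
	if (rec->cluster_lock_held) {
		/* Stalled-leader case: still holds the lock, so still leader. */
		rec->leader = rec->pnn;
		goto done;
	}

	if (!lock_free) {
		/* Another node holds the lock and is expected to announce itself. */
		goto done;
	}

	rec->cluster_lock_held = true;
	rec->leader = rec->pnn;

done:
	rec->election_in_progress = false;
	/* Post-condition: no exit path leaves the flag set. */
	assert(!rec->election_in_progress);
}

/* Assumed rule: only the leader broadcasts, and never mid-election. */
bool should_send_leader_broadcast(const struct rec_state *rec)
{
	return !rec->election_in_progress && rec->leader == rec->pnn;
}
```

With this structure, a node that already holds the cluster lock (the stalled-leader case in this report) takes the early exit but still clears the flag, so it resumes sending leader broadcasts.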
Comment 1 Martin Schwenke 2022-01-24 23:34:57 UTC
(In reply to Martin Schwenke from comment #0)

> The node continued to hold the cluster lock, so no other node could not become 
> leader.

Sorry this should read:

The node continued to hold the cluster lock, so another node could not become leader.
Comment 3 Samba QA Contact 2022-02-14 02:47:03 UTC
This bug was referenced in samba master:

188a9021565bc2c1bec1d7a4830d6f47cdbc44a9
9b3fab052bd2dccf2fc3fe9bd2b4354dff0b9ebb
bf55a0117d045e8ca888f7e01591cc2a2bce9223
0e74e03c9cf83d5dc2d97fa9f38ff8fbaa3d2685
265e44abc42e1f5b7fef6550cd748459dbef80cb
331c435ce520bef1274e076e6ed491400db3b5ad
Comment 4 Martin Schwenke 2022-02-14 07:18:21 UTC
Created attachment 17163
Patch for v4-16-test

Patch for 4.16 cherry-picks cleanly.  I ran the CTDB test suite against local daemons...
Comment 5 Amitay Isaacs 2022-02-14 23:58:07 UTC
Hi Jule,

This is ready for v4-16.

Thanks.
Amitay.
Comment 6 Jule Anger 2022-02-15 07:51:18 UTC
Pushed to autobuild-v4-16-test.
Comment 7 Samba QA Contact 2022-02-15 09:56:04 UTC
This bug was referenced in samba v4-16-test:

07540a8cf4597f683e6661cc4418b858f59d7312
758e953ee07343e1e3fd0389eb2d82c0654be61c
ddda97dc146179a035485219bca6af2338b360e9
d0133dd3a54acc29949e8351702b0996ba8d66c6
f3047e90a8653284f19ef7138ddbe9ada3b7a303
79b42f0f2bfa539c66ca46adba8383e2465af783
Comment 8 Jule Anger 2022-02-15 13:53:42 UTC
Closing out bug report.

Thanks!
Comment 9 Samba QA Contact 2022-03-01 08:55:37 UTC
This bug was referenced in samba v4-16-stable (Release samba-4.16.0rc4):

07540a8cf4597f683e6661cc4418b858f59d7312
758e953ee07343e1e3fd0389eb2d82c0654be61c
ddda97dc146179a035485219bca6af2338b360e9
d0133dd3a54acc29949e8351702b0996ba8d66c6
f3047e90a8653284f19ef7138ddbe9ada3b7a303
79b42f0f2bfa539c66ca46adba8383e2465af783