Bug 13670 - Misbehaving nodes are sometimes not banned
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.9.1
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-01 04:54 UTC by Martin Schwenke
Modified: 2018-11-07 07:46 UTC

See Also:


Attachments
Patch for 4.9 and 4.8 (3.90 KB, patch)
2018-11-05 22:39 UTC, Martin Schwenke
amitay: review+

Description Martin Schwenke 2018-11-01 04:54:31 UTC
Since CTDB started using the recovery helper, it makes up
to 3 attempts to recover each database during a single recovery
run.  With the default control timeout of 30 seconds, a failure
due to timeouts takes at least 90 seconds.  If other controls
are slow then a recovery run can take more than 120 seconds to
fail.

This means that banning credits assigned during one failure
can expire before the next failure is recorded, because
RecoveryGracePeriod defaults to 120 seconds.  A subsequent
application of banning credits then starts from zero and will
not cause a misbehaving node to be banned.
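
The timing problem above can be illustrated with a rough sketch.  This
is a toy model, not CTDB's actual code: the CreditTracker class, the
simulate() function and the ban threshold of 2 credits are all
hypothetical and chosen only for illustration.  Only the 30 second
control timeout, the 3 attempts and the 120 second RecoveryGracePeriod
come from the description.

```python
from dataclasses import dataclass

CONTROL_TIMEOUT = 30          # seconds, default from the report
MAX_ATTEMPTS = 3              # attempts per database per run, from the report
RECOVERY_GRACE_PERIOD = 120   # seconds, default from the report


@dataclass
class CreditTracker:
    """Toy model: credits are forgotten once the grace period has elapsed."""
    grace_period: float
    ban_threshold: int        # hypothetical value, chosen for illustration
    credits: int = 0
    last_reported: float = 0.0

    def add_credits(self, now: float, count: int) -> bool:
        # Credits older than the grace period are discarded before adding.
        if now - self.last_reported > self.grace_period:
            self.credits = 0
        self.credits += count
        self.last_reported = now
        return self.credits >= self.ban_threshold


def simulate(run_duration: float, runs: int = 5) -> bool:
    """Return True if the node is banned within `runs` failed recovery runs."""
    tracker = CreditTracker(RECOVERY_GRACE_PERIOD, ban_threshold=2)
    now = 0.0
    for _ in range(runs):
        now += run_duration            # each run ends in a failure
        if tracker.add_credits(now, 1):
            return True
    return False


# Minimum time for a run to fail purely due to control timeouts.
min_timeout_failure = MAX_ATTEMPTS * CONTROL_TIMEOUT
print(min_timeout_failure)
print(simulate(run_duration=100))   # failures 100 s apart
print(simulate(run_duration=130))   # failures 130 s apart
```

In this model, failures 100 seconds apart accumulate credits and lead
to a ban, while failures 130 seconds apart never do, because every
earlier credit has expired by the time the next one is applied.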

A single recovery run is equivalent to 3 pre-recovery-helper
runs, so the simplest solution is to ban the most misbehaved
node directly from the recovery helper rather than just applying
banning credits.
Comment 1 Martin Schwenke 2018-11-05 22:39:36 UTC
Created attachment 14569
Patch for 4.9 and 4.8
Comment 2 Amitay Isaacs 2018-11-05 22:52:42 UTC
Hi Karolin,

This is ready for v4-8 and v4-9.

Thanks.
Comment 3 Karolin Seeger 2018-11-06 08:05:20 UTC
(In reply to Amitay Isaacs from comment #2)
Hi Amitay,

pushed to autobuild-v4-{9,8}-test.
Comment 4 Karolin Seeger 2018-11-07 07:46:48 UTC
(In reply to Karolin Seeger from comment #3)
Pushed to both branches.
Closing out bug report.

Thanks!