Since CTDB started using the recovery helper, it makes up to 3 attempts to recover each database during a single recovery run. With the default control timeout of 30 seconds, a failure caused by timeouts therefore takes at least 90 seconds (3 × 30 s). If other controls are also slow, a recovery run can take more than 120 seconds to fail. RecoveryGracePeriod defaults to 120 seconds, so banning credits assigned during one failure expire before the next failure. The next application of banning credits then starts from scratch and does not cause a misbehaving node to be banned. A single recovery run is equivalent to 3 pre-recovery-helper runs, so the simplest solution is to ban the most misbehaving node directly from the recovery helper rather than just applying banning credits.
Created attachment 14569: Patch for 4.9 and 4.8
Hi Karolin, this is ready for v4-8 and v4-9. Thanks.
(In reply to Amitay Isaacs from comment #2) Hi Amitay, pushed to autobuild-v4-{9,8}-test.
(In reply to Karolin Seeger from comment #3) Pushed to both branches. Closing out bug report. Thanks!