The Samba-Bugzilla – Attachment 14569 Details for Bug 13670: Misbehaving nodes are sometimes not banned
[patch] Patch for 4.9 and 4.8
BZ13670.patch (text/plain), 3.90 KB, created by Martin Schwenke on 2018-11-05 22:39:36 UTC
From f5509794d9633a7ef63d0ce9c6e70e94032ceedd Mon Sep 17 00:00:00 2001
From: Martin Schwenke <martin@meltin.net>
Date: Mon, 29 Oct 2018 14:33:08 +1100
Subject: [PATCH] ctdb-recovery: Ban a node that causes recovery failure

... instead of applying banning credits.

There have been a couple of cases where recovery repeatedly takes just
over 2 minutes to fail.  Therefore, banning credits expire between
failures and a continuously problematic node is never banned,
resulting in endless recoveries.  This is because it takes 2
applications of banning credits before a node is banned, which
generally involves 2 recovery failures.

The recovery helper makes up to 3 attempts to recover each database
during a single run.  If a node causes 3 failures then this is really
equivalent to 3 recovery failures in the model that existed before the
recovery helper added retries.  In that case the node would have been
banned after 2 failures.

So, instead of applying banning credits to the "most failing" node,
simply ban it directly from the recovery helper.

If multiple nodes are causing recovery failures then this can cause a
node to be banned more quickly than it might otherwise have been, even
pre-recovery-helper.  However, 90 seconds (i.e. 3 failures) is a long
time to be in recovery, so banning earlier seems like the best
approach.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13670

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Nov  5 06:52:33 CET 2018 on sn-devel-144

(cherry picked from commit 27df4f002a594dbb2f2a38afaccf3e22f19818e1)
---
 ctdb/server/ctdb_recovery_helper.c | 46 ++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/ctdb/server/ctdb_recovery_helper.c b/ctdb/server/ctdb_recovery_helper.c
index 7495eb3a674..7fdcc2e5a29 100644
--- a/ctdb/server/ctdb_recovery_helper.c
+++ b/ctdb/server/ctdb_recovery_helper.c
@@ -2571,22 +2571,28 @@ static void recovery_db_recovery_done(struct tevent_req *subreq)
 
 		/* If pulling database fails multiple times */
 		if (max_credits >= NUM_RETRIES) {
-			struct ctdb_req_message message;
-
-			D_ERR("Assigning banning credits to node %u\n",
-			      max_pnn);
-
-			message.srvid = CTDB_SRVID_BANNING;
-			message.data.pnn = max_pnn;
-
-			subreq = ctdb_client_message_send(
-					state, state->ev, state->client,
-					ctdb_client_pnn(state->client),
-					&message);
+			struct ctdb_ban_state ban_state = {
+				.pnn = max_pnn,
+				.time = state->tun_list->recovery_ban_period,
+			};
+
+			D_ERR("Banning node %u for %u seconds\n",
+			      ban_state.pnn,
+			      ban_state.time);
+
+			ctdb_req_control_set_ban_state(&request,
+						       &ban_state);
+			subreq = ctdb_client_control_send(state,
+							  state->ev,
+							  state->client,
+							  ban_state.pnn,
+							  TIMEOUT(),
+							  &request);
 			if (tevent_req_nomem(subreq, req)) {
 				return;
 			}
-			tevent_req_set_callback(subreq, recovery_failed_done,
+			tevent_req_set_callback(subreq,
+						recovery_failed_done,
 						req);
 		} else {
 			tevent_req_error(req, EIO);
@@ -2609,15 +2615,25 @@ static void recovery_failed_done(struct tevent_req *subreq)
 {
 	struct tevent_req *req = tevent_req_callback_data(
 		subreq, struct tevent_req);
+	struct recovery_state *state = tevent_req_data(
+		req, struct recovery_state);
+	struct ctdb_reply_control *reply;
 	int ret;
 	bool status;
 
-	status = ctdb_client_message_recv(subreq, &ret);
+	status = ctdb_client_control_recv(subreq, &ret, state, &reply);
 	TALLOC_FREE(subreq);
 	if (! status) {
-		D_ERR("failed to assign banning credits, ret=%d\n", ret);
+		D_ERR("failed to ban node, ret=%d\n", ret);
+		goto done;
+	}
+
+	ret = ctdb_reply_control_set_ban_state(reply);
+	if (ret != 0) {
+		D_ERR("control SET_BAN_STATE failed, ret=%d\n", ret);
 	}
 
+done:
 	tevent_req_error(req, EIO);
 }
 
-- 
2.19.1
Flags: amitay: review+