Bug 10903 - When deleting a node from the cluster, other nodes may go down
Summary: When deleting a node from the cluster, other nodes may go down
Status: RESOLVED INVALID
Alias: None
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: 2.5.3
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Amitay Isaacs
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-28 11:11 UTC by fugx
Modified: 2016-09-12 09:11 UTC

See Also:


Attachments

Description fugx 2014-10-28 11:11:54 UTC
When deleting a node that is online, I comment out this node's IP in the nodes config file (#ip)
and run ctdb reloadnodes on the other nodes.
The other nodes may then go down, because a timeout on an entry in node->pending_controls can free an invalid request. The backtrace:


  
daemon_control_destructor
talloc_free
daemon_control_callback
ctdb_control_timeout

in daemon_control_destructor:

if (state->node) {
   DLIST_REMOVE(state->node->pending_controls, state);
}

But the node has already been freed in ctdb_reload_nodes_event, so state->node is a dangling pointer here and the list removal fails.

I think ctdb_reload_nodes_event should call ctdb_daemon_cancel_controls before skipping a deleted node,

like this:

if (ctdb->nodes[i]->flags & NODE_FLAGS_DELETED) {
   ctdb_daemon_cancel_controls(ctdb, nodes[i]); // add this line
   continue;
}
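
To illustrate the idea outside of CTDB, here is a minimal, self-contained sketch of the pattern: detach every pending control from a node before the node is freed, so that a late timeout no longer touches the freed node's list. All names here (struct node, struct control_state, control_add, control_destroy, node_cancel_controls, demo_delete_node) are hypothetical stand-ins, not CTDB's real types or functions, and plain malloc/free is used in place of talloc. The real ctdb_daemon_cancel_controls may behave differently (for example, by completing each control with an error); this sketch only clears the back-pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct control_state;

/* Hypothetical, simplified model of a cluster node. */
struct node {
    struct control_state *pending;      /* head of the pending-controls list */
};

/* Hypothetical, simplified model of one in-flight control. */
struct control_state {
    struct node *node;                  /* back-pointer; NULL once detached */
    struct control_state *prev, *next;
};

/* Allocate a control and push it onto the node's pending list. */
static struct control_state *control_add(struct node *n)
{
    struct control_state *s = calloc(1, sizeof(*s));
    assert(s != NULL);
    s->node = n;
    s->next = n->pending;
    if (n->pending != NULL) {
        n->pending->prev = s;
    }
    n->pending = s;
    return s;
}

/* Plays the role of the destructor: unlink only if still attached. */
static void control_destroy(struct control_state *s)
{
    if (s->node != NULL) {
        if (s->prev != NULL) {
            s->prev->next = s->next;
        } else {
            s->node->pending = s->next;
        }
        if (s->next != NULL) {
            s->next->prev = s->prev;
        }
    }
    free(s);
}

/* Detach every pending control from the node; returns how many were detached. */
static int node_cancel_controls(struct node *n)
{
    int cancelled = 0;
    while (n->pending != NULL) {
        struct control_state *s = n->pending;
        n->pending = s->next;
        s->prev = NULL;
        s->next = NULL;
        s->node = NULL;                 /* destructor will now skip the unlink */
        cancelled++;
    }
    return cancelled;
}

/* Delete a node that still has two controls in flight. */
int demo_delete_node(void)
{
    struct node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    struct control_state *a = control_add(n);
    struct control_state *b = control_add(n);

    int cancelled = node_cancel_controls(n);    /* detach first */
    free(n);                                    /* then free the node */

    /* Late "timeouts": safe, because neither control refers to n anymore. */
    control_destroy(a);
    control_destroy(b);
    return cancelled;
}
```

Without the node_cancel_controls call, control_destroy would dereference the freed node through s->node, which is the same shape of failure as in the backtrace above.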
Comment 1 Martin Schwenke 2016-09-02 01:47:37 UTC
I think the key here is "delete a node on line".  I take that to mean that the node is being deleted when it is online.  The documentation has always said that CTDB should be shut down on a node that is about to be deleted.  This has been sanity checked by the ctdb tool since Samba 4.3.

Unless I'm misunderstanding this, should close as "invalid"?
Comment 2 Martin Schwenke 2016-09-12 09:11:05 UTC
Invalid.  Can't delete a node that is online/up.  CTDB needs to be shut down first.