From 9b20bf00b905f6bea088048bc7dfa167a0a5a750 Mon Sep 17 00:00:00 2001
From: Ralph Boehme
Date: Wed, 26 Feb 2020 10:23:42 +0100
Subject: [PATCH] WIP: ctdb/tcp: free the in_queue in ctdb_tcp_stop_connection()

This fixes a regression introduced by commit
d0baad257e511280ff3e5c7372c38c43df841070 as part of the fixes for bug
14175.

The scenario that triggers this seems to be:

- hard power off of a node A
- all other nodes in the cluster fail to free
  struct ctdb_tcp_node.in_queue
- restart node A and start ctdb
- node A connects to the other nodes, but they reject the incoming
  connection with

  Feb 21 13:47:13 somenode ctdbd[302424]: ctdb_listen_event: Incoming queue active, rejecting connection from SOMEIP

struct ctdb_tcp_node.in_queue is only ever freed in the fd readable
handler ctdb_tcp_read_cb(), but this never gets called because the TCP
stacks on the other nodes don't notice that the connection is dead.
ctdb sets SO_KEEPALIVE on the socket, but the default value of
tcp_keepalive_time is 2 hours.

Fix this by also freeing in_queue in ctdb_tcp_stop_connection().

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
---
 ctdb/tcp/tcp_connect.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ctdb/tcp/tcp_connect.c b/ctdb/tcp/tcp_connect.c
index 559442f14bf..79501296054 100644
--- a/ctdb/tcp/tcp_connect.c
+++ b/ctdb/tcp/tcp_connect.c
@@ -45,6 +45,7 @@ void ctdb_tcp_stop_connection(struct ctdb_node *node)
 	struct ctdb_tcp_node *tnode = talloc_get_type(
 		node->transport_data, struct ctdb_tcp_node);
 
+	TALLOC_FREE(tnode->in_queue);
 	TALLOC_FREE(tnode->out_queue);
 	TALLOC_FREE(tnode->connect_te);
 	TALLOC_FREE(tnode->connect_fde);
-- 
2.24.1