From 9b20bf00b905f6bea088048bc7dfa167a0a5a750 Mon Sep 17 00:00:00 2001
From: Ralph Boehme <slow@samba.org>
Date: Wed, 26 Feb 2020 10:23:42 +0100
Subject: [PATCH] WIP: ctdb/tcp: free the in_queue in
 ctdb_tcp_stop_connection()

This fixes a regression introduced by commit
d0baad257e511280ff3e5c7372c38c43df841070 as part of the fixes for bug 14175.

The scenario that triggers this seems to be:

- hard power off of a node A

- all other nodes in the cluster fail to free
  struct ctdb_tcp_node.in_queue for node A

- restart node A and start ctdb

- node A connects to the other nodes, but the other nodes
  reject the incoming connection with

  Feb 21 13:47:13 somenode ctdbd[302424]: ctdb_listen_event:
  Incoming queue active, rejecting connection from SOMEIP

struct ctdb_tcp_node.in_queue is only ever freed in the fd readable handler
ctdb_tcp_read_cb(), but that handler never gets called because the TCP stacks
on the other nodes don't notice that the connection is dead. ctdb sets
SO_KEEPALIVE on the socket, but the default tcp_keepalive_time is 2 hours.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
---
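
Note (not part of the commit message): the scenario above can be sketched
with a small stand-alone model. This is only an illustration under stated
assumptions, not ctdb code. All model_* names are hypothetical, the queue is
a plain malloc'ed struct instead of a talloc'ed ctdb queue, and it assumes
that ctdb_tcp_stop_connection() is reached on the other nodes once node A is
considered dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ctdb queue; only the fd matters here. */
struct model_queue {
	int fd;
};

/* Hypothetical stand-in for struct ctdb_tcp_node. */
struct model_tcp_node {
	struct model_queue *in_queue;	/* connection accepted from the peer */
	struct model_queue *out_queue;	/* connection opened to the peer */
};

/* Free a queue and reset the pointer, similar in spirit to TALLOC_FREE(). */
static void model_queue_free(struct model_queue **q)
{
	free(*q);
	*q = NULL;
}

/* Models the listen-side check: refuse while an incoming queue exists. */
static bool model_accept(struct model_tcp_node *tnode, int fd)
{
	if (tnode->in_queue != NULL) {
		/* "Incoming queue active, rejecting connection" */
		return false;
	}
	tnode->in_queue = calloc(1, sizeof(*tnode->in_queue));
	if (tnode->in_queue == NULL) {
		return false;
	}
	tnode->in_queue->fd = fd;
	return true;
}

/* Models stop_connection without (false) and with (true) the fix. */
static void model_stop_connection(struct model_tcp_node *tnode,
				  bool free_in_queue)
{
	if (free_in_queue) {
		model_queue_free(&tnode->in_queue);
	}
	model_queue_free(&tnode->out_queue);
}

/* Returns whether node A can reconnect after a hard power off. */
static bool model_reconnect_after_power_off(bool free_in_queue)
{
	struct model_tcp_node tnode = { NULL, NULL };
	bool ok;

	ok = model_accept(&tnode, 10);	/* initial connection from node A */
	assert(ok);

	/*
	 * Node A loses power. No FIN or RST arrives, so the readable
	 * handler never runs; only stop_connection gets a chance to
	 * clean up.
	 */
	model_stop_connection(&tnode, free_in_queue);

	ok = model_accept(&tnode, 11);	/* node A is back and reconnects */
	model_queue_free(&tnode.in_queue);
	return ok;
}
```

In this model, model_reconnect_after_power_off(false) reports a rejected
reconnect (the stale in_queue is still set), while
model_reconnect_after_power_off(true) reports an accepted one.
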
 ctdb/tcp/tcp_connect.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ctdb/tcp/tcp_connect.c b/ctdb/tcp/tcp_connect.c
index 559442f14bf..79501296054 100644
--- a/ctdb/tcp/tcp_connect.c
+++ b/ctdb/tcp/tcp_connect.c
@@ -45,6 +45,7 @@ void ctdb_tcp_stop_connection(struct ctdb_node *node)
 	struct ctdb_tcp_node *tnode = talloc_get_type(
 		node->transport_data, struct ctdb_tcp_node);
 
+	TALLOC_FREE(tnode->in_queue);
 	TALLOC_FREE(tnode->out_queue);
 	TALLOC_FREE(tnode->connect_te);
 	TALLOC_FREE(tnode->connect_fde);
-- 
2.24.1