The Samba-Bugzilla – Attachment 15620 Details for
Bug 14175
Incoming queue can be orphaned causing communication breakdown
Home
|
New
|
Browse
|
Search
|
[?]
|
Reports
|
Requests
|
Help
|
New Account
|
Log In
[x]
|
Forgot Password
Login:
[x]
[patch]
Patch for 4.11, 4.10 and possibly 4.9
BZ14175.patch (text/plain), 8.07 KB, created by
Martin Schwenke
on 2019-11-14 22:22:46 UTC
(
hide
)
Description:
Patch for 4.11, 4.10 and possibly 4.9
Filename:
MIME Type:
Creator:
Martin Schwenke
Created:
2019-11-14 22:22:46 UTC
Size:
8.07 KB
patch
obsolete
>From 23684a640a4ce1cb71ad0a129375f9563c736aca Mon Sep 17 00:00:00 2001 >From: Martin Schwenke <martin@meltin.net> >Date: Tue, 29 Oct 2019 15:25:26 +1100 >Subject: [PATCH 1/4] ctdb-tcp: Check incoming queue to see if incoming > connection is up > >This makes it consistent with the reverse case. Also, in_fd will soon >be removed. > >BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175 > >Signed-off-by: Martin Schwenke <martin@meltin.net> >Reviewed-by: Amitay Isaacs <amitay@gmail.com> >(cherry picked from commit e62b3a05a874db13a848573d2e2fb1c157393b9c) >--- > ctdb/tcp/tcp_connect.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/ctdb/tcp/tcp_connect.c b/ctdb/tcp/tcp_connect.c >index 6123380ca9f..5bcb5216bd2 100644 >--- a/ctdb/tcp/tcp_connect.c >+++ b/ctdb/tcp/tcp_connect.c >@@ -153,7 +153,7 @@ static void ctdb_node_connect_write(struct tevent_context *ev, > * as connected, but only if the corresponding listening > * socket is also connected > */ >- if (tnode->in_fd != -1) { >+ if (tnode->in_queue != NULL) { > node->ctdb->upcalls->node_connected(node); > } > } >-- >2.24.0 > > >From b358907c7ba5a4840f4ea3ae9af5db93fd6a65ad Mon Sep 17 00:00:00 2001 >From: Martin Schwenke <martin@meltin.net> >Date: Tue, 29 Oct 2019 15:29:11 +1100 >Subject: [PATCH 2/4] ctdb-tcp: Avoid orphaning the TCP incoming queue > >CTDB's incoming queue handling does not check whether an existing >queue exists, so can overwrite the pointer to the queue. This used to >be harmless until commit c68b6f96f26664459187ab2fbd56767fb31767e0 >changed the read callback to use a parent structure as the callback >data. Instead of cleaning up an orphaned queue on disconnect, as >before, this will now free the new queue. > >At first glance it doesn't seem possible that 2 incoming connections >from the same node could be processed before the intervening >disconnect. However, the incoming connections and disconnect occur on >different file descriptors. The queue can become orphaned on node A >when the following sequence occurs: > >1. Node A comes up >2. Node A accepts an incoming connection from node B >3. Node B processes a timeout before noticing that outgoing the queue is writable >4. Node B tears down the outgoing connection to node A >5. Node B initiates a new connection to node A >6. Node A accepts an incoming connection from node B > >Node A processes then the disconnect of the old incoming connection >from (2) but tears down the new incoming connection from (6). This >then occurs until the originally affected node is restarted. > >However, due to the number of outgoing connection attempts and >associated teardowns, this induces the same behaviour on the >corresponding incoming queue on all nodes that node A attempts to >connect to. Therefore, other nodes become affected and need to be >restarted too. > >As a result, the whole cluster probably needs to be restarted to >recover from this situation. > >The problem can occur any time CTDB is started on a node. > >The fix is to avoid accepting new incoming connections when a queue >for incoming connections is already present. The connecting node will >simply retry establishing its outgoing connection. > >BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175 > >Signed-off-by: Martin Schwenke <martin@meltin.net> >Reviewed-by: Amitay Isaacs <amitay@gmail.com> >(cherry picked from commit d0baad257e511280ff3e5c7372c38c43df841070) >--- > ctdb/tcp/tcp_connect.c | 7 +++++++ > 1 file changed, 7 insertions(+) > >diff --git a/ctdb/tcp/tcp_connect.c b/ctdb/tcp/tcp_connect.c >index 5bcb5216bd2..df115b783e0 100644 >--- a/ctdb/tcp/tcp_connect.c >+++ b/ctdb/tcp/tcp_connect.c >@@ -312,6 +312,13 @@ static void ctdb_listen_event(struct tevent_context *ev, struct tevent_fd *fde, > return; > } > >+ if (tnode->in_queue != NULL) { >+ DBG_ERR("Incoming queue active, rejecting connection from %s\n", >+ ctdb_addr_to_str(&addr)); >+ close(fd); >+ return; >+ } >+ > ret = set_blocking(fd, false); > if (ret != 0) { > DBG_ERR("Failed to set socket non-blocking (%s)\n", >-- >2.24.0 > > >From 2a12ab334e2b6f08dc9a294872267bb526823b09 Mon Sep 17 00:00:00 2001 >From: Martin Schwenke <martin@meltin.net> >Date: Tue, 29 Oct 2019 17:28:22 +1100 >Subject: [PATCH 3/4] ctdb-tcp: Drop tracking of file descriptor for incoming > connections > >This file descriptor is owned by the incoming queue. It will be >closed when the queue is torn down. > >BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175 > >Signed-off-by: Martin Schwenke <martin@meltin.net> >Reviewed-by: Amitay Isaacs <amitay@gmail.com> >(cherry picked from commit bf47bc18bb8a94231870ef821c0352b7a15c2e28) >--- > ctdb/tcp/ctdb_tcp.h | 1 - > ctdb/tcp/tcp_connect.c | 2 -- > ctdb/tcp/tcp_init.c | 6 ------ > ctdb/tcp/tcp_io.c | 2 -- > 4 files changed, 11 deletions(-) > >diff --git a/ctdb/tcp/ctdb_tcp.h b/ctdb/tcp/ctdb_tcp.h >index 9a615fc6393..daabad74297 100644 >--- a/ctdb/tcp/ctdb_tcp.h >+++ b/ctdb/tcp/ctdb_tcp.h >@@ -37,7 +37,6 @@ struct ctdb_tcp_node { > struct tevent_timer *connect_te; > > struct ctdb_context *ctdb; >- int in_fd; > struct ctdb_queue *in_queue; > }; > >diff --git a/ctdb/tcp/tcp_connect.c b/ctdb/tcp/tcp_connect.c >index df115b783e0..a75f35a809e 100644 >--- a/ctdb/tcp/tcp_connect.c >+++ b/ctdb/tcp/tcp_connect.c >@@ -355,8 +355,6 @@ static void ctdb_listen_event(struct tevent_context *ev, struct tevent_fd *fde, > return; > } > >- tnode->in_fd = fd; >- > /* > * Mark the connecting node as connected, but only if the > * corresponding outbound connected is also up >diff --git a/ctdb/tcp/tcp_init.c b/ctdb/tcp/tcp_init.c >index a9cb9b36a01..3be16bd7d5e 100644 >--- a/ctdb/tcp/tcp_init.c >+++ b/ctdb/tcp/tcp_init.c >@@ -43,11 +43,6 @@ static int tnode_destructor(struct ctdb_tcp_node *tnode) > tnode->out_fd = -1; > } > >- if (tnode->in_fd != -1) { >- close(tnode->in_fd); >- tnode->in_fd = -1; >- } >- > return 0; > } > >@@ -61,7 +56,6 @@ static int ctdb_tcp_add_node(struct ctdb_node *node) > CTDB_NO_MEMORY(node->ctdb, tnode); > > tnode->out_fd = -1; >- tnode->in_fd = -1; > tnode->ctdb = node->ctdb; > > node->private_data = tnode; >diff --git a/ctdb/tcp/tcp_io.c b/ctdb/tcp/tcp_io.c >index e8ebff887e1..2d8ec0f7062 100644 >--- a/ctdb/tcp/tcp_io.c >+++ b/ctdb/tcp/tcp_io.c >@@ -76,8 +76,6 @@ void ctdb_tcp_read_cb(uint8_t *data, size_t cnt, void *args) > > failed: > TALLOC_FREE(tnode->in_queue); >- close(tnode->in_fd); >- tnode->in_fd = -1; > node->ctdb->upcalls->node_dead(node); > > TALLOC_FREE(data); >-- >2.24.0 > > >From b8456f8dc8afff9f51b76b79df649049862998f5 Mon Sep 17 00:00:00 2001 >From: Volker Lendecke <vl@samba.org> >Date: Thu, 7 Nov 2019 15:26:01 +0100 >Subject: [PATCH 4/4] ctdb-tcp: Close inflight connecting TCP sockets after > fork > >Commit c68b6f96f26 changed the talloc hierarchy such that outgoing TCP sockets >while sitting in the async connect() syscall are not freed via >ctdb_tcp_shutdown() anymore, they are hanging off a longer-running structure. >Free this structure as well. > >If an outgoing TCP socket leaks into a long-running child process (possibly the >recovery daemon), this connection will never be closed as seen by the >destination node. Because with recent changes incoming connections will not be >accepted as long as any incoming connection is alive, with that socket leak >into the recovery daemon we will never again be able to successfully connect to >the node that is affected by this leak. Further attempts to connect will be >discarded by the destination as long as the recovery daemon keeps this socket >alive. > >BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175 >RN: Avoid communication breakdown on node reconnect > >Signed-off-by: Martin Schwenke <martin@meltin.net> >Signed-off-by: Volker Lendecke <vl@samba.org> >Reviewed-by: Amitay Isaacs <amitay@gmail.com> >(cherry picked from commit a6d99d9e5c5bc58e6d56be7a6c1dbc7c8d1a882f) >--- > ctdb/tcp/tcp_init.c | 6 ++++++ > 1 file changed, 6 insertions(+) > >diff --git a/ctdb/tcp/tcp_init.c b/ctdb/tcp/tcp_init.c >index 3be16bd7d5e..0eb9799ac4a 100644 >--- a/ctdb/tcp/tcp_init.c >+++ b/ctdb/tcp/tcp_init.c >@@ -137,8 +137,14 @@ static void ctdb_tcp_shutdown(struct ctdb_context *ctdb) > { > struct ctdb_tcp *ctcp = talloc_get_type(ctdb->private_data, > struct ctdb_tcp); >+ uint32_t i; >+ > talloc_free(ctcp); > ctdb->private_data = NULL; >+ >+ for (i=0; i<ctdb->num_nodes; i++) { >+ TALLOC_FREE(ctdb->nodes[i]->private_data); >+ } > } > > /* >-- >2.24.0 >
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Raw
Flags:
amitay
:
review+
Actions:
View
Attachments on
bug 14175
:
15599
| 15620