Bug 14084 - CTDB replies can be lost before nodes are bidirectionally connected
Summary: CTDB replies can be lost before nodes are bidirectionally connected
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: CTDB
Version: 4.9.11
Hardware: All All
Importance: P5 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks: 14087
Reported: 2019-08-13 05:16 UTC by Martin Schwenke
Modified: 2019-11-25 19:49 UTC

See Also:


Attachments
Patch for 4.11 (24.08 KB, patch)
2019-08-22 07:00 UTC, Martin Schwenke
amitay: review+
Patch for 4.10 (28.37 KB, patch)
2019-08-22 07:00 UTC, Martin Schwenke
amitay: review+
Patch for 4.9 (28.37 KB, patch)
2019-08-22 07:01 UTC, Martin Schwenke
amitay: review+

Description Martin Schwenke 2019-08-13 05:16:17 UTC
The fix for bug 13888 (commit 38dc6d11a26c2e9a2cae7927321f2216ceb1c5ec) exposes an issue where a node can receive requests but cannot send the replies, so it drops them.  Requests can be sent from node A -> B because one-way connectivity has been established.  However, node B drops the replies because connectivity from B -> A has not yet been established, so B cannot queue them.

Dropping of packets destined for disconnected nodes was introduced in the fix for bug 13056 (commit ddd97553f0a8bfaada178ec4a7460d76fa21f079).  This caused no obvious issues until the above, more recent change sped up node connection times.

The solution is to mark nodes as connected only when bidirectional connectivity is established.  This means that whenever a request can be sent, the corresponding reply can also be sent.
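
The difference between the two behaviours can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the CTDB code: the PeerLink class, its fields, and the simulate() helper are hypothetical names, and the old behaviour is approximated as "a node is connected as soon as the outgoing connection is up".

```python
# Illustrative model only. PeerLink and simulate() are hypothetical names,
# not CTDB data structures or functions.

class PeerLink:
    """One node's view of its link to a single peer."""

    def __init__(self, require_bidirectional):
        self.require_bidirectional = require_bidirectional
        self.outgoing_up = False  # this node has connected to the peer
        self.incoming_up = False  # the peer has connected to this node
        self.queue = []           # packets queued for the peer

    def connected(self):
        if self.require_bidirectional:
            # Proposed rule: both directions must be established.
            return self.outgoing_up and self.incoming_up
        # Assumed old rule: the outgoing direction alone is enough.
        return self.outgoing_up

    def send(self, packet):
        # Packets destined for a disconnected node are dropped.
        if not self.connected():
            return False
        self.queue.append(packet)
        return True


def simulate(require_bidirectional, b_connected_back):
    """Return (request_sent, reply_sent) for a request A -> B and its reply."""
    a_view_of_b = PeerLink(require_bidirectional)
    b_view_of_a = PeerLink(require_bidirectional)

    # A -> B is always established in this scenario.
    a_view_of_b.outgoing_up = True
    b_view_of_a.incoming_up = True

    if b_connected_back:
        # B -> A is established as well.
        b_view_of_a.outgoing_up = True
        a_view_of_b.incoming_up = True

    request_sent = a_view_of_b.send("request")
    reply_sent = request_sent and b_view_of_a.send("reply")
    return request_sent, reply_sent


if __name__ == "__main__":
    print("one-way rule, A -> B only:      ", simulate(False, False))
    print("bidirectional rule, A -> B only:", simulate(True, False))
    print("bidirectional rule, both ways:  ", simulate(True, True))
```

In this model the one-way rule lets the request through while the reply is dropped, which is the lost-reply case. With the bidirectional rule, A does not treat B as connected until B -> A is also up, so no request is sent whose reply could not be queued.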
Comment 1 Martin Schwenke 2019-08-22 07:00:01 UTC
Created attachment 15411
Patch for 4.11
Comment 2 Martin Schwenke 2019-08-22 07:00:58 UTC
Created attachment 15412
Patch for 4.10
Comment 3 Martin Schwenke 2019-08-22 07:01:29 UTC
Created attachment 15413
Patch for 4.9
Comment 4 Martin Schwenke 2019-08-22 07:04:31 UTC
Patches from master cherry-pick cleanly into v4-11-test.

For v4-9-test and v4-10-test the patches cherry-pick cleanly once one of the csbuild fixes that touches the relevant code has been applied.  So I included that additional commit to make the process cleaner and less error-prone.

Patches for v4-9-test and v4-10-test are identical but are attached separately for clarity.
Comment 5 Amitay Isaacs 2019-08-26 03:32:01 UTC
Hi Karolin,

This is ready for v4-9, v4-10 and v4-11.

Thanks.
Comment 6 Karolin Seeger 2019-08-27 10:29:17 UTC
(In reply to Amitay Isaacs from comment #5)
Hi Amitay,

pushed to autobuild-v4-{11,10,9}-test.
Comment 7 Karolin Seeger 2019-09-03 11:47:49 UTC
(In reply to Karolin Seeger from comment #6)
Pushed to all branches.
Closing out bug report.

Thanks!