The fix for bug 13888 (commit 38dc6d11a26c2e9a2cae7927321f2216ceb1c5ec) exposes an issue where a node can send requests but the receiving node cannot send the replies, so it drops them. The requests can be sent (from node A -> B) because one-way connectivity has been established. However, connectivity from B -> A has not yet been established, so B cannot queue the replies and drops them. Dropping of packets destined for disconnected nodes was introduced in the fix for bug 13056 (commit ddd97553f0a8bfaada178ec4a7460d76fa21f079). This caused no obvious issues until the more recent fix for bug 13888 sped up node connection times. The solution is to mark nodes as connected only when bidirectional connectivity is established. This ensures that whenever a request can be sent, the corresponding reply can also be sent.
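To illustrate the race described above, here is a minimal toy model in Python. It is only a sketch of the reasoning and is not the CTDB code: the `Peer` class, `connect_to()`, `is_connected()` and `round_trip()` are hypothetical names made up for this example, and the `bidirectional` flag simply switches between the old rule (outgoing link is enough) and the proposed rule (both directions required).

```python
class Peer:
    """Hypothetical node model; not CTDB's real data structure."""

    def __init__(self, name):
        self.name = name
        self.outgoing = set()  # peers this node has connected to
        self.incoming = set()  # peers that have connected to this node

    def connect_to(self, other):
        # Establish one-way connectivity: self -> other.
        self.outgoing.add(other.name)
        other.incoming.add(self.name)

    def is_connected(self, other, bidirectional=True):
        if bidirectional:
            # Proposed rule: both directions must be established.
            return other.name in self.outgoing and other.name in self.incoming
        # Old rule: an established outgoing link is enough.
        return other.name in self.outgoing


def round_trip(a, b, bidirectional):
    """Simulate a request from a to b followed by b's reply."""
    if not a.is_connected(b, bidirectional):
        return "not sent"       # a does not consider b connected yet
    if not b.is_connected(a, bidirectional):
        return "reply dropped"  # b drops packets for a disconnected node
    return "ok"
```

With only A -> B established, the old rule lets the request through and then loses the reply, whereas the bidirectional rule holds the request back until B -> A is also up, at which point both the request and the reply can be queued.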
Created attachment 15411: Patch for 4.11
Created attachment 15412: Patch for 4.10
Created attachment 15413: Patch for 4.9
The patches from master cherry-pick cleanly into v4-11-test. For v4-9-test and v4-10-test they cherry-pick cleanly after applying one of the csbuild fixes that touch the relevant code, so I included that additional commit to make the process cleaner and less error-prone. The patches for v4-9-test and v4-10-test are identical but are attached separately for clarity.
Hi Karolin, This is ready for v4-9, v4-10 and v4-11. Thanks.
(In reply to Amitay Isaacs from comment #5) Hi Amitay, pushed to autobuild-v4-{11,10,9}-test.
(In reply to Karolin Seeger from comment #6) Pushed to all branches. Closing out bug report. Thanks!