CTDB implements asynchronous input/output queues using the tevent framework. Initially, tevent only reports EPOLLIN events, which are raised when an incoming request has been received. When CTDB completes processing of the request, it immediately tries to send the answer through the socket. However, if a lot of output data is still pending on the socket and the kernel buffers are full, the send may fail with EAGAIN/EWOULDBLOCK. In this case, the output side of the CTDB queue is enabled and tevent is instructed to also report EPOLLOUT events. The problem appears when both EPOLLIN and EPOLLOUT events are active at the same time. In this case, the CTDB queue notification function only checks EPOLLIN and ignores EPOLLOUT, so data is not actually sent until EPOLLIN is no longer present. In the worst case, the input side of the queue could keep receiving requests frequently enough to always have EPOLLIN active. The pending answers in the output queue may then be delayed indefinitely.
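The starvation described above can be illustrated with a small, self-contained C sketch. It is not the CTDB code and not the patch itself: the flags `Q_READ` and `Q_WRITE`, the `sketch_queue` structure, and all function names are hypothetical stand-ins for the read/write readiness flags and the queue handler. It only assumes that the event loop reports "readable" on every round and "writable" whenever replies are pending.

```c
#include <stddef.h>

/* Hypothetical readiness flags, standing in for what the event loop reports. */
enum { Q_READ = 1 << 0, Q_WRITE = 1 << 1 };

/* Hypothetical queue state: only counters, no real socket. */
struct sketch_queue {
    size_t pending_out;  /* replies still waiting to be sent */
    size_t reads_done;   /* requests consumed so far */
    size_t writes_done;  /* replies flushed so far */
};

static void sketch_read(struct sketch_queue *q)
{
    q->reads_done++;
}

static void sketch_write(struct sketch_queue *q)
{
    if (q->pending_out > 0) {
        q->pending_out--;
        q->writes_done++;
    }
}

/* Problematic pattern: the write side runs only when the read flag is absent. */
static void handler_read_first_only(struct sketch_queue *q, unsigned flags)
{
    if (flags & Q_READ) {
        sketch_read(q);
    } else if (flags & Q_WRITE) {
        sketch_write(q);
    }
}

/* One possible remedy (an assumption, not necessarily the merged fix):
 * service both directions whenever both flags are set. */
static void handler_both(struct sketch_queue *q, unsigned flags)
{
    if (flags & Q_WRITE) {
        sketch_write(q);
    }
    if (flags & Q_READ) {
        sketch_read(q);
    }
}

/* Simulate `rounds` wakeups in which input is always ready and output is
 * ready whenever replies are pending. Returns the replies left unsent. */
static size_t simulate(void (*handler)(struct sketch_queue *, unsigned),
                       size_t pending, size_t rounds)
{
    struct sketch_queue q = { .pending_out = pending };

    for (size_t i = 0; i < rounds; i++) {
        unsigned flags = Q_READ;
        if (q.pending_out > 0) {
            flags |= Q_WRITE;
        }
        handler(&q, flags);
    }
    return q.pending_out;
}
```

With 3 pending replies and 10 rounds, `simulate(handler_read_first_only, 3, 10)` leaves all 3 replies unsent, while `simulate(handler_both, 3, 10)` drains the queue to 0.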
A patch for this issue: https://gitlab.com/samba-team/samba/-/merge_requests/3688
Backport for 4.19: https://gitlab.com/samba-team/samba/-/merge_requests/3713
Backport for 4.20: https://gitlab.com/samba-team/samba/-/merge_requests/3712
Created attachment 18367: patch from master for v4.19
Created attachment 18368: patch from master for v4.20
Pushed to autobuild-v4-{20,19}-test.
This bug was referenced in samba v4-19-test: 6107f663046a7a762d1c35beeaae0c1b46582f2e
This bug was referenced in samba v4-20-test: 63b47dc0edcd0a1ffe53dd083249d3e9029f4e62
Closing out bug report. Thanks!
This bug was referenced in samba v4-20-stable (Release samba-4.20.3): 63b47dc0edcd0a1ffe53dd083249d3e9029f4e62
This bug was referenced in samba v4-19-stable (Release samba-4.19.8): 6107f663046a7a762d1c35beeaae0c1b46582f2e