Created attachment 15400 [details] FreeBSD sys_sendfile() fix - patch Previous to the patch landed in bug 13537, sendfile would cause spinning on EAGAIN/EWOULDBLOCK resulting in excessive CPU usage and poor performance. The patch in bug 13537 attempted to address this, but it broke the sendfile code almost completely. On all platforms when EAGAIN or EWOULDBLOCK is encountered, the loop is continue; 'd skipping any accounting of data already sent, resulting in whatever data sent before encountering a blocking situation be re-sent, clients error out on this and the transfer fails. I will attach a patch, that I am using now on FreeBSD, that works. I can move forward and apply this pattern to the other platforms, but would be limited to testing on FreeBSD and Linux (which appears to be more testing that what was landed in bug 13537). It is important to note when testing 'use sendfile', you must disable aio as it takes precedence over the sendfile path. # aio read size = 0 aio write size = 0 use sendfile = yes #
Sorry about the breakage for bug 13537. Testing this is hard as you need to trigger EAGAIN/EWOULDBLOCK.
Is there is anything I can do to move this forward? I can produce a Linux patch also but do not have access to other systems for testing.
Comment on attachment 15400 [details] FreeBSD sys_sendfile() fix - patch > if (nwritten == 0) { >@@ -457,6 +449,8 @@ ssize_t sys_sendfile(int tofd, int fromfd, > break; > } > >+ twritten += nwritten; >+ > if (sf_header.hdr_cnt) { > if (io_header.iov_len <= nwritten) { > /* Entire header was sent. */ >@@ -474,27 +468,11 @@ ssize_t sys_sendfile(int tofd, int fromfd, > > offset += nwritten; > count -= nwritten; >+ ret = twritten; > } > >- ret = nwritten; >- I think the two logic changed should be separated: - the nwritten vs. twritten fix - the set_blocking change I think the twritten logic is not completely correct. I guess it should be: offset += nwritten; count -= nwritten; + twritten += nwritten; } - ret = nwritten; + ret = twritten; With that submission in chunks should be fixed, and maybe it's possible to drop the set_blocking change?
Is sendfile() the only zero copy method available on FreeBSD? If possible I'd like to get rid of sendfile support, its api doesn't really fit into the async non-blocking infrastructure we have. For Linux we'll move to io_uring with IORING_OP_SPLICE in future, which makes it possible to decouple the layer violation of the sendfile path and makes it possible to process the next incoming pdus while large read responses are transmitted to the client.
(In reply to Stefan Metzmacher from comment #3) I would be able to fix the patch in bug 13537 if that works better. It would be the addition of twritten (total) and a fix in the logic around EAGAIN/EWOULDBLOCK accounting. From my perspective I went with always setting the socket to blocking for the sys_sendfile() duration as it made for net less lines of code and complexity. --- On FreeBSD at least I am hesitant to want to remove sendfile() , when sendfile() is working I am getting much more efficient CPU impact. I don't really like the layer acrobatics getting it to work, but I don't know the history or design around it. Originally after spotting that sendfile has been broken (it either failed or spun on tiny nonblocking sendfile()'s) since the socket change to non-blocking sockets I had made a failed attempt at reworking sendfile across the layers so it could handle partial sends.