5f6f719 is needed in 3.6.
Created attachment 6510 [details] Possible better patch for master (needs testing) Hi Volker, 5f6f71956460d6840c1433b59e20555268b622ac implements a different logic for this compared to the select and epoll backends of tevent. I think it would be much better if all backends have the same behavior. Does the attached patch also fixes the problem for you? If so I'd add a similar fix for the tevent poll backend. If not I'm fine with having 5f6f71956460d6840c1433b59e20555268b622ac as a solution for 3.6.x. But the longterm fix should look different to 5f6f71956460d6840c1433b59e20555268b622ac. metze
Ok, I'll add this as a customer specific patch then. Sorry for the noise.
Re-opening. We do need *some* fix for this in 3.6.0 - we can't just leave it as a master-only fix. Jeremy.
This isn't in master either. I reverted the patch. Volker
BTW, I agreed with metze that this has to wait until winbind is fully async, i.e. has no child processes anymore. Volker
Ok, after discussion with Volker, re-opening this as we need to address this in 3.6.0. I'll work with Metze to get a patch approved. Jeremy.
Ok, (trying to move this one along :-). I've read and evalutated Volker's patch and it seems fine to me (although why do you care that "smbd monitors EVENT_FD_WRITE in its callback, doing nothing" ? If the callback does nothing there's no harm in it calling that function, and the next time smbd tries to read from the socket it will return the error and close it. Anyway, on to evaluating metze's patch next :-). Jeremy.
FYI. This : http://www.greenend.org.uk/rjk/2001/06/poll.html is a good resource for looking at these kinds of things ! Jeremy.
If you always make a socket writable also on POLLERR|POLLHUP, then you end up with a spinning smbd if a socket dies. Poll detects writability, jumps into smbd_server_connection_handler which first checks for writability. It jumps into smbd_server_connection_write_handler which does nothing but return back to the event loop. We poll again, detect POLLERR|POLLHUP and so on. smbd will never even attempt to read from the socket because when we detect writability we will not even look at it. Volker
Doh - thanks ! Makes sense now. I think I have a really elegant+small fix for this bug though. Still working on it. Jeremy.
Ok, a second fix we need is to remove smbd_server_connection_write_handler() I think :-) We need to remove the code chunks from smbd/process.c that call: if (flags & EVENT_FD_WRITE) { smbd_server_connection_write_handler(conn); return; } as well as smbd_server_connection_write_handler(). But that's a fix for master only (and not part of this bug) I think :-). Jeremy.
Damn damn damn. My elegant fix doesn't (and can't) work :-(. Shame. Ok, looking closer. Jeremy.
Oh, wait a minute... I have a cunning plan :-). My elegant fix may work after all... Jeremy.
Created attachment 6515 [details] git-am fix for 3.6.0 Ok, here is my "cunning plan" :-) to fix this. Metze is correct in that we should only detect fd errors on selecting for read, as this is how the select backend works (and most callers). So inside writev_send() in lib/async_req/async_sock.c ensure we always *start* by selecting for read - even if err_on_readability was set to false. We can detect the error condition in the write callback function instead. I'm planning to add Metze's patch to master, as it's correct, but in order to fix the immediate winbindd issue I think this patch will do the job. Please review and comment. Feel free to call me if you have any questions as I really enjoyed working on this one :-). Jeremy.
Comment on attachment 6515 [details] git-am fix for 3.6.0 Adding metze as patch reviewer also.
Comment on attachment 6515 [details] git-am fix for 3.6.0 If this is tested I guess it's fine, but please also get an ack from Volker.
Right now I'm experimenting to solve it inside async_smb exclusively. All other options seem too much under discussion. Volker
Comment on attachment 6515 [details] git-am fix for 3.6.0 If we take this change, this should also go into master and we should pick dbcdf3e39c359241b7 and 3c9b3b2befc524f21c to 3.6.
I think we should also fix this for 3.5.x (and maybe 3.4.x) as we're also using poll there. metze
Created attachment 6528 [details] Volker's fix for 3.6.x. Moved patch to correct bug. Jeremy.
Ok, in testing against Windows with vl's patch and my patch installed I get NT_STATUS_IO_TIMEOUT when using smbclient to put a large file. Testing with my patch only succeeded. Than my virtual machine crashed hard and took out my desktop with it :-). I'm working this one hard today to get the correct patch reviewed. Jeremy.
Comment on attachment 6528 [details] Volker's fix for 3.6.x. Jeremy found it's broken. I'll submit a new one once I get around to it.
Ok - Karolin - the 3 patches are now in master. Please cherry-pick : dbcdf3e39c359241b743a9455ae695e14a30caa9 3c9b3b2befc524f21c59f46ea9be1602b4b1bfe8 0efcc94fb834aeb03e8edc3034aa0cdeefdc0985 from master to v3-6-test and this one is fixed. Jeremy.
(In reply to comment #23) > Ok - Karolin - the 3 patches are now in master. > > Please cherry-pick : > > dbcdf3e39c359241b743a9455ae695e14a30caa9 > 3c9b3b2befc524f21c59f46ea9be1602b4b1bfe8 > 0efcc94fb834aeb03e8edc3034aa0cdeefdc0985 > > from master to v3-6-test and this one is fixed. > > Jeremy. Done. Closing out bug report. Thanks!
Created attachment 6537 [details] Patchset New Patchset with fix. The first one is the same as the one previously sent, the second one is the fix for the problem Jeremy detected yesterday. For upstream I would squash the two. Volker
Created attachment 6538 [details] Patchset Gna, this time with --stdout.
Comment on attachment 6538 [details] Patchset Volker what's the status of the 2nd part of this attachment, does commit d41d2e93f4e13e7975bcd8d4b7dc125f81ef2559 in master also fixes the problem? What was the situation where cli_smb_req_unset_pending() got called twice?