On Linux, aio is implemented via threads within glibc that signal completion by sending a POSIX realtime signal to the main thread. If an aio thread takes too long to complete, and smbd switches from a non-zero uid (which initiated the IO) to root (in order to do housekeeping tasks), then the aio thread can fail to send the completion signal to the main thread with a permission denied error. smbd uses setresuid() to change from a non-zero uid to root, or from root to a non-zero uid, and there is a race condition inside the glibc setresuid() implementation that can leave the main thread as uid root whilst the aio thread is still a non-zero uid. (Threads under Linux can have different uids, but glibc must present a POSIX thread interface, which specifies that all threads in a process share the same uid/gid credentials.) If the aio completes during this race window and the aio thread tries to signal the main thread, the signal is lost and the sending thread gets an EPERM error. smbd has then "lost" the aio reply to the scheduled read or write, and can never reply to the SMB request.
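For anyone unfamiliar with the mechanism described above, here is a minimal sketch of signal-based aio completion. It is not smbd code: the helper name aio_read_via_signal() and the 5 second timeout are made up for illustration, and it does not change uids, so it only shows the path the completion signal normally takes. In the failure case described above, the notification marked in the comments is the one that never arrives, and the sketch would hit the timeout branch. On older glibc versions this needs linking with -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not Samba code): read up to len
 * bytes from the start of path using POSIX aio, with completion reported
 * by a realtime signal. Returns the number of bytes read, or -1 on error
 * or if no completion signal arrives within the timeout.
 */
ssize_t aio_read_via_signal(const char *path, void *buf, size_t len)
{
    sigset_t set, old;
    struct aiocb cb;
    siginfo_t info;
    ssize_t ret = -1;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }

    /* Block SIGRTMIN so the completion signal stays queued until we
     * collect it synchronously below. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0) {
        close(fd);
        return -1;
    }

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
    cb.aio_sigevent.sigev_signo = SIGRTMIN;
    cb.aio_sigevent.sigev_value.sival_ptr = &cb;

    if (aio_read(&cb) == 0) {
        struct timespec timeout = { 5, 0 }; /* arbitrary illustration value */
        int sig;

        /* This is the notification that goes missing when the sending
         * thread is refused with EPERM. */
        do {
            sig = sigtimedwait(&set, &info, &timeout);
        } while (sig == -1 && errno == EINTR);

        if (sig == SIGRTMIN && info.si_value.sival_ptr == &cb &&
            aio_error(&cb) == 0) {
            ret = aio_return(&cb);
        } else {
            /* No usable notification: make sure the request has finished
             * before cb and buf go out of scope. */
            const struct aiocb *list[1] = { &cb };
            aio_cancel(fd, &cb);
            while (aio_error(&cb) == EINPROGRESS) {
                aio_suspend(list, 1, NULL);
            }
            (void)aio_return(&cb);
        }
    }

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    close(fd);
    return ret;
}
```

The sketch waits with a timeout rather than forever, which is the difference from the situation described above, where the main thread has no way to notice that the completion signal was dropped.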
See tridge's junkcode program aio_uid.c, which demonstrates the problem: http://samba.org/ftp/unpacked/junkcode/aio_uid.c Jeremy.
Created attachment 5242: git-am format patch for 3.5.0. This has gone into master. Metze & Volker please review for 3.5.0. Jeremy.
Created attachment 5243: git-am format patch for 3.4.6.
Created attachment 5244: git-am format patch for 3.3.11.
Comment on attachment 5242: git-am format patch for 3.5.0. It won't do any harm (I think) if it does not work right; I don't think we kill other processes out of the blue with this. Another question around this patch: can we get rid of the become_root() calls in message_notify() with this patch in? Volker
Only for Linux I think (get rid of the become_root() calls). So for portability I'd leave them. Also that's *way* more of a change than I'm prepared to make right now :-). Jeremy.
Re-assigning to Karolin for inclusion in 3.5.0, 3.4.6 and 3.3.11. Jeremy.
Pushed to all branches. Closing out bug report. Thanks!
Just a postmortem comment. This bug was actually already fixed by two commits in master, d9f61dbdc91fae6560361f98d981b1f7bea80422 and 563a7ccdd9d23ffbd3195c8def82cd4d8d4cb0dc, which changed us to use setreuid() in preference to setresuid() and added become_root()/unbecome_root() pairs around the aio_read()/aio_write() calls. So this fix was unnecessary and simply caused a security hole. Damn :-(. Jeremy.