We are seeing sporadic 'permission denied' errors in our environment with both Samba 4.3.11 and 4.4.7 (we haven't tested 4.5.x). Our debugging and tracing showed that sendmsg() returns EACCES, and when it does, the euid of the thread is not 0 (root).
<pre>
libc.so.1`__so_sendmsg+0xa
libsocket.so.1`sendmsg+0x1e
libmessages-dgm-samba4.so`unix_dgram_send_job+0x4e
libmessages-dgm-samba4.so`pthreadpool_server+0x17e
libc.so.1`_thrp_setup+0x8a
libc.so.1`_lwp_start
</pre>
Further dtracing of some kernel functions helped us find the root cause: the 0700 permission bits on the private/msg.sock directory. So we built a Samba version with the following change:
<pre>
diff --git a/source3/lib/messages.c b/source3/lib/messages.c
index ef8e83d..25b0197 100644
--- a/source3/lib/messages.c
+++ b/source3/lib/messages.c
@@ -330,7 +330,7 @@ struct messaging_context *messaging_init(TALLOC_CTX *mem_ctx,
        }
 
        ok = directory_create_or_exist_strict(priv_path, sec_initial_uid(),
-                                             0700);
+                                             0755);
        if (!ok) {
                DEBUG(10, ("%s: Could not create msg directory: %s\n",
                           __func__, strerror(errno)));
</pre>
With the above change, sendmsg() no longer returns EACCES. We are still stress testing to see whether our application gets any 'permission denied' errors. Will report back later.
The problem is that this is the wrong fix. It opens up the messaging socket to non-privileged processes, which is not allowable from a security point of view.

The Solaris/Illumos kernel should *NOT* be doing security checks on a connect()'ed unix domain socket at sendmsg() time, only at connect() time.

For Solaris/Illumos the correct fix is to set the send socket to always blocking. That way the pthread queuing code will never be activated and the sendmsg() will always be done as root. This will have a performance impact on this platform, but I don't see any other alternative, sorry.

I'll upload a patch for you to test.
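For illustration only, putting a socket into blocking mode on a POSIX system comes down to clearing O_NONBLOCK with fcntl(). The helper name set_blocking() below is hypothetical and is not taken from the Samba patch; it is a minimal sketch of the idea, not the actual change.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Hypothetical helper (not from the Samba patch): clear O_NONBLOCK on fd
 * so that a later sendmsg() waits for buffer space instead of failing
 * with EAGAIN. Returns 0 on success, -1 with errno set on failure.
 */
static int set_blocking(int fd)
{
	int flags;

	assert(fd >= 0);

	flags = fcntl(fd, F_GETFL);
	if (flags == -1) {
		return -1;
	}
	if ((flags & O_NONBLOCK) == 0) {
		return 0;	/* already blocking, nothing to do */
	}
	if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1) {
		return -1;
	}
	return 0;
}
```

With the socket left blocking, a full receive queue makes the caller wait rather than hand the send off to a helper thread, which is where the performance cost mentioned above would come from.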
Created attachment 12669 [details] Possible patch for 4.4.x.
Created attachment 12670 [details]
Test program showing the problem.

Solaris sendmsg() is broken for UNIX domain sockets.

Here is a test program that demonstrates that Solaris has a problem in dealing with permissions on UNIX domain sockets.

To reproduce, compile the attached program, then become root. In the directory containing the a.out binary do the following:
<pre>
# mkdir t
# chown root t
# chmod 700 t
# ./a.out t/s 5000
</pre>
The expected output (and indeed the output on Linux and FreeBSD) will be:
<pre>
non_priv_send - sendmsg fail (expected) Permission denied
CLIENT:TEST0
SERVER:TEST0
CLIENT:TEST1
SERVER:TEST1
CLIENT:TEST2
SERVER:TEST2
CLIENT:TEST3
SERVER:TEST3
CLIENT:TEST4
SERVER:TEST4
</pre>
On Solaris we get:
<pre>
non_priv_send - sendmsg fail (expected) Permission denied
CLIENT:TEST0
./sendtest - sendmsg fail Permission denied
</pre>
The root of the issue is that the program connects to the socket as root, and then expects to be able to change to a non-privileged user and use the connected socket file descriptor to call sendmsg().

On Linux and FreeBSD this works. On Solaris it fails.

This prevents a class of programs that want to start as privileged, connect to a unix domain socket to talk to a daemon, and then drop privileges for safety and still use the connected fd (or pass the fd to another process), i.e. privilege separation security.
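The same question (are permissions checked at connect() time or again at send time?) can be probed without root. The sketch below is not the attached test program: instead of changing uid, it removes all permission bits from the socket's directory after connect() and then sends on the already-connected fd. The function name send_after_chmod() and the socket name "s" are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Illustrative sketch (not the attached test program). Binds a datagram
 * socket at dir/s, connects a client to it, removes all permission bits
 * from dir, then send()s on the already-connected client.
 * Returns 0 if the send succeeded, otherwise the errno of the failing step.
 * dir must already exist and be owned by the caller; it is left at 0700.
 */
static int send_after_chmod(const char *dir)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int srv = -1;
	int cli = -1;
	int ret = 0;
	int n;

	n = snprintf(addr.sun_path, sizeof(addr.sun_path), "%s/s", dir);
	if (n < 0 || (size_t)n >= sizeof(addr.sun_path)) {
		return ENAMETOOLONG;
	}

	srv = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (srv == -1) {
		ret = errno;
		goto done;
	}
	cli = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (cli == -1) {
		ret = errno;
		goto done;
	}
	if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		ret = errno;
		goto done;
	}
	if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		ret = errno;
		goto done;
	}

	/* From here on the directory can no longer be searched by its owner. */
	if (chmod(dir, 0) == -1) {
		ret = errno;
		goto done;
	}

	/* The fd is already connected, so no path lookup should be needed. */
	if (send(cli, "TEST", 4, 0) == -1) {
		ret = errno;
	}

done:
	chmod(dir, 0700);	/* restore access so cleanup can proceed */
	unlink(addr.sun_path);
	if (cli != -1) {
		close(cli);
	}
	if (srv != -1) {
		close(srv);
	}
	return ret;
}
```

On a kernel that checks permissions only at connect() time, the call is expected to return 0. A return of EACCES would point to a permission check at send time, which is the behaviour reported here for Solaris.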
The patch proposed by Jeremy didn't work. Before the patch, sendmsg() returned error codes 2 (ENOENT), 13 (EACCES) and 11 (EAGAIN); with the patch it returns error codes 2 and 13.

By the way, an illumos developer is working on a fix for sendmsg(). Thanks to Jeremy for his efforts. Here is the illumos bug report:

https://illumos.org/issues/7590 - sendmsg on AF_UNIX socket fails after process drops privileges
Well that would be expected (missing EAGAIN) as the socket is now deliberately left as blocking. Did you still get missing messages with that patch ?
(In reply to Jeremy Allison from comment #5)
Yes, I saw 'permission denied' errors, which indicate dropped messages. sendmsg() also returned a large number of error codes 2 and 13, so it's really bad. Thanks.
Oh, I see the issue with 4.4.x and the test patch. For now run with relaxed permissions on your messaging directory. I'll take a look at what we can do here.
Hmmm. Looking closer - is this with 4.4.x ?

If so, that test patch should work. unix_dgram_send() is only ever called as root and the socket should be non-blocking. So the initial sendmsg() should not fail.

According to the sendmsg() man page on Solaris:

https://docs.oracle.com/cd/E19109-01/tsolaris7/805-8069/6j7j9vo2j/index.html

    If the socket does not have enough buffer space available to hold the message being sent, send() blocks, unless the socket has been placed in non-blocking I/O mode (see fcntl(2) ). The select(3C) or poll(2) call may be used to determine when it is possible to send more data.

The socket is in blocking mode, so in the code below from source3/lib/unix_msg/unix_msg.c the sendmsg() on line 632 should block:
<pre>
612         /*
613          * Try a cheap nonblocking send
614          */
615
616         msg = (struct msghdr) {
617                 .msg_name = discard_const_p(struct sockaddr_un, dst),
618                 .msg_namelen = sizeof(*dst),
619                 .msg_iov = discard_const_p(struct iovec, iov),
620                 .msg_iovlen = iovlen
621         };
622
623         fdlen = msghdr_prep_fds(&msg, NULL, 0, fds, num_fds);
624         if (fdlen == -1) {
625                 return EINVAL;
626         }
627
628         {
629                 uint8_t buf[fdlen];
630                 msghdr_prep_fds(&msg, buf, fdlen, fds, num_fds);
631
632                 ret = sendmsg(ctx->sock, &msg, 0);
633         }
634
635         if (ret >= 0) {
636                 return 0;
637         }
638         if ((errno != EWOULDBLOCK) &&
639             (errno != EAGAIN) &&
640 #ifdef ENOBUFS
641             /* FreeBSD can give this for large messages */
642             (errno != ENOBUFS) &&
643 #endif
644             (errno != EINTR)) {
645                 return errno;
646         }
</pre>
Can you use dtrace to find out what errno is at line 638 please ? Remember, this is only called as root so you should never see 13 (EACCES).

I need to know what errno is when the sendmsg() at line 632 fails. You can try adding a debug printf here if there's no other way.
From the previous comment: "is only ever called as root and the socket should be non-blocking." I meant 'blocking', not non-blocking here of course. Sorry.
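If dtrace is not an option, the debug printf suggested above could take roughly the following shape. This is only a sketch: sendmsg_logged() is a hypothetical wrapper, not something that exists in Samba, and it writes to stderr rather than using the DEBUG() macro so that it stays self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical debugging wrapper (not part of Samba): call sendmsg() and,
 * on failure, log errno together with the effective uid of the caller.
 * errno is preserved so existing error handling keeps working.
 */
static ssize_t sendmsg_logged(int sock, const struct msghdr *msg, int flags)
{
	ssize_t ret = sendmsg(sock, msg, flags);

	if (ret == -1) {
		int saved_errno = errno;

		fprintf(stderr, "sendmsg failed: errno=%d (%s) euid=%ld\n",
			saved_errno, strerror(saved_errno),
			(long)geteuid());
		errno = saved_errno;
	}
	return ret;
}
```

Logging the euid next to errno shows directly whether a failing call really ran as root, which is the assumption being questioned in this thread.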
Created attachment 12675 [details] git-am test fix for 4.4.x OK, here's another test patch that *forces* unix_dgram_send() to be synchronous w.r.t become_root()/unbecome_root(). Compiles but not tested. Just wanted to log it here. YOUZHONG you could throw this on a test machine just to see if it changes things.
Created attachment 12676 [details] git-am test fix for 4.4.x (Sigh). Don't use potentially freed memory after use. Threaded code is tricky.
(In reply to Jeremy Allison from comment #11)
Thanks Jeremy. I tested your latest patch. It works, but it requires a change to source3/wscript, because host_os is "sunos5" on both Oracle Solaris and OpenSolaris/illumos platforms.

By the way, I also tested the sendmsg() kernel fix on our SmartOS platform, so is it possible to have a 'configure' option for this patch so that users can choose whether to apply it or not?