Bug 12429 - Permission 0700 on private/msg.sock folder causes messaging not working properly on Solaris system
Status: ASSIGNED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Other
Version: 4.4.7
Hardware: x64 Solaris
Importance: P5 normal
Target Milestone: ---
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
Reported: 2016-11-16 16:18 UTC by YOUZHONG YANG
Modified: 2016-11-21 19:31 UTC

Attachments
Possible patch for 4.4.x. (2.25 KB, patch)
2016-11-16 17:24 UTC, Jeremy Allison
Test program showing the problem. (4.58 KB, text/x-csrc)
2016-11-16 19:28 UTC, Jeremy Allison
git-am test fix for 4.4.x (1.96 KB, patch)
2016-11-18 20:19 UTC, Jeremy Allison
git-am test fix for 4.4.x (1.98 KB, patch)
2016-11-18 20:23 UTC, Jeremy Allison

Description YOUZHONG YANG 2016-11-16 16:18:26 UTC
We experienced sporadic "permission denied" errors in our environment, with both Samba 4.3.11 and 4.4.7 (we haven't tested 4.5.x).

Our debugging and tracing showed that sendmsg() returns EACCES, and that whenever it does, the euid of the calling thread is not 0 (root).
<pre>
              libc.so.1`__so_sendmsg+0xa
              libsocket.so.1`sendmsg+0x1e
              libmessages-dgm-samba4.so`unix_dgram_send_job+0x4e
              libmessages-dgm-samba4.so`pthreadpool_server+0x17e
              libc.so.1`_thrp_setup+0x8a
              libc.so.1`_lwp_start
</pre>

Further dtracing of some kernel functions helped us find the root cause: the permission bits 0700 of the private/msg.sock folder.

So we built Samba with the following change:

<pre>
diff --git a/source3/lib/messages.c b/source3/lib/messages.c
index ef8e83d..25b0197 100644
--- a/source3/lib/messages.c
+++ b/source3/lib/messages.c
@@ -330,7 +330,7 @@ struct messaging_context *messaging_init(TALLOC_CTX *mem_ctx,
        }

        ok = directory_create_or_exist_strict(priv_path, sec_initial_uid(),
-                                             0700);
+                                             0755);
        if (!ok) {
                DEBUG(10, ("%s: Could not create msg directory: %s\n",
                           __func__, strerror(errno)));
</pre>

With the above change, sendmsg() no longer returns EACCES. We are still stress testing to see whether our application gets any 'permission denied' errors, and will report back later.
Comment 1 Jeremy Allison 2016-11-16 16:45:37 UTC
The problem is this is the wrong fix. It opens up the messaging socket to non-privileged processes. This is not allowable from a security point of view.

The Solaris/Illumos kernel should *NOT* be doing security checks on a connect()'ed unix domain socket at sendmsg() time, only at connect() time.

For Solaris/Illumos the correct fix is to set the send socket to always blocking. That way the pthread queuing code will never be activated and the sendmsg() will always be done as root. This will have a performance impact on this platform, but I don't see any other alternative, sorry.

I'll upload a patch for you to test.
Comment 2 Jeremy Allison 2016-11-16 17:24:21 UTC
Created attachment 12669
Possible patch for 4.4.x.
Comment 3 Jeremy Allison 2016-11-16 19:28:17 UTC
Created attachment 12670
Test program showing the problem.

Solaris sendmsg() is broken for UNIX domain sockets.

Here is a test program that demonstrates
that Solaris has a problem in dealing with
permissions on UNIX domain sockets.

To reproduce, compile the attached program,
then become root. In the directory containing
the a.out binary do the following:

# mkdir t
# chown root t
# chmod 700 t
# ./a.out t/s 5000

The expected output (and indeed the output on Linux
and FreeBSD) will be:

non_priv_send - sendmsg fail (expected) Permission denied
CLIENT:TEST0
SERVER:TEST0
CLIENT:TEST1
SERVER:TEST1
CLIENT:TEST2
SERVER:TEST2
CLIENT:TEST3
SERVER:TEST3
CLIENT:TEST4
SERVER:TEST4

On Solaris we get:

non_priv_send - sendmsg fail (expected) Permission denied
CLIENT:TEST0
./sendtest - sendmsg fail Permission denied

The root of the issue is that the program connects
to the socket as root, and then expects to be able
to change to a non-privileged user and use the connected
socket file descriptor to call sendmsg().

On Linux and FreeBSD this works. On Solaris it fails.

This breaks a class of programs that start as
privileged, connect to a unix domain socket to talk to
a daemon, and then drop privileges for safety while still
using the connected fd (or passing the fd to another process),
i.e. privilege separation.
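
The connect-then-drop-privileges pattern described above can be sketched roughly as follows. This is not the attached test program: the function name send_after_priv_drop, its parameters, and the use of a SOCK_DGRAM socket are assumptions made for illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the attached test program.
 * Connect a datagram socket to the UNIX domain socket at `path` with the
 * current (privileged) credentials, switch the effective uid to `uid`,
 * then send `text` over the already connected descriptor.
 * Returns 0 on success or an errno value on failure.
 */
int send_after_priv_drop(const char *path, uid_t uid, const char *text)
{
	struct sockaddr_un addr;
	struct iovec iov;
	struct msghdr msg;
	uid_t saved_euid = geteuid();
	int result = 0;
	int fd;

	if (strlen(path) >= sizeof(addr.sun_path)) {
		return ENAMETOOLONG;
	}
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd == -1) {
		return errno;
	}
	/* Path permissions are checked here, while still privileged. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		result = errno;
		close(fd);
		return result;
	}
	/* Drop privileges; the connected descriptor stays open. */
	if (seteuid(uid) == -1) {
		result = errno;
		close(fd);
		return result;
	}

	iov.iov_base = (void *)text; /* sendmsg() does not modify the data */
	iov.iov_len = strlen(text);
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	/* The case under discussion: a send on a connected fd after the drop. */
	if (sendmsg(fd, &msg, 0) == -1) {
		result = errno;
	}

	(void)seteuid(saved_euid); /* restore the caller's effective uid */
	close(fd);
	return result;
}
```

Run as root against a socket inside a 0700 root-owned directory with a non-root `uid`, a return value of EACCES from this sketch would correspond to the Solaris behaviour reported above, and 0 to the Linux and FreeBSD behaviour.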
Comment 4 YOUZHONG YANG 2016-11-18 15:59:27 UTC
The patch proposed by Jeremy didn't work. Before the patch, sendmsg() returned error codes 2 (ENOENT), 13 (EACCES) and 11 (EAGAIN); with the patch it still returns error codes 2 and 13.

By the way, an illumos developer is working on a fix for sendmsg(), thanks to Jeremy's efforts.

Here is the illumos bug report:

https://illumos.org/issues/7590 - sendmsg on AF_UNIX socket fails after process drops privileges
Comment 5 Jeremy Allison 2016-11-18 16:52:43 UTC
Well that would be expected (missing EAGAIN) as the socket is now deliberately left as blocking.

Did you still get missing messages with that patch ?
Comment 6 YOUZHONG YANG 2016-11-18 18:35:26 UTC
(In reply to Jeremy Allison from comment #5)
Yes, I saw 'permission denied' errors, which indicate dropped messages. sendmsg() also returned a large number of error codes 2 and 13, so the situation is really bad.

Thanks.
Comment 7 Jeremy Allison 2016-11-18 18:42:51 UTC
Oh, I see the issue with 4.4.x and the test patch. For now run with relaxed permissions on your messaging directory. I'll take a look at what we can do here.
Comment 8 Jeremy Allison 2016-11-18 18:52:41 UTC
Hmmm. Looking closer - is this with 4.4.x ?

If so, that test patch should work. unix_dgram_send() is only ever called as root and the socket should be non-blocking.

So the initial sendmsg() should not fail.

According to the sendmsg() man page on Solaris:

https://docs.oracle.com/cd/E19109-01/tsolaris7/805-8069/6j7j9vo2j/index.html

If the socket does not have enough buffer space available to hold the message being sent, send() blocks, unless the socket has been placed in non-blocking I/O mode (see fcntl(2) ). The select(3C) or poll(2) call may be used to determine when it is possible to send more data.

The socket is in blocking mode, so in the code below from source3/lib/unix_msg/unix_msg.c the sendmsg() on line 632 should block:

612         /*
613          * Try a cheap nonblocking send
614          */
615 
616         msg = (struct msghdr) {
617                 .msg_name = discard_const_p(struct sockaddr_un, dst),
618                 .msg_namelen = sizeof(*dst),
619                 .msg_iov = discard_const_p(struct iovec, iov),
620                 .msg_iovlen = iovlen
621         };
622 
623         fdlen = msghdr_prep_fds(&msg, NULL, 0, fds, num_fds);
624         if (fdlen == -1) {
625                 return EINVAL;
626         }
627 
628         {
629                 uint8_t buf[fdlen];
630                 msghdr_prep_fds(&msg, buf, fdlen, fds, num_fds);
631 
632                 ret = sendmsg(ctx->sock, &msg, 0);
633         }
634 
635         if (ret >= 0) {
636                 return 0;
637         }
638         if ((errno != EWOULDBLOCK) &&
639             (errno != EAGAIN) &&
640 #ifdef ENOBUFS
641             /* FreeBSD can give this for large messages */
642             (errno != ENOBUFS) &&
643 #endif
644             (errno != EINTR)) {
645                 return errno;
646         }

Can you use dtrace to find out what errno is at line 638 please ? Remember, this is only called as root so you should never see 13(EACCES).

I need to know what errno is when the sendmsg() at line 632 fails. You can try adding a debug printf here if there's no other way.
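
If dtrace is not an option, a debug wrapper along these lines could stand in for the sendmsg() call on line 632. It is only a sketch: the name sendmsg_logged is hypothetical, and logging to stderr is an assumption (Samba's own DEBUG() macro would be the more natural choice inside the tree).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical debugging wrapper (not part of Samba): call sendmsg() and,
 * on failure, log errno together with the effective uid of the caller.
 * errno is saved before any other library call can overwrite it and is
 * restored before returning, so the caller's errno checks still work.
 */
ssize_t sendmsg_logged(int sock, const struct msghdr *msg, int flags)
{
	ssize_t ret = sendmsg(sock, msg, flags);

	if (ret == -1) {
		int saved_errno = errno;

		fprintf(stderr, "sendmsg failed: errno=%d (%s) euid=%ld\n",
			saved_errno, strerror(saved_errno), (long)geteuid());
		errno = saved_errno;
	}
	return ret;
}
```

Printing the euid next to errno shows directly whether a failing call really ran as root.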
Comment 9 Jeremy Allison 2016-11-18 18:53:32 UTC
From the previous comment:

"is only ever called as root and the socket should be non-blocking."

I meant 'blocking', not non-blocking here of course. Sorry.
Comment 10 Jeremy Allison 2016-11-18 20:19:38 UTC
Created attachment 12675
git-am test fix for 4.4.x

OK, here's another test patch that *forces* unix_dgram_send() to be synchronous w.r.t become_root()/unbecome_root(). Compiles but not tested.

Just wanted to log it here. YOUZHONG you could throw this on a test machine just to see if it changes things.
Comment 11 Jeremy Allison 2016-11-18 20:23:58 UTC
Created attachment 12676
git-am test fix for 4.4.x

(Sigh). Don't use potentially freed memory after use. Threaded code is tricky.
Comment 12 YOUZHONG YANG 2016-11-21 19:31:46 UTC
(In reply to Jeremy Allison from comment #11)
Thanks Jeremy. I tested your latest patch. It works, but it requires a change to source3/wscript, because host_os is "sunos5" on both Oracle Solaris and OpenSolaris/illumos platforms.

By the way, I also tested the sendmsg() kernel fix on our SmartOS platform. Would it be possible to add a 'configure' option for this patch, so that users can choose whether to apply it?