Bug 12836 - Misused talloc context can cause a user to crash their smbd by chaining SMB1 commands.
Summary: Misused talloc context can cause a user to crash their smbd by chaining SMB1 ...
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.6.5
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-13 04:06 UTC by Andrew Bartlett
Modified: 2019-06-11 21:24 UTC (History)
10 users (show)

See Also:


Attachments
Patch (1.82 KB, patch)
2017-06-15 06:54 UTC, Volker Lendecke
no flags Details
Patch with attribution to the bug reporter (1.94 KB, patch)
2017-06-15 06:57 UTC, Volker Lendecke
jra: review+
Details
valgrind trace (42.65 KB, text/plain)
2017-06-16 22:02 UTC, Jeremy Allison
no flags Details
git-am fix for master. (1.28 KB, patch)
2017-07-13 19:12 UTC, Jeremy Allison
no flags Details
git-am fix for master. (1.33 KB, patch)
2017-07-14 16:09 UTC, Jeremy Allison
no flags Details
git-am fix for 4.7.0, 4.6.next, 4.5.next, 4.4.next (1.39 KB, patch)
2017-07-17 22:05 UTC, Jeremy Allison
slow: review+
jra: review? (metze)
Details

Note You need to log in before you can comment on or make changes to this bug.
Comment 2 Volker Lendecke 2017-06-15 06:54:10 UTC
Created attachment 13278 [details]
Patch

AndX beating us again
Comment 3 Volker Lendecke 2017-06-15 06:57:30 UTC
Created attachment 13279 [details]
Patch with attribution to the bug reporter
Comment 6 Jeremy Allison 2017-06-16 22:18:21 UTC
Comment on attachment 13279 [details]
Patch with attribution to the bug reporter

Confirm the fix prevents the POC and fixes the valgrind errors. Now to work out if it could be exploitable beyond a crash.
Comment 7 Alexander Bokovoy 2017-06-18 16:05:58 UTC
Comment on attachment 13279 [details]
Patch with attribution to the bug reporter

I wonder why we have that second
 if (req->unread_bytes) {}
 
part in reply_write_and_X while just before
 
  if (IS_IPC(conn)) {}
 
we already tested that condition and never modified req->unread_bytes
afterwards:
 
        if (req->unread_bytes) {
                /* Can't do a recvfile write on IPC$ */
                if (IS_IPC(conn)) {
                        reply_nterror(req, NT_STATUS_INVALID_PARAMETER);
                        goto out;
                }
                if (numtowrite != req->unread_bytes) {
                        reply_nterror(req, NT_STATUS_INVALID_PARAMETER);
                        goto out;
                }
        } else {
                if (smb_doff > smblen || smb_doff + numtowrite < numtowrite ||
                                smb_doff + numtowrite > smblen) {
                        reply_nterror(req, NT_STATUS_INVALID_PARAMETER);
                        goto out;
                }
        }

So this part is a repeatance:
 
        /* If it's an IPC, pass off the pipe handler. */
        if (IS_IPC(conn)) {
                if (req->unread_bytes) {
                        reply_nterror(req, NT_STATUS_INVALID_PARAMETER);
                        goto out;
                }
                reply_pipe_write_and_X(req);
                goto out;
        }

I wonder if we should change
 
  if (req->unread_bytes) {}

above directly with
 
  if (req_is_in_chain(req)) {}
Comment 8 Jeremy Allison 2017-07-10 23:06:24 UTC
Alexander wrote:

> So this part is a repeatance:

True, but belt and braces IMHO :-). I prefer Volker's minimal fix for safety purposes. I'm trying to evaluate now if this can be exploited in any way. I can't see how yet, but I'm still looking.

If I can't be exploited I'm inclined to just fix as a regular bug, not a CVE.

Jeremy.
Comment 9 Jeremy Allison 2017-07-13 17:52:40 UTC
OK, the fix is incomplete. I finally figured out *exactly* what is going on and there is a broader problem here (although still not a security issue thank god :-).
Comment 10 Jeremy Allison 2017-07-13 19:02:03 UTC
The actual bug is inside construct_reply_chain(), which is worth looking at in its entirety:

----------------------------------------------------------------------------------------------------
1778 static void construct_reply_chain(struct smbXsrv_connection *xconn,
1779                                   char *inbuf, int size, uint32_t seqnum,
1780                                   bool encrypted,
1781                                   struct smb_perfcount_data *deferred_pcd)
1782 {
1783         struct smb_request **reqs = NULL;
1784         struct smb_request *req;
1785         unsigned num_reqs;
1786         bool ok;
1787 
1788         ok = smb1_parse_chain(talloc_tos(), (uint8_t *)inbuf, xconn, encrypted,
1789                               seqnum, &reqs, &num_reqs);
1790         if (!ok) {
1791                 char errbuf[smb_size];
1792                 error_packet(errbuf, 0, 0, NT_STATUS_INVALID_PARAMETER,
1793                              __LINE__, __FILE__);
1794                 if (!srv_send_smb(xconn, errbuf, true, seqnum, encrypted,
1795                                   NULL)) {
1796                         exit_server_cleanly("construct_reply_chain: "
1797                                             "srv_send_smb failed.");
1798                 }
1799                 return;
1800         }
1801 
1802         req = reqs[0];
1803         req->inbuf = (uint8_t *)talloc_move(reqs, &inbuf);
1804 
1805         req->conn = switch_message(req->cmd, req);
1806 
1807         if (req->outbuf == NULL) {
1808                 /*
1809                  * Request has suspended itself, will come
1810                  * back here.
1811                  */
1812                 return;
1813         }
1814         smb_request_done(req);
1815 }
----------------------------------------------------------------------------------------------------

At 1788 we call smb1_parse_chain - with a memory context of talloc_tos().
That context is the top-level one created for a new SMB request in smbd_tevent_trace_callback()
which is deleted once the tevent_loop_once() completes.

So if the request goes async in a chain, then the reqs[] array is TALLOC_FREE'd
once the tevent_loop_once() completes - so when smb_request_done() is called after
the async request is completed the array it operates on has already been freed.

So the correct fix is to change the smb1_parse_chain() call to use NULL as
the long-term talloc context passed in, so the reqs[] array will persist if
an element in the chain goes async.

When I do this (and test under valgrind) the crash goes away,
even without Volker's patch installed.

We're not leaking memory here, as inside smb_request_done(), which is eventually
called to ship the completed chain, we have:

1914 shipit:
1915         if (!srv_send_smb(first_req->xconn,
1916                           (char *)first_req->outbuf,
1917                           true, first_req->seqnum+1,
1918                           IS_CONN_ENCRYPTED(req->conn)||first_req->encrypted,
1919                           &first_req->pcd)) {
1920                 exit_server_cleanly("construct_reply_chain: srv_send_smb "
1921                                     "failed.");
1922         }
1923         TALLOC_FREE(req);       /* non-chained case */
1924         TALLOC_FREE(reqs);      /* chained case */
1925         return;
1926 
1927 error:
1928         {
1929                 char errbuf[smb_size];
1930                 error_packet(errbuf, 0, 0, status, __LINE__, __FILE__);
1931                 if (!srv_send_smb(req->xconn, errbuf, true,
1932                                   req->seqnum+1, req->encrypted,
1933                                   NULL)) {
1934                         exit_server_cleanly("construct_reply_chain: "
1935                                             "srv_send_smb failed.");
1936                 }
1937         }
1938         TALLOC_FREE(req);       /* non-chained case */
1939         TALLOC_FREE(reqs);      /* chained case */

This isn't a security issue, as the client has no control over what memory
is accessed even by selecting different chained smb requests. The most a
malicious client can do is crash its own server.

Updated patchset to follow.

Jeremy.
Comment 11 Jeremy Allison 2017-07-13 19:04:04 UTC
Once more comment - the reproducer uses a Windows pipe as that's the easiest way to make a chained call go async, but even a write to a file can go async if the system is under heavy load, and a chained write to file would then crash for the same reason.
Comment 12 Jeremy Allison 2017-07-13 19:12:28 UTC
Created attachment 13375 [details]
git-am fix for master.

This is the correct fix. Volker, do we also want to restrict chained pipe readX and writeX as well for safety's sake ? I don't know of any client that uses this SMB1 feature plus SMB1 is being deprecated.

Jeremy.
Comment 13 Andrew Bartlett 2017-07-13 20:30:28 UTC
(In reply to Jeremy Allison from comment #10)
I'm nervous about this analysis.

Why can we assert that the client has no control over the data in the free()ed memory?

We hit a 'bad talloc magic', not 'use after free' error from talloc, which means it is not just the old talloc pointer still intact, it has changed, perhaps because it is quickly re-used memory in the pool.

We may be saved from a trivial exploit by the run-time changing magic check in modern talloc, but older versions might be more exposed.

Thanks,
Comment 14 Jeremy Allison 2017-07-13 20:44:42 UTC
Sorry, when I said "client has no control" I mean the problem happens on a read, not a write. There are no writes being done here - the problem happens on a talloc_get_size() call to get the size of an array to iterate over - not a function to call from a talloc'ed area.

All supported versions of Samba talloc have the check here. I simply don't see any way this can be exploited.
Comment 15 Jeremy Allison 2017-07-13 21:21:38 UTC
(In reply to Jeremy Allison from comment #11)

> "but even a write to a file can go async if the system is under heavy load,
> and a chained write to file would then crash for the same reason."

No it can't :-). I just checked and there's code:

        /* Only do this on non-chained and non-chaining reads not using the
         * write cache. */
        if (req_is_in_chain(smbreq) || (lp_write_cache_size(SNUM(conn)) != 0)) {
                return NT_STATUS_RETRY;
        }

preventing this inside asynchronous SMB1writeX and SMB1readX so we're good here. :-).
Comment 16 Jeremy Allison 2017-07-13 21:42:59 UTC
(In reply to Jeremy Allison from comment #14)

Elaborating here - the only exploit avenue is inside smb_request_done(), by arranging for the freed memory req->chain (which previously was an array of struct smb_request pointers) to contain a newly talloc'ed object with a destructor that when fired somehow does what the client wants.

These checks:

        for (i=0; i<num_reqs; i++) {
                if (reqs[i] == req) {
                        break;
                }
        }
        if (i == num_reqs) {
                /*
                 * Invalid chain, should not happen
                 */
                status = NT_STATUS_INTERNAL_ERROR;
                goto error;
        }

inside smb_request_done() ensure that in order to call any other functions than a req->chain destructor the newly talloced object that was allocated inside the old space pointed to by req->chain must also have contained a back-pointer to req somewhere inside it when read as an array. This isn't possible for the client to arrange. So the only exploit possibility is via the 'Invalid chain, should not happen' error exit calling the destructor for req->chain.

Remember the client can only chain SMB1 requests onto the writeX pipe command, and I can't conceive of any sequence that could leave such a destructor in the right place.

Maybe metze or you can be smarter :-), but IMHO this is not a credible security issue - any more than any other crash bug.
Comment 17 Jeremy Allison 2017-07-13 22:27:41 UTC
(In reply to Jeremy Allison from comment #16)

> I can't conceive of any sequence that could leave such a destructor in the 
> right place

Especially as 4.4.x onwards have randomized talloc magic protection.
Comment 18 Marcus Meissner 2017-07-14 13:21:40 UTC
As this bug seems to abort the server it would classify as "Denial Of Service" attack like security issue.

I thought about the argument if there is code execution possible and I tend to agree that the hardening measure of reqs chain backlink checking will avoid the problem. It seems very unlikely an attacker could provide such a structure, but I still have a not so good feeling ;)

So the bug should be fixed. If a user can abort the server it is a security issue. ;)
Comment 19 Jeremy Allison 2017-07-14 15:50:22 UTC
Well the user can abort their *own* server. Historically we have not always treated these as security issues, and often have just patched the crash bug and moved on. If every authenticated user crash bug gets a CVE and security release then we're going to be very busy indeed (as are the Linux distros and their back-port and security teams).

Marcus Meissner wrote:

> It seems very unlikely an attacker could provide such a structure, but I
> still have a not so good feeling ;)

So I need you to make a call on this please, not just have feelings. I don't think this is exploitable. If you disagree please show how this could be done. Even if we do a security release just for the crash bug this affects greatly the severity of the issue.

Thanks,

Jeremy
Comment 20 Jeremy Allison 2017-07-14 16:09:27 UTC
Created attachment 13386 [details]
git-am fix for master.

Changed to use xconn context as requested by Metze. Added 'Reviewed-by:' from Metze. Now all we need to decide is if this needs a CVE or not.
Comment 21 Jeremy Allison 2017-07-14 17:54:22 UTC
From the samba-security list:

From: Ralph Böhme via samba-team <samba-team@lists.samba.org>

On Fri, Jul 14, 2017 at 09:30:14AM -0700, Jeremy Allison wrote:
> On Fri, Jul 14, 2017 at 09:44:24AM +0200, Stefan Metzmacher wrote:
> > Hi Jeremy,
> >
> > > I think the attached is the correct fix for:
> > >
> > > https://bugzilla.samba.org/show_bug.cgi?id=12836
> > >
> > > (the writeX pipe crash). My analysis is on the
> > > bug report. I don't think this is a security
> > > issue, so I'd like to push to master and open
> > > up the bug (minus the reproducer). Can you
> > > guys confirm ?
> > >
> > > Volker, do we also want to add your code to restrict
> > > chained pipe readX and writeX as well for safety's sake ?
> > >
> > > I don't know of any client that uses this SMB1
> > > feature plus SMB1 is being deprecated.
> >
> > I'd use 'xconn' as memory context instead of NULL.
> > We should avoid using NULL as much as possible.
> > And the request can't last longer than the connection.
>
> Fixed and updated. We now need to make the decision
> on CVE or not. Any thoughts ?

your analysis looks reasonable, this doesn't look like a security issue.

Cheerio!
-slow
Comment 22 Ralph Böhme 2017-07-14 18:14:37 UTC
(In reply to Marcus Meissner from comment #18)
> As this bug seems to abort the server it would classify as "Denial Of Service"
> attack like security issue.

this doesn't abort "the server", only the user's smb session.
Comment 23 Jeremy Allison 2017-07-14 18:21:56 UTC
(In reply to Ralph Böhme from comment #22)

Good point Ralph. The misunderstanding may come from the fact this aborts the smbd connected to the user, but not the smbd listening for connections. The master smbd keeps on truckin' :-) and the user can (and most likely will) silently reconnect and continue and not even notice the disruption (unless an attempt at disruption was the intent :-). No other users currently connected or new connections will be affected.
Comment 24 Jeremy Allison 2017-07-14 19:19:24 UTC
From Jim McDonough <jmcd@samba.org> on the samba-security list:

> Thanks. Unfortunately on the bug report Marcus Meissner <meissner@suse.de>
> is taking the position that any server crash DOS in a CVE-able security
> issue even on an authenticated connection.
I'd say that's a context-less reading of his statement.  He'd forgotten
this bit about the smbd forking architecture.  I've reminded him of this
but he's gone for the weekend already so he can't comment.
>
> As I pointed out, this will cause great pain to both us and the
> Linux distros and their back-port and security teams :-(.
>
> See the latest on:
>
> https://bugzilla.samba.org/show_bug.cgi?id=12836
>
> for details.
>
> Jim, do you have a mangerial opinion on this ? :-).
Yeah, my opinion is that he was clear that he believes it is not a
remote code execution issue.  The statement about a server abort was
made using the wrong assumption about affecting other users.  And the
"bad feeling" statement is that he was afraid I was asking if we should
bother fixing it if it wasn't deemed a security issue.

So fix it, no cve....

Jim
Comment 25 Marcus Meissner 2017-07-14 20:01:38 UTC
Correct, I was not aware this server instance aborting is per-user. This would make this abort not an issue.

For the rest see my earlier comment. I see no exploitability with my knowledge.
Comment 26 Jeremy Allison 2017-07-14 20:16:27 UTC
(In reply to Marcus Meissner from comment #25)

Thanks Marcus, sorry if I was a little sharp. Reported security issues are rather stressful (as I'm sure you appreciate :-).
Comment 28 Jeremy Allison 2017-07-17 22:05:23 UTC
Created attachment 13397 [details]
git-am fix for 4.7.0, 4.6.next, 4.5.next, 4.4.next

Cherry-picked from master.
Comment 29 Ralph Böhme 2017-07-18 05:00:53 UTC
Reassigning to Karolin for inclusion in 4.4, 4.5, 4.6 and 4.7.
Comment 31 Karolin Seeger 2017-07-24 19:58:12 UTC
(In reply to Ralph Böhme from comment #29)
Pushed to autobuild-v4-{7,6,5,4}-test.
Comment 32 Karolin Seeger 2017-07-25 09:24:44 UTC
(In reply to Karolin Seeger from comment #31)
Pushed to all branches.
Re-assigning to Andrew.
Comment 33 Andrew Bartlett 2019-06-11 15:58:06 UTC
Closing out as resolved and fixed.