Created attachment 13278 [details] Patch AndX beating us again
Created attachment 13279 [details] Patch with attribution to the bug reporter
Comment on attachment 13279 [details] Patch with attribution to the bug reporter Confirm the fix prevents the POC and fixes the valgrind errors. Now to work out if it could be exploitable beyond a crash.
Comment on attachment 13279 [details] Patch with attribution to the bug reporter I wonder why we have that second if (req->unread_bytes) {} part in reply_write_and_X while just before if (IS_IPC(conn)) {} we already tested that condition and never modified req->unread_bytes afterwards: if (req->unread_bytes) { /* Can't do a recvfile write on IPC$ */ if (IS_IPC(conn)) { reply_nterror(req, NT_STATUS_INVALID_PARAMETER); goto out; } if (numtowrite != req->unread_bytes) { reply_nterror(req, NT_STATUS_INVALID_PARAMETER); goto out; } } else { if (smb_doff > smblen || smb_doff + numtowrite < numtowrite || smb_doff + numtowrite > smblen) { reply_nterror(req, NT_STATUS_INVALID_PARAMETER); goto out; } } So this part is a repeatance: /* If it's an IPC, pass off the pipe handler. */ if (IS_IPC(conn)) { if (req->unread_bytes) { reply_nterror(req, NT_STATUS_INVALID_PARAMETER); goto out; } reply_pipe_write_and_X(req); goto out; } I wonder if we should change if (req->unread_bytes) {} above directly with if (req_is_in_chain(req)) {}
Alexander wrote: > So this part is a repeatance: True, but belt and braces IMHO :-). I prefer Volker's minimal fix for safety purposes. I'm trying to evaluate now if this can be exploited in any way. I can't see how yet, but I'm still looking. If I can't be exploited I'm inclined to just fix as a regular bug, not a CVE. Jeremy.
OK, the fix is incomplete. I finally figured out *exactly* what is going on and there is a broader problem here (although still not a security issue thank god :-).
The actual bug is inside construct_reply_chain(), which is worth looking at in its entirety: ---------------------------------------------------------------------------------------------------- 1778 static void construct_reply_chain(struct smbXsrv_connection *xconn, 1779 char *inbuf, int size, uint32_t seqnum, 1780 bool encrypted, 1781 struct smb_perfcount_data *deferred_pcd) 1782 { 1783 struct smb_request **reqs = NULL; 1784 struct smb_request *req; 1785 unsigned num_reqs; 1786 bool ok; 1787 1788 ok = smb1_parse_chain(talloc_tos(), (uint8_t *)inbuf, xconn, encrypted, 1789 seqnum, &reqs, &num_reqs); 1790 if (!ok) { 1791 char errbuf[smb_size]; 1792 error_packet(errbuf, 0, 0, NT_STATUS_INVALID_PARAMETER, 1793 __LINE__, __FILE__); 1794 if (!srv_send_smb(xconn, errbuf, true, seqnum, encrypted, 1795 NULL)) { 1796 exit_server_cleanly("construct_reply_chain: " 1797 "srv_send_smb failed."); 1798 } 1799 return; 1800 } 1801 1802 req = reqs[0]; 1803 req->inbuf = (uint8_t *)talloc_move(reqs, &inbuf); 1804 1805 req->conn = switch_message(req->cmd, req); 1806 1807 if (req->outbuf == NULL) { 1808 /* 1809 * Request has suspended itself, will come 1810 * back here. 1811 */ 1812 return; 1813 } 1814 smb_request_done(req); 1815 } ---------------------------------------------------------------------------------------------------- At 1788 we call smb1_parse_chain - with a memory context of talloc_tos(). That context is the top-level one created for a new SMB request in smbd_tevent_trace_callback() which is deleted once the tevent_loop_once() completes. So if the request goes async in a chain, then the reqs[] array is TALLOC_FREE'd once the tevent_loop_once() completes - so when smb_request_done() is called after the async request is completed the array it operates on has already been freed. So the correct fix is to change the smb1_parse_chain() call to use NULL as the long-term talloc context passed in, so the reqs[] array will persist if an element in the chain goes async. When I do this (and test under valgrind) the crash goes away, even without Volker's patch installed. We're not leaking memory here, as inside smb_request_done(), which is eventually called to ship the completed chain, we have: 1914 shipit: 1915 if (!srv_send_smb(first_req->xconn, 1916 (char *)first_req->outbuf, 1917 true, first_req->seqnum+1, 1918 IS_CONN_ENCRYPTED(req->conn)||first_req->encrypted, 1919 &first_req->pcd)) { 1920 exit_server_cleanly("construct_reply_chain: srv_send_smb " 1921 "failed."); 1922 } 1923 TALLOC_FREE(req); /* non-chained case */ 1924 TALLOC_FREE(reqs); /* chained case */ 1925 return; 1926 1927 error: 1928 { 1929 char errbuf[smb_size]; 1930 error_packet(errbuf, 0, 0, status, __LINE__, __FILE__); 1931 if (!srv_send_smb(req->xconn, errbuf, true, 1932 req->seqnum+1, req->encrypted, 1933 NULL)) { 1934 exit_server_cleanly("construct_reply_chain: " 1935 "srv_send_smb failed."); 1936 } 1937 } 1938 TALLOC_FREE(req); /* non-chained case */ 1939 TALLOC_FREE(reqs); /* chained case */ This isn't a security issue, as the client has no control over what memory is accessed even by selecting different chained smb requests. The most a malicious client can do is crash its own server. Updated patchset to follow. Jeremy.
Once more comment - the reproducer uses a Windows pipe as that's the easiest way to make a chained call go async, but even a write to a file can go async if the system is under heavy load, and a chained write to file would then crash for the same reason.
Created attachment 13375 [details] git-am fix for master. This is the correct fix. Volker, do we also want to restrict chained pipe readX and writeX as well for safety's sake ? I don't know of any client that uses this SMB1 feature plus SMB1 is being deprecated. Jeremy.
(In reply to Jeremy Allison from comment #10) I'm nervous about this analysis. Why can we assert that the client has no control over the data in the free()ed memory? We hit a 'bad talloc magic', not 'use after free' error from talloc, which means it is not just the old talloc pointer still intact, it has changed, perhaps because it is quickly re-used memory in the pool. We may be saved from a trivial exploit by the run-time changing magic check in modern talloc, but older versions might be more exposed. Thanks,
Sorry, when I said "client has no control" I mean the problem happens on a read, not a write. There are no writes being done here - the problem happens on a talloc_get_size() call to get the size of an array to iterate over - not a function to call from a talloc'ed area. All supported versions of Samba talloc have the check here. I simply don't see any way this can be exploited.
(In reply to Jeremy Allison from comment #11) > "but even a write to a file can go async if the system is under heavy load, > and a chained write to file would then crash for the same reason." No it can't :-). I just checked and there's code: /* Only do this on non-chained and non-chaining reads not using the * write cache. */ if (req_is_in_chain(smbreq) || (lp_write_cache_size(SNUM(conn)) != 0)) { return NT_STATUS_RETRY; } preventing this inside asynchronous SMB1writeX and SMB1readX so we're good here. :-).
(In reply to Jeremy Allison from comment #14) Elaborating here - the only exploit avenue is inside smb_request_done(), by arranging for the freed memory req->chain (which previously was an array of struct smb_request pointers) to contain a newly talloc'ed object with a destructor that when fired somehow does what the client wants. These checks: for (i=0; i<num_reqs; i++) { if (reqs[i] == req) { break; } } if (i == num_reqs) { /* * Invalid chain, should not happen */ status = NT_STATUS_INTERNAL_ERROR; goto error; } inside smb_request_done() ensure that in order to call any other functions than a req->chain destructor the newly talloced object that was allocated inside the old space pointed to by req->chain must also have contained a back-pointer to req somewhere inside it when read as an array. This isn't possible for the client to arrange. So the only exploit possibility is via the 'Invalid chain, should not happen' error exit calling the destructor for req->chain. Remember the client can only chain SMB1 requests onto the writeX pipe command, and I can't conceive of any sequence that could leave such a destructor in the right place. Maybe metze or you can be smarter :-), but IMHO this is not a credible security issue - any more than any other crash bug.
(In reply to Jeremy Allison from comment #16) > I can't conceive of any sequence that could leave such a destructor in the > right place Especially as 4.4.x onwards have randomized talloc magic protection.
As this bug seems to abort the server it would classify as "Denial Of Service" attack like security issue. I thought about the argument if there is code execution possible and I tend to agree that the hardening measure of reqs chain backlink checking will avoid the problem. It seems very unlikely an attacker could provide such a structure, but I still have a not so good feeling ;) So the bug should be fixed. If a user can abort the server it is a security issue. ;)
Well the user can abort their *own* server. Historically we have not always treated these as security issues, and often have just patched the crash bug and moved on. If every authenticated user crash bug gets a CVE and security release then we're going to be very busy indeed (as are the Linux distros and their back-port and security teams). Marcus Meissner wrote: > It seems very unlikely an attacker could provide such a structure, but I > still have a not so good feeling ;) So I need you to make a call on this please, not just have feelings. I don't think this is exploitable. If you disagree please show how this could be done. Even if we do a security release just for the crash bug this affects greatly the severity of the issue. Thanks, Jeremy
Created attachment 13386 [details] git-am fix for master. Changed to use xconn context as requested by Metze. Added 'Reviewed-by:' from Metze. Now all we need to decide is if this needs a CVE or not.
From the samba-security list: From: Ralph Böhme via samba-team <samba-team@lists.samba.org> On Fri, Jul 14, 2017 at 09:30:14AM -0700, Jeremy Allison wrote: > On Fri, Jul 14, 2017 at 09:44:24AM +0200, Stefan Metzmacher wrote: > > Hi Jeremy, > > > > > I think the attached is the correct fix for: > > > > > > https://bugzilla.samba.org/show_bug.cgi?id=12836 > > > > > > (the writeX pipe crash). My analysis is on the > > > bug report. I don't think this is a security > > > issue, so I'd like to push to master and open > > > up the bug (minus the reproducer). Can you > > > guys confirm ? > > > > > > Volker, do we also want to add your code to restrict > > > chained pipe readX and writeX as well for safety's sake ? > > > > > > I don't know of any client that uses this SMB1 > > > feature plus SMB1 is being deprecated. > > > > I'd use 'xconn' as memory context instead of NULL. > > We should avoid using NULL as much as possible. > > And the request can't last longer than the connection. > > Fixed and updated. We now need to make the decision > on CVE or not. Any thoughts ? your analysis looks reasonable, this doesn't look like a security issue. Cheerio! -slow
(In reply to Marcus Meissner from comment #18) > As this bug seems to abort the server it would classify as "Denial Of Service" > attack like security issue. this doesn't abort "the server", only the user's smb session.
(In reply to Ralph Böhme from comment #22) Good point Ralph. The misunderstanding may come from the fact this aborts the smbd connected to the user, but not the smbd listening for connections. The master smbd keeps on truckin' :-) and the user can (and most likely will) silently reconnect and continue and not even notice the disruption (unless an attempt at disruption was the intent :-). No other users currently connected or new connections will be affected.
From Jim McDonough <jmcd@samba.org> on the samba-security list: > Thanks. Unfortunately on the bug report Marcus Meissner <meissner@suse.de> > is taking the position that any server crash DOS in a CVE-able security > issue even on an authenticated connection. I'd say that's a context-less reading of his statement. He'd forgotten this bit about the smbd forking architecture. I've reminded him of this but he's gone for the weekend already so he can't comment. > > As I pointed out, this will cause great pain to both us and the > Linux distros and their back-port and security teams :-(. > > See the latest on: > > https://bugzilla.samba.org/show_bug.cgi?id=12836 > > for details. > > Jim, do you have a mangerial opinion on this ? :-). Yeah, my opinion is that he was clear that he believes it is not a remote code execution issue. The statement about a server abort was made using the wrong assumption about affecting other users. And the "bad feeling" statement is that he was afraid I was asking if we should bother fixing it if it wasn't deemed a security issue. So fix it, no cve.... Jim
Correct, I was not aware this server instance aborting is per-user. This would make this abort not an issue. For the rest see my earlier comment. I see no exploitability with my knowledge.
(In reply to Marcus Meissner from comment #25) Thanks Marcus, sorry if I was a little sharp. Reported security issues are rather stressful (as I'm sure you appreciate :-).
Created attachment 13397 [details] git-am fix for 4.7.0, 4.6.next, 4.5.next, 4.4.next Cherry-picked from master.
Reassigning to Karolin for inclusion in 4.4, 4.5, 4.6 and 4.7.
(In reply to Ralph Böhme from comment #29) Pushed to autobuild-v4-{7,6,5,4}-test.
(In reply to Karolin Seeger from comment #31) Pushed to all branches. Re-assigning to Andrew.
Closing out as resolved and fixed.