Bug 14571 (CVE-2021-20254) - CVE-2021-20254 [SECURITY] Buffer overrun in sids_to_unixids() [source3/passdb/lookup_sid.c]
Summary: CVE-2021-20254 [SECURITY] Buffer overrun in sids_to_unixids() [source3/passdb...
Status: RESOLVED FIXED
Alias: CVE-2021-20254
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.14.0rc3
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-10 21:19 UTC by Peter Eriksson
Modified: 2021-07-12 22:48 UTC (History)
9 users (show)

See Also:


Attachments
Make sids_to_unixids() in source3/passdb/lookup_sids.c more robust (1.58 KB, patch)
2021-01-26 18:54 UTC, Peter Eriksson
no flags Details
Fix buffer overrun and make sids_to_unixids() more robust (1.15 KB, patch)
2021-01-27 12:56 UTC, Peter Eriksson
no flags Details
alternative patch (3.67 KB, patch)
2021-02-20 17:30 UTC, Volker Lendecke
jra: review+
Details
git-am fix for master. (4.77 KB, patch)
2021-02-23 02:09 UTC, Jeremy Allison
vl: review+
Details
Preliminary cut at a CVE text. (2.34 KB, patch)
2021-03-04 22:34 UTC, Jeremy Allison
vl: review+
Details
backport to 3.6 (4.34 KB, patch)
2021-03-09 08:44 UTC, Noel Power
no flags Details
backport to 4.4 (4.80 KB, patch)
2021-03-09 08:44 UTC, Noel Power
no flags Details
backport to 4.6 (4.77 KB, patch)
2021-03-09 08:45 UTC, Noel Power
no flags Details
backport to 4.7 (4.77 KB, patch)
2021-03-09 08:46 UTC, Noel Power
no flags Details
backport to 4.9 (4.77 KB, patch)
2021-03-09 08:47 UTC, Noel Power
no flags Details
backport to 4.10 (4.77 KB, patch)
2021-03-09 08:48 UTC, Noel Power
no flags Details
backport to 4.11 (4.77 KB, patch)
2021-03-09 08:49 UTC, Noel Power
no flags Details
git-am fix for master. (6.06 KB, patch)
2021-03-10 20:12 UTC, Jeremy Allison
no flags Details
backport to 4.9.x (5.84 KB, patch)
2021-03-10 20:49 UTC, Jeremy Allison
no flags Details
git-am fix for master. (6.29 KB, patch)
2021-03-10 23:50 UTC, Jeremy Allison
vl: review+
npower: review+
pen: review+
Details
backport to 4.9.x (6.18 KB, patch)
2021-03-10 23:53 UTC, Jeremy Allison
no flags Details
backport to 3.6 (5.99 KB, patch)
2021-03-12 11:28 UTC, Noel Power
jra: review+
Details
backport to 4.4 (6.29 KB, patch)
2021-03-12 11:29 UTC, Noel Power
jra: review+
Details
backport to 4.11 (6.25 KB, patch)
2021-03-12 11:29 UTC, Noel Power
jra: review+
Details
backport to 4.9 (6.25 KB, patch)
2021-03-12 11:30 UTC, Noel Power
jra: review+
Details
backport to 4.7 (6.25 KB, patch)
2021-03-12 11:31 UTC, Noel Power
jra: review+
Details
backport to 4.10 (6.25 KB, patch)
2021-03-12 11:31 UTC, Noel Power
jra: review+
Details
backport to 4.6 (6.25 KB, patch)
2021-03-12 11:31 UTC, Noel Power
jra: review+
Details
Updated CVE text with correct attribution and version numbers. (2.42 KB, text/plain)
2021-03-12 19:30 UTC, Jeremy Allison
no flags Details
patch for master (v2) (6.31 KB, patch)
2021-03-15 01:33 UTC, Andrew Bartlett
npower: review+
jra: review+
abartlet: ci-passed+
Details
patch backported to 4.14 (v2) (6.35 KB, patch)
2021-03-15 01:34 UTC, Andrew Bartlett
npower: review+
jra: review+
abartlet: ci-passed+
Details
patch backported to 4.13 (v2) (6.49 KB, patch)
2021-03-15 01:35 UTC, Andrew Bartlett
npower: review+
jra: review+
abartlet: ci-passed+
Details
patch backported to 4.12 (v2) (6.49 KB, patch)
2021-03-15 01:35 UTC, Andrew Bartlett
abartlet: ci-passed-
Details
Advisory v3 (2.56 KB, text/plain)
2021-03-15 01:51 UTC, Andrew Bartlett
jra: review+
abartlet: review? (npower)
Details
patch for 4.12/debian9 failure (955 bytes, patch)
2021-03-15 09:21 UTC, Volker Lendecke
no flags Details
patch backported to 4.12 (v3) (7.06 KB, patch)
2021-03-16 16:42 UTC, Jeremy Allison
npower: review+
Details
patch backported to 4.12 (v4) (7.35 KB, patch)
2021-03-16 20:43 UTC, Andrew Bartlett
jra: review+
abartlet: ci-passed+
Details
backport to 3.6 (6.00 KB, patch)
2021-04-22 08:28 UTC, Noel Power
jra: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Eriksson 2020-11-10 21:19:25 UTC
I just got a panic-crash from Samba 4.12.10 on one of our FreeBSD 12.2 servers that I haven't seen before (recently upgraded it from 4.12.5). 

It might be related to Kerberos ticket expiring (10 hours) since this smbd process was started at 11:30, and it crashed 21:39 (10 hours later):

> 11:30/ps-auxwww.log:adm_lillu24 23768   0.2  0.1  211716  168264  -  S    11:29       0:00.04 /liu/sbin/smbd --daemon --configfile=/liu/etc/samba/smb.conf
> 11:31/ps-auxwww.log:root             23768   0.0  0.1  214476  169112  -  S    11:29       0:00.05 /liu/sbin/smbd --daemon --configfile=/liu/etc/samba/smb.conf
> ...
> 21:38/ps-auxwww.log:root        23768   0.0  0.1  230392  188800  -  S    11:29       0:12.75 /liu/sbin/smbd --daemon --configfile=/liu/etc/samba/smb.conf
> 21:39/ps-auxwww.log:root        23768   0.0  0.1  230476  188924  -  S    11:29       0:12.76 /liu/sbin/smbd --daemon --configfile=/liu/etc/samba/smb.conf

Crash stack trace. Unfortunately no core dump.

Nov 10 21:39:11 runur01 smbd_audit[23768]: [2020/11/10 21:39:11.810411,  0] ../../lib/util/fault.c:79(fault_report)
Nov 10 21:39:11 runur01 smbd_audit[23768]:   ===============================================================
Nov 10 21:39:11 runur01 smbd_audit[23768]: [2020/11/10 21:39:11.810811,  0] ../../lib/util/fault.c:80(fault_report)
Nov 10 21:39:11 runur01 smbd_audit[23768]:   INTERNAL ERROR: Signal 11 in pid 23768 (4.12.10-LiU)
Nov 10 21:39:11 runur01 smbd_audit[23768]:   If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
Nov 10 21:39:11 runur01 smbd_audit[23768]: [2020/11/10 21:39:11.810867,  0] ../../lib/util/fault.c:86(fault_report)
Nov 10 21:39:11 runur01 smbd_audit[23768]:   ===============================================================
Nov 10 21:39:11 runur01 smbd_audit[23768]: [2020/11/10 21:39:11.810896,  0] ../../source3/lib/util.c:830(smb_panic_s3)
Nov 10 21:39:11 runur01 smbd_audit[23768]:   PANIC (pid 23768): internal error
Nov 10 21:39:11 runur01 smbd_audit[23768]: [2020/11/10 21:39:11.811758,  0] ../../lib/util/fault.c:222(log_stack_trace)
Nov 10 21:39:11 runur01 smbd_audit[23768]:   BACKTRACE:
Nov 10 21:39:11 runur01 smbd_audit[23768]:    #0 log_stack_trace + 0x35 [ip=0x80129a305] [sp=0x7fffffffc1c0]
Nov 10 21:39:11 runur01 smbd_audit[23768]:    #1 smb_panic_s3 + 0x24 [ip=0x802d4c9b8] [sp=0x7fffffffca90]
Nov 10 21:39:11 runur01 smbd_audit[23768]:    #2 smb_panic + 0x2d [ip=0x80129a532] [sp=0x7fffffffcab0]
Nov 10 21:39:11 runur01 smbd_audit[23768]:    #3 sig_fault + 0x6c [ip=0x80129a7a9] [sp=0x7fffffffcbb0]
Nov 10 21:39:11 runur01 smbd_audit[23768]:    #4 <unknown symbol> [ip=0x801517b70] [sp=0x7fffffffcbc0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #5 <unknown symbol> [ip=0x80151713f] [sp=0x7fffffffcf80]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #6 <unknown symbol> [ip=0x7ffffffff003] [sp=0x7fffffffcff0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #7 sids_to_unixids + 0x25d [ip=0x803a07e59] [sp=0x7fffffffdd00]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #8 create_local_token + 0x2da [ip=0x8035ca9c7] [sp=0x7fffffffdd90]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #9 make_session_info_krb5 + 0x94 [ip=0x8035c359f] [sp=0x7fffffffdfb0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #10 auth3_generate_session_info_pac + 0x2b6 [ip=0x8035c440a] [sp=0x7fffffffe020]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #11 gensec_generate_session_info_pac + 0x70 [ip=0x8088eac09] [sp=0x7fffffffe130]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #12 gensec_gse_session_info + 0x113 [ip=0x80848cd79] [sp=0x7fffffffe190]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #13 gensec_session_info + 0x25 [ip=0x8088e93e0] [sp=0x7fffffffe220]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #14 gensec_child_session_info + 0x12 [ip=0x8088eaf58] [sp=0x7fffffffe270]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #15 gensec_session_info + 0x25 [ip=0x8088e93e0] [sp=0x7fffffffe280]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #16 smbd_smb2_session_setup_gensec_done + 0xa8 [ip=0x8017e0070] [sp=0x7fffffffe2d0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #17 _tevent_req_notify_callback + 0x1b [ip=0x802170a1c] [sp=0x7fffffffe300]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #18 tevent_req_finish + 0x97 [ip=0x802170ac4] [sp=0x7fffffffe310]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #19 _tevent_req_done + 0x11 [ip=0x802170ae0] [sp=0x7fffffffe340]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #20 gensec_update_done + 0x416 [ip=0x8088e9213] [sp=0x7fffffffe350]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #21 _tevent_req_notify_callback + 0x1b [ip=0x802170a1c] [sp=0x7fffffffe390]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #22 tevent_req_finish + 0x97 [ip=0x802170ac4] [sp=0x7fffffffe3a0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #23 _tevent_req_done + 0x11 [ip=0x802170ae0] [sp=0x7fffffffe3d0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #24 gensec_spnego_update_post + 0x153 [ip=0x8088e789a] [sp=0x7fffffffe3e0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #25 gensec_spnego_update_done + 0x92 [ip=0x8088e8442] [sp=0x7fffffffe410]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #26 _tevent_req_notify_callback + 0x1b [ip=0x802170a1c] [sp=0x7fffffffe440]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #27 tevent_req_finish + 0x97 [ip=0x802170ac4] [sp=0x7fffffffe450]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #28 _tevent_req_done + 0x11 [ip=0x802170ae0] [sp=0x7fffffffe480]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #29 gensec_update_done + 0x416 [ip=0x8088e9213] [sp=0x7fffffffe490]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #30 _tevent_req_notify_callback + 0x1b [ip=0x802170a1c] [sp=0x7fffffffe4d0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #31 tevent_req_finish + 0x97 [ip=0x802170ac4] [sp=0x7fffffffe4e0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #32 tevent_req_trigger + 0x29 [ip=0x802170b6f] [sp=0x7fffffffe510]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #33 tevent_common_invoke_immediate_handler + 0x134 [ip=0x8021702cb] [sp=0x7fffffffe520]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #34 tevent_common_loop_immediate + 0x1f [ip=0x8021702ec] [sp=0x7fffffffe570]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #35 poll_event_loop_once + 0x59 [ip=0x802171f2a] [sp=0x7fffffffe580]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #36 _tevent_loop_once + 0x94 [ip=0x80216f67f] [sp=0x7fffffffe5e0]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #37 tevent_common_loop_wait + 0x21 [ip=0x80216f856] [sp=0x7fffffffe610]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #38 _tevent_loop_wait + 0xa [ip=0x80216f8b2] [sp=0x7fffffffe630]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #39 smbd_process + 0x8e7 [ip=0x8017c9ef9] [sp=0x7fffffffe640]
Nov 10 21:39:12 runur01 smbd_audit[23768]:    #40 smbd_accept_connection + 0x1ce [ip=0x102e1ea] [sp=0x7fffffffe6d0]
Nov 10 21:39:13 runur01 smbd_audit[23768]:    #41 tevent_common_invoke_fd_handler + 0x84 [ip=0x80216fea9] [sp=0x7fffffffe7a0]
Nov 10 21:39:13 runur01 smbd_audit[23768]:    #42 poll_event_loop_once + 0x5b4 [ip=0x802172485] [sp=0x7fffffffe7d0]
Nov 10 21:39:13 runur01 smbd_audit[23768]:    #43 _tevent_loop_once + 0x94 [ip=0x80216f67f] [sp=0x7fffffffe830]
Nov 10 21:39:13 runur01 smbd_audit[23768]:    #44 tevent_common_loop_wait + 0x21 [ip=0x80216f856] [sp=0x7fffffffe860]
Nov 10 21:39:13 runur01 smbd_audit[23768]:    #45 _tevent_loop_wait + 0xa [ip=0x80216f8b2] [sp=0x7fffffffe880]
Nov 10 21:39:13 runur01 smbd_audit[23768]:    #46 main + 0x1d1b [ip=0x1030087] [sp=0x7fffffffe890]
Nov 10 21:39:13 runur01 smbd_audit[23768]:    #47 _start + 0x99 [ip=0x10284a9] [sp=0x7fffffffeca0]
Nov 10 21:39:13 runur01 smbd_audit[23768]:    #48 <unknown symbol> [ip=0x801265000] [sp=0x7fffffffecd0]
Nov 10 21:39:13 runur01 smbd_audit[23768]: [2020/11/10 21:39:13.214746,  0] ../../source3/lib/dumpcore.c:310(dump_core)
Nov 10 21:39:13 runur01 smbd_audit[23768]:   unable to change to %N.core
Nov 10 21:39:13 runur01 smbd_audit[23768]:   refusing to dump core
Comment 1 Peter Eriksson 2021-01-26 10:29:15 UTC
Got a core dump and the problem seems to occur at line 1307 in source3/passwd/lookup_sid.c:

1303:       num_not_cached = 0;
1304:
1305:        for (i=0; i<num_sids; i++) {
1306:		if (ids[i].type == ID_TYPE_NOT_SPECIFIED) {
1307:                        switch (wbc_ids[num_not_cached].type) {
1308:                        case WBC_ID_TYPE_UID:

Checking with GDB "num_not_cached" is 68, and trying to print
wbc_ids[68] gives:

gdb) print wbc_ids[66]
$21 = {type = WBC_ID_TYPE_NOT_SPECIFIED, id = {uid = 0, gid = 0}}
(gdb) print wbc_ids[67]
$22 = {type = WBC_ID_TYPE_NOT_SPECIFIED, id = {uid = 48, gid = 48}}
(gdb) print wbc_ids[68]
Cannot access memory at address 0x81b339000
(gdb) print num_not_cached
$23 = 68
(gdb) print num_sids
$24 = 245

Unfortunately the code seems to reuse num_not_cached so it's difficult to see what it was at the time of the talloc_array() call a bit earlier...

The user connected at this particular time is a member of 24 Unix groups (90 AD groups of which 24 has gidNumber set).
Comment 2 Peter Eriksson 2021-01-26 15:35:02 UTC
Added some more debugging checks to that code and it does look like problem is in the first loop in sids_to_unixids where:

              if (idmap_cache_find_sid2unixid(&sids[i], &ids[i], &expired)
                    && !expired)
                {
                        continue;
                }

This check misses to account for the case where ids[i].type already is set to ID_TYPE_NOT_SPECIFIED (0) causing the loop a bit down to hit a buffer overrun.

I modified the code like this:

-- lookup_sid.c.ORIG	2021-01-26 16:25:59.235435000 +0100
+++ lookup_sid.c	2021-01-26 16:26:32.450740000 +0100
@@ -1247,7 +1247,7 @@
 {
 	struct wbcDomainSid *wbc_sids = NULL;
 	struct wbcUnixId *wbc_ids = NULL;
-	uint32_t i, num_not_cached;
+	uint32_t i, num_not_cached, cached_not_specified;
 	wbcErr err;
 	bool ret = false;
 
@@ -1257,7 +1257,8 @@
 	}
 
 	num_not_cached = 0;
-
+	cached_not_specified = 0;
+	
 	for (i=0; i<num_sids; i++) {
 		bool expired;
 		uint32_t rid;
@@ -1277,6 +1278,9 @@
 		if (idmap_cache_find_sid2unixid(&sids[i], &ids[i], &expired)
 		    && !expired)
 		{
+		  if (ids[i].type == ID_TYPE_NOT_SPECIFIED)
+		    cached_not_specified++;
+		  
 			continue;
 		}
 		ids[i].type = ID_TYPE_NOT_SPECIFIED;
@@ -1287,6 +1291,10 @@
 	if (num_not_cached == 0) {
 		goto done;
 	}
+	if (cached_not_specified > 0)
+	  DEBUG(1, ("DANGER, WILL ROBINSON, DANGER! (cached_not_specified=%d, num_not_cached=%d)\n",
+		    cached_not_specified, num_not_cached));
+	
 	wbc_ids = talloc_array(talloc_tos(), struct wbcUnixId, num_not_cached);
 	if (wbc_ids == NULL) {
 		goto fail;


and ran a quick test on a test server and got:

[2021/01/26 15:50:19.497913,  0] ../../source3/smbd/server.c:1784(main)
  smbd version 4.13.4-LiU started.
  Copyright Andrew Tridgell and the Samba Team 1992-2020
[2021/01/26 15:50:19.502293,  1] ../../source3/profile/profile_dummy.c:30(set_profile_level)
  INFO: Profiling support unavailable in this build.
[2021/01/26 15:50:20.024415,  1] ../../source3/passdb/lookup_sid.c:1295(sids_to_unixids)
  DANGER, WILL ROBINSON, DANGER! (cached_not_specified=2, num_not_cached=1)
[2021/01/26 15:50:20.025579,  1] ../../source3/smbd/files.c:305(file_init_global)
  file_init_global: Information only: requested 7537572 open files, 59392 are available.
[2021/01/26 15:50:20.032817,  0] ../../lib/util/become_daemon.c:135(daemon_ready)
  daemon_ready: daemon 'smbd' finished starting up and ready to serve connections
[2021/01/26 15:50:42.019453,  1] ../../source3/passdb/lookup_sid.c:1295(sids_to_unixids)
  DANGER, WILL ROBINSON, DANGER! (cached_not_specified=2, num_not_cached=272)
Comment 3 Peter Eriksson 2021-01-26 15:57:13 UTC
Added an assertion check in the second loop:

>       num_not_cached_size = num_not_cached;
>       num_not_cached = 0;
>
>        for (i=0; i<num_sids; i++) {
>                if (ids[i].type == ID_TYPE_NOT_SPECIFIED) {
>                  if (num_not_cached >= num_not_cached_size) {
>                    DEBUG(1, ("ERROR - num_not_cached(%d) >= num_not_cached_size(%d), i=%d, num_sids=%d\n",
>                              num_not_cached, num_not_cached_size, i, num_sids));
>                    abort();
>                  }
>                        switch (wbc_ids[num_not_cached].type) {


Results in this when starting Samba:

[2021/01/26 16:54:20.002903,  1] ../../source3/passdb/lookup_sid.c:1295(sids_to_unixids)
  DANGER, WILL ROBINSON, DANGER! (cached_not_specified=2, num_not_cached=2)
[2021/01/26 16:54:20.014793,  1] ../../source3/passdb/lookup_sid.c:1316(sids_to_unixids)
  ERROR - num_not_cached(3) >= num_not_cached_size(2), i=5, num_sids=6
[2021/01/26 16:54:20.015263,  0] ../../lib/util/fault.c:159(smb_panic_log)
Comment 4 Peter Eriksson 2021-01-26 18:54:21 UTC
Created attachment 16407 [details]
Make sids_to_unixids() in source3/passdb/lookup_sids.c more robust

Found one way to force this to happen by killing winbindd and restarting it without killing and restarting the smbd daemons.

With my patch (se attached file) applied I then get these errors instead when trying to connect via "smbclient -k" after having killed and restarted winbindd:

[2021/01/26 19:42:05.445256,  1] ../../source3/passdb/lookup_sid.c:1295(sids_to_unixids)
  DANGER, WILL ROBINSON, DANGER! (cached_not_specified=1, num_not_cached=270)
[2021/01/26 19:42:05.542645,  1] ../../source3/passdb/lookup_sid.c:1316(sids_to_unixids)
  ERROR - num_not_cached(270) >= num_not_cached_size(270), i=290, num_sids=294
[2021/01/26 19:42:05.542734,  1] ../../source3/auth/auth_generic.c:167(auth3_generate_session_info_pac)
  Failed to map kerberos pac to server info (NT_STATUS_NO_MEMORY)


(We have a watchdog process that checks for unresponsive samba servers (smbd) which typically happens every 10:th hour on busy servers. One workaround we've found to "unfreeze" them has been to kill and restart winbindd. By just restarting winbindd we don't cause user interruption..)
Comment 5 Peter Eriksson 2021-01-27 12:52:16 UTC
Ah! Found it! The problem occurs when:

[2021/01/27 12:43:28.106396,  0] ../../source3/lib/idmap_cache.c:60(idmap_cache_find_sid2unixid)
  Parsing value for key [IDMAP/SID2XID/S-1-18-1]: value=[-1:N]
[2021/01/27 12:43:28.106428,  1] ../../source3/passdb/lookup_sid.c:1283(sids_to_unixids)
  ID_TYPE_NOT_SPECIFIED and NOT expired: i=17, rid=-1)

Ie, when we have a non-expired negative idmap cache entry!

Looking at the code in idmap_cache.c:idmap_cache_find_sid2unixid() it's not so surprising. When it finds a negative cache entry(-1:N) that hasn't expired it sets the type to ID_TYPE_NOT_SPECIFIED and returns success - which then triggers the bug in lookup_sid.c:sids_to_unixids().

Possibly one should use some other key to distinguish between non-expired negative cache entries and not-yet-looked-up entries, but one easy fix to avoid the buffer overrun is:

-- source3/passdb/lookup_sid.c.ORIG    2021-01-26 16:25:59.235435000 +0100
+++ source3/passdb/lookup_sid.c 2021-01-27 13:18:20.264864000 +0100
@@ -1277,7 +1278,8 @@
                if (idmap_cache_find_sid2unixid(&sids[i], &ids[i], &expired)
                    && !expired)
                {
-                       continue;
+                       if (ids[i].type != ID_TYPE_NOT_SPECIFIED)
+                               continue;
                }
                ids[i].type = ID_TYPE_NOT_SPECIFIED;
                memcpy(&wbc_sids[num_not_cached], &sids[i],

This kinds of defeats the purpose of the negative cache entry - but fixing that probably requires adding a new ID_TYPE_NEGATIVE_MATCH type or something so one can avoid the non-expired Sid being looked up again with wbcSidsToUnixIds().
Comment 6 Peter Eriksson 2021-01-27 12:56:25 UTC
Created attachment 16410 [details]
Fix buffer overrun and make sids_to_unixids() more robust
Comment 7 Jeremy Allison 2021-01-27 18:32:41 UTC
Great work Peter ! Thanks for tracking this down. We'll take a look at this patch soon.
Comment 8 Peter Eriksson 2021-02-18 12:16:57 UTC
Any chance this might be fixed in 4.14.0? Had a quick look at the source and the same problem seems to exist in 4.14.0rc3.

It's a bit of a security problem - we have had at least one case of a user getting included into groups they should not be due to the buffer overrun - and thus being able to delete files they shouldn't have...

(A user got gid 0 included into his group list - and a file on the install server just happended to have group 0 as group set with full rights, and the installer he was running tried to clean up after itself - pof! install.exe gone :-)

I have a local patch I use so I'll survive, but I guess others might be interested too...
Comment 9 Andrew Bartlett 2021-02-18 22:13:07 UTC
Please report security concerns to security@samba.org in the future please.

https://wiki.samba.org/index.php/Samba_Security_Process#Reporting_Security_Defects_in_Samba

I've marked this embargoed for now and asked Red Hat for a CVE.

Thanks,

Andrew Bartlett
Comment 10 Volker Lendecke 2021-02-20 17:30:34 UTC
Created attachment 16466 [details]
alternative patch

Alternative patch. Haven't run a pipeline as this would be publically visible.
Comment 11 Jeremy Allison 2021-02-23 00:39:01 UTC
+1 on the alternative patch from Volker. Peter, can you also test with Volker's patch to ensure it fixes the problem ?

In examining Volker's patch I did notice another potential problem with this code, I think there's an uninitialized memory read here also. I'll follow up with another comment.
Comment 12 Jeremy Allison 2021-02-23 00:52:21 UTC
I think the uninitialized memory read can occur as follows.

Looking at the code in master source3/passdb/lookup_sid.c:sids_to_unixids() after Volker's patch is applied (line numbers).

First we try and look up the given SIDs in global_sid_Unix_Users,
global_sid_Unix_Groups, and the cache, as below.

1286         for (i=0; i<num_sids; i++) {
1287                 bool expired;
1288                 uint32_t rid;
1289 
1290                 if (sid_peek_check_rid(&global_sid_Unix_Users,
1291                                        &sids[i], &rid)) {
1292                         ids[i].type = ID_TYPE_UID;
1293                         ids[i].id = rid;
1294                         bitmap_set(found, i);
1295                         continue;
1296                 }
1297                 if (sid_peek_check_rid(&global_sid_Unix_Groups,
1298                                        &sids[i], &rid)) {
1299                         ids[i].type = ID_TYPE_GID;
1300                         ids[i].id = rid;
1301                         bitmap_set(found, i);
1302                         continue;
1303                 }
1304                 if (idmap_cache_find_sid2unixid(&sids[i], &ids[i], &expired)
1305                     && !expired)
1306                 {
1307                         bitmap_set(found, i);
1308                         continue;
1309                 }
1310                 ids[i].type = ID_TYPE_NOT_SPECIFIED;
1311                 memcpy(&wbc_sids[num_not_cached], &sids[i],
1312                        ndr_size_dom_sid(&sids[i], 0));
1313                 num_not_cached += 1;
1314         }

So wbc_sids now contains entries for the sids we failed
to look up, of length num_not_cached.

We now create a new array wbc_ids to receive the id's of the SIDs we were unable to look
up in lines:

1318         wbc_ids = talloc_array(talloc_tos(), struct wbcUnixId, num_not_cached);
1319         if (wbc_ids == NULL) {
1320                 goto fail;
1321         }

and initialize it to WBC_ID_TYPE_NOT_SPECIFIED here.

1322         for (i=0; i<num_not_cached; i++) {
1323                 wbc_ids[i].type = WBC_ID_TYPE_NOT_SPECIFIED;
1324         }

Note that WE DO NOT initialize wbc_ids[i].id.gid to -1 here (I
think we should), meaning wbc_ids[i].id.gid is uninitialized
memory for all values of i from 0 -> num_not_cached.

Now we lookup the SIDs in winbindd using:

1325         err = wbcSidsToUnixIds(wbc_sids, num_not_cached, wbc_ids);
1326         if (!WBC_ERROR_IS_OK(err)) {
1327                 DEBUG(10, ("wbcSidsToUnixIds returned %s\n",
1328                            wbcErrorString(err)));
1329         }

However, if we look inside wbcSidsToUnixIds()->wbcCtxSidsToUnixIds()
we find in nsswitch/libwbclient/wbc_idmap.c:

375         for (i=0; i<num_sids; i++) {
376                 struct wbcUnixId *id = &ids[i];
377                 char *q;
378                 int error = 0;
379 
380                 switch (p[0]) {
381                 case 'U':
382                         id->type = WBC_ID_TYPE_UID;
383                         id->id.uid = smb_strtoul(p+1,
384                                                  &q,
385                                                  10,
386                                                  &error,
387                                                  SMB_STR_STANDARD);
388                         break;
389                 case 'G':
390                         id->type = WBC_ID_TYPE_GID;
391                         id->id.gid = smb_strtoul(p+1,
392                                                  &q,
393                                                  10,
394                                                  &error,
395                                                  SMB_STR_STANDARD);
396                         break;
397                 case 'B':
398                         id->type = WBC_ID_TYPE_BOTH;
399                         id->id.uid = smb_strtoul(p+1,
400                                                  &q,
401                                                  10,
402                                                  &error,
403                                                  SMB_STR_STANDARD);
404                         break;
405                 default:
406                         id->type = WBC_ID_TYPE_NOT_SPECIFIED;
407                         q = strchr(p, '\n');
408                         break;
409                 };

Note that for the case where we're returning WBC_ID_TYPE_NOT_SPECIFIED
we do *NOT* write into id->id.gid, which means it's left uninitialized.

Back inside source3/passdb/lookup_sid.c:sids_to_unixids() we have:

1333         for (i=0; i<num_sids; i++) {
1334                 if (bitmap_query(found, i)) {
1335                         continue;
1336                 }
1337 
1338                 switch (wbc_ids[num_not_cached].type) {
1339                 case WBC_ID_TYPE_UID:
1340                         ids[i].type = ID_TYPE_UID;
1341                         ids[i].id = wbc_ids[num_not_cached].id.uid;
1342                         break;
1343                 case WBC_ID_TYPE_GID:
1344                         ids[i].type = ID_TYPE_GID;
1345                         ids[i].id = wbc_ids[num_not_cached].id.gid;
1346                         break;
1347                 default:
1348                         /* The types match, and wbcUnixId -> id is a union anyway */
1349                         ids[i].type = (enum id_type)wbc_ids[num_not_cached].type;
1350                         ids[i].id = wbc_ids[num_not_cached].id.gid;
1351                         break;
1352                 }
1353                 num_not_cached += 1;
1354         }

I think on line:

1350                         ids[i].id = wbc_ids[num_not_cached].id.gid;

we are assigning uninitialized memory into ids[i].id.

If we add a line here:

1322         for (i=0; i<num_not_cached; i++) {
1323                 wbc_ids[i].type = WBC_ID_TYPE_NOT_SPECIFIED;
++++                 wbc_ids[i].id,gid = (uint32_t)-1;
1324         }

This should fix that issue.

Volker, do you concur ?
Comment 13 Jeremy Allison 2021-02-23 02:09:24 UTC
Created attachment 16467 [details]
git-am fix for master.

Volker, this is your patch with my RB+ and my additional uninitialized memory fix on top.

Peter, can you test this and confirms it fixes the bug ?

Thanks,

Jeremy.
Comment 14 Peter Eriksson 2021-02-23 08:26:05 UTC
I'm currently testing Volker's first patch on our most busy production server (about 1500-2000 concurrent users). With a small addition - a SMB_ASSERT check to really make sure the process hard-crashes if the buffer overrun occurs:


--- a/source3/passdb/lookup_sid.c.orig	2021-02-22 08:49:22.141016000 +0100
+++ a/source3/passdb/lookup_sid.c	2021-02-22 08:51:23.396299000 +0100
@@ -1249,7 +1249,7 @@
 	struct wbcDomainSid *wbc_sids = NULL;
 	struct wbcUnixId *wbc_ids = NULL;
 	struct bitmap *found = NULL;
-	uint32_t i, num_not_cached;
+	uint32_t i, num_not_cached, num_not_cached_size;
 	wbcErr err;
 	bool ret = false;
 
@@ -1309,15 +1309,17 @@
 			   wbcErrorString(err)));
 	}
 
+	num_not_cached_size = num_not_cached;
 	num_not_cached = 0;
 
 	for (i=0; i<num_sids; i++) {
-		struct wbcUnixId *wbc_id = &wbc_ids[num_not_cached];
+	  	struct wbcUnixId *wbc_id;
 
 		if (bitmap_query(found, i)) {
 			continue;
 		}
-
+		SMB_ASSERT(num_not_cached < num_not_cached_size);
+		wbc_id = &wbc_ids[num_not_cached];
 		switch (wbc_id->type) {
 		case WBC_ID_TYPE_UID:
 			ids[i].type = ID_TYPE_UID;

Have been running the test patch since last evening and so far it's looking good. No core dumps atleast...

I can apply the additional fixes and test it again, but I can't do that (on the production server, I'll test it on our test server) until this evening (prefer not to kill the user's smbd processes during office hours :-)
Comment 15 Peter Eriksson 2021-02-23 23:03:23 UTC
A short status update:

I'm now running with the updated patch (and my SMB_ASSERT check) on our most busy server (for an hour now - but it's midnight so not much users :-)

I've run our test server (only automated tests, and me) with it since lunch time (about 12 hours).

Looking good so far. No core dumps and no errors from our testing/monitoring system.

I'll report again tomorrow after the workday when we've had a full day with it in use.
Comment 16 Peter Eriksson 2021-02-24 16:43:06 UTC
Ok, everything is looking good on our production server. No core dumps or errors found after todays work hours.

- Peter
Comment 17 Jeremy Allison 2021-02-24 19:31:27 UTC
OK, looks good then. I'll try and write up a CVE text for this to get it ready for release.
Comment 18 Jeremy Allison 2021-02-25 21:54:18 UTC
OK, I'm thinking about the CVE for this, and if it's externally exploitable. The problem is caused by:

"Looking at the code in idmap_cache.c:idmap_cache_find_sid2unixid() it's not so surprising. When it finds a negative cache entry(-1:N) that hasn't expired it sets the type to ID_TYPE_NOT_SPECIFIED and returns success - which then triggers the bug in lookup_sid.c:sids_to_unixids()."

An authenticated attacker could send a file ACL containing explicitly non-mappable SIDs in order to fill the negative cache with ID_TYPE_NOT_SPECIFIED types. But then a new logon has to occur containing a token including one of those SIDs I think.

It can obviously happen (that's how you tracked it down) but I can't think of any way to explicitly make this happen on demand.

Peter, can you give a little more context on how this occurred so I can understand the possible threat model ?
Comment 19 Peter Eriksson 2021-02-25 22:26:13 UTC
> On 25 Feb 2021, at 22:54, samba-bugs@samba.org wrote:
> 
> https://bugzilla.samba.org/show_bug.cgi?id=14571
> 
> --- Comment #18 from Jeremy Allison <jra@samba.org> ---
> OK, I'm thinking about the CVE for this, and if it's externally exploitable.
> The problem is caused by:
> 
> "Looking at the code in idmap_cache.c:idmap_cache_find_sid2unixid() it's not so
> surprising. When it finds a negative cache entry(-1:N) that hasn't expired it
> sets the type to ID_TYPE_NOT_SPECIFIED and returns success - which then
> triggers the bug in lookup_sid.c:sids_to_unixids()."
> 
> An authenticated attacker could send a file ACL containing explicitly
> non-mappable SIDs in order to fill the negative cache with
> ID_TYPE_NOT_SPECIFIED types. But then a new logon has to occur containing a
> token including one of those SIDs I think.
> 
> It can obviously happen (that's how you tracked it down) but I can't think of
> any way to explicitly make this happen on demand.
> 

In our case the ACL was already assigned to the file (possibly via NFS -> ZFS, or locally on the server when the files initially was copied from the old file servers (Windows-based or some other non-Samba server), or via rsync (but the rsync ACL code is seriously buggy when it comes to NFSv4/ZFS ACLs so we haven’t been able to use it).

So I guess a way to get it there is using some side band method (NFS or locally on the server).

- Peter
Comment 20 Peter Eriksson 2021-02-25 22:30:10 UTC
The reason I noticed it was really a combination of separate issues:

1. I got a problem report where a another sysadmin had installed a Windows program on a client computer from a software install share on one of “my” servers, and somehow had managed to delete the INSTALL.EXE executable from that share. Which he shouldn’t have been able to do since he wasn’t a member of the group that had write access to the share in question.  At first I didn’t think much of this (thinking it probably had been some other sysadmin that had deleted that file by mistake or something) since I didn’t see anything obviously wrong with the ACLs protecting the files.

2. When testing this I noticed that I sometimes got connection denied when mounting a share via smbclient, but that it always worked at the second attempt… And looking at the log files it takes about core dumps or something.

3. I changed a setting for core dump files to save them to a central location, and I noticed a number of cores from smbd…

4. Debugging those cores I noticed that it happened in the code in question.

5. I think got to think about #1 above… And looked a bit closer at the ACL protecting those files. And noticed that it contained a  group wheel (gid 0): full access entry. So if the buffer overrun happened to walk into zeroed memory instead of an unmapped memory then it wasn’t so surprising that they got full access to that file.

Now, the _vast_ majority of shares/file/directories here does not have group 0 (wheel/root) ACLs set on them (and the one it question shouldn’t have had it set) so if it always is gid zero that get’s included then it’s probably no big deal - but I guess that is depended on how the memory allocator works...

I think one reason I noticed the core dumps might be that FreeBSD 12.2’s (I had recently upgraded that server to it) memory allocation stuff is slightly different from FreeBSD 11.3 (and FreeBSD 12.2 also uses newer C compilers). The bug just didn’t cause core dumps on our FreeBSD 11 servers (that I had noticed anyway).

- Peter


> On 25 Feb 2021, at 22:54, samba-bugs@samba.org wrote:
> 
> https://bugzilla.samba.org/show_bug.cgi?id=14571
> 
> --- Comment #18 from Jeremy Allison <jra@samba.org> ---
> OK, I'm thinking about the CVE for this, and if it's externally exploitable.
> The problem is caused by:
> 
> "Looking at the code in idmap_cache.c:idmap_cache_find_sid2unixid() it's not so
> surprising. When it finds a negative cache entry(-1:N) that hasn't expired it
> sets the type to ID_TYPE_NOT_SPECIFIED and returns success - which then
> triggers the bug in lookup_sid.c:sids_to_unixids()."
> 
> An authenticated attacker could send a file ACL containing explicitly
> non-mappable SIDs in order to fill the negative cache with
> ID_TYPE_NOT_SPECIFIED types. But then a new logon has to occur containing a
> token including one of those SIDs I think.
> 
> It can obviously happen (that's how you tracked it down) but I can't think of
> any way to explicitly make this happen on demand.
> 
> Peter, can you give a little more context on how this occurred so I can
> understand the possible threat model ?
> 
> -- 
> You are receiving this mail because:
> You reported the bug.
Comment 21 Jeremy Allison 2021-03-04 21:21:00 UTC
I'm writing the CVE text for this now. It's not easy :-).
Comment 22 Jeremy Allison 2021-03-04 22:03:20 UTC
Peter, do you want to be credited in the CVE writeup ? If so, how shall I credit you ?
Comment 23 Jeremy Allison 2021-03-04 22:34:43 UTC
Created attachment 16499 [details]
Preliminary cut at a CVE text.

First draft of a CVE text. Peter, Volker and Karolin could you take a look at this ? Peter, it would help greatly if you can tell me how to credit you. Karolin, how far back should we go on release updates ?

The Linux distros can let me know what versions they still support that will need back-ports (the code looks like it might have problems from 3.6.0 onwards) once the bug is opened to samba-vendors.
Comment 24 Jeremy Allison 2021-03-04 22:37:31 UTC
Adding Jim (SuSE) and Gunther (I think Andreas is out on leave at the moment) so we can get heads-up on what versions will need back-ports (I should be able to help).
Comment 25 Peter Eriksson 2021-03-04 23:17:27 UTC
(In reply to Jeremy Allison from comment #23)

Your CVE text looks fine to me. I'm no expert on CVE texts though :-)


> Peter, do you want to be credited in the CVE writeup ? If so, how shall I credit you ?

Umm. Sure. I don't know how you normally do that though.

"Peter Eriksson, IT Department, Linköping University" perhaps? Or do you need something else/more? "Peter Eriksson" is a fairly common name and there are multiple ones at our university, but right now I think I'm the only one working at the IT Dept :-). The nickname/handle some people know me as is "pen" but that might be to overdo it.
Comment 26 Noel Power 2021-03-05 12:19:56 UTC
(In reply to Jeremy Allison from comment #24)


On our side assuming >= 3.6 then we have a whole pile of affected releases :-)

sle11-sp4   samba 3.6
sle12sp2    samba 4.4.2
sle12sp3/4  samba 4.6.16
sle12sp5    samba 4.10.18
sle15-sp0   samba 4.7.11
sle15-sp1   samba 4.9.5
Comment 27 Jeremy Allison 2021-03-05 19:25:02 UTC
Thanks Peter and thanks for the review Volker. I'll wait for Karolin to respond with the list of versions we'll need to release and then I'll update the CVE text to look more like a "final" version.

Noel, ping me if you need help with back-ports. The code changes are pretty clear, so it shouldn't be hard. I'm happy to help review any changes.
Comment 28 Noel Power 2021-03-08 09:23:20 UTC
(In reply to Jeremy Allison from comment #27)
Thanks Jeremy!, think the changes seem straight forward, review would be welcome. I'll start attaching here as and when I get to them
Comment 29 Noel Power 2021-03-09 08:44:13 UTC
Created attachment 16505 [details]
backport to 3.6
Comment 30 Noel Power 2021-03-09 08:44:54 UTC
Created attachment 16506 [details]
backport to 4.4
Comment 31 Noel Power 2021-03-09 08:45:47 UTC
Created attachment 16507 [details]
backport to 4.6
Comment 32 Noel Power 2021-03-09 08:46:35 UTC
Created attachment 16508 [details]
backport to 4.7
Comment 33 Noel Power 2021-03-09 08:47:16 UTC
Created attachment 16509 [details]
backport to 4.9
Comment 34 Noel Power 2021-03-09 08:48:36 UTC
Created attachment 16510 [details]
backport to 4.10
Comment 35 Noel Power 2021-03-09 08:49:20 UTC
Created attachment 16511 [details]
backport to 4.11
Comment 36 Noel Power 2021-03-09 08:52:37 UTC
(In reply to Noel Power from comment #28)
attached backports for 3.6, 4.4, 4.6, 4.7, 4.9, 4.10, 4.11 (as mentioned in comment #26) These only differ in line numbers and minor context diffs expect for the patch for 3.6 (and too a lesser extent 4.4)
Comment 37 Jeremy Allison 2021-03-09 19:26:15 UTC
Comment on attachment 16511 [details]
backport to 4.11

LGTM.
Comment 38 Andrew Bartlett 2021-03-10 00:34:00 UTC
Comment on attachment 16499 [details]
Preliminary cut at a CVE text.

Make sure when the versions are updated here that the reporters full affiliation is also listed.
Comment 39 Karolin Seeger 2021-03-10 10:09:13 UTC
Is there a chance to get this one ready for the security release on March 24?
That would mean, all patches, reviews and the advisory would be good to go until Sunday, March 14.
Comment 40 Noel Power 2021-03-10 13:55:21 UTC
(In reply to Karolin Seeger from comment #39)
hmm there appears to be a problem with older backports (yes I know these are no longer supported here)

e.g. at least 4.9.5, 4.6.16 seem to fail the samba.tests.posixacl test

mostly likely samba 4.4.2, 4.7.11 also fail (no test available for 3.6 but probably same)

note: the other versions we *SUSE) have 4.10.18, 4.11.14 seem fine

I don't see any a problem with the patch so I guess there is something deeper going on, right now I don't have a clue (be glad of anyone familar with this part of the code/test has any ideas :-()
Comment 41 Jeremy Allison 2021-03-10 18:08:25 UTC
make test TESTS=posixacl passes for master, 4.14, 4.13, 4.12 - so I think the overall patch is good.

I'll take a look at the 4.8.x back-port.

We don't yet have a Red Hat contact (Guenther hasn't replied on this) and Andreas is out on leave for a while I think. Karolin, what do we usually do with no vendor follow-up ?
Comment 42 Noel Power 2021-03-10 18:42:32 UTC
(In reply to Jeremy Allison from comment #41)
hmm I think there is an issue with the main patch still,

what I see happening (in 4.9.5) is as follows


test_setntacl_policies_check_getposixacl fails when checking the posix acl count

e.g.

self.assertEquals(posix_acl.count, 15, self.print_posix_acl(posix_acl))

fails with 

self.assertEquals(posix_acl.count, 15, self.print_posix_acl(posix_acl))
AssertionError: a_type: 3
a_perm: 7
gid: 3000000
a_type: 1
a_perm: 6
uid: 1001
a_type: 5
a_perm: 0
a_type: 2
a_perm: 6
a_type: 1
a_perm: 7
uid: 3000000
a_type: 4
a_perm: 7
a_type: 3
a_perm: 5
gid: 3000001
a_type: 3
a_perm: 7
gid: 3000002
a_type: 3
a_perm: 5
gid: 3000003
a_type: 3
a_perm: 7
gid: 3000008
a_type: 6
a_perm: 7

]

there are only 11 acls instead of 15

What's happening is certain dom_sids passed into sids_to_unixids

a) first get marked as 'ID_TYPE_NOT_SPECIFIED' in the first for loop (alls good there)

b) then wbcSidsToUnixIds(...) is called

c) then we goto the second for loop to process thre results of the above

we fall into the default (e.g. WBC_ID_TYPE_BOTH) case and modify ids[i]

then here's where things change for the orig code because at

d) in the third for loop we overwrite ids[i].type/id with the result of legacy_sid_to_xid

for the same test we don't call legacy_sid_to_xid because we test on id[i].type == ID_TYPE_NOT_SPECIFIED, this explains the no. acl diffs because TYPE_BOTH has been overwritten with either GID or UID which generate result in a single acl being added instead of 2

I think we should have set the bitmap flag also in the second loop at c above e.g.


            for (i=0; i<num_sids; i++) {
                if (bitmap_query(found, i)) {
                        continue;
                }

                switch (wbc_ids[num_not_cached].type) {
                case WBC_ID_TYPE_UID:
                        ids[i].type = ID_TYPE_UID;
                        ids[i].id = wbc_ids[num_not_cached].id.uid;
+                       bitmap_set(found, i);
                        break;
                case WBC_ID_TYPE_GID:
                        ids[i].type = ID_TYPE_GID;
                        ids[i].id = wbc_ids[num_not_cached].id.gid;
+                       bitmap_set(found, i);
                        break;
                default:
                        /* The types match, and wbcUnixId -> id is a union anyway */
                        ids[i].type = (enum id_type)wbc_ids[num_not_cached].type;
                        ids[i].id = wbc_ids[num_not_cached].id.gid;
+                       bitmap_set(found, i);
                        break;
                }
                num_not_cached += 1;



what do you think ?
Comment 43 Jeremy Allison 2021-03-10 18:58:55 UTC
OK, I'm going to have to look at this carefully to be sure.

Karolin, I don't think this one is going to be ready for Sunday, March 14. Looks like we might need to have one more go around with the reporter for testing etc.
Comment 44 Jeremy Allison 2021-03-10 19:31:59 UTC
(In reply to Noel Power from comment #42)

Yes, the calls to bitmap_set(found, i) are missing there.

But your patch isn't quite right, we only need to add bitmap_set(found, i) in the second loop for the cases:

case WBC_ID_TYPE_UID:
case WBC_ID_TYPE_GID:

and the (currently missing)

case WBC_ID_TYPE_BOTH:

all of which can be returned from wbcSidsToUnixIds() as a "successful" mapping.

But we mustn't call bitmap_set(found, i) in the WBC_ID_TYPE_NOT_SPECIFIED case, as then it will prevent the fallbacks in the third loop to legacy_sid_to_gid() and legacy_sid_to_uid() for non-mapped id's.

Right now the "default:" case is taking care of the WBC_ID_TYPE_BOTH *and* the WBC_ID_TYPE_NOT_SPECIFIED returns for an id, which isn't correct.

Wow. I don't know how I missed that. I'll update the patch with *lots* of comments to explain what is going on here. This stuff is horribly subtle :-(.
Comment 45 Jeremy Allison 2021-03-10 20:12:50 UTC
Created attachment 16516 [details]
git-am fix for master.

OK, here's what I think will work. I'm marking as "Obsolete" all the other patches (I'm really sorry for obsoleting your work Noel), and I'm folding in the (now no longer strictly required) second patch into the first as it now no longer makes sense on its own.

Volker, I'm really sorry for messing up the initial review and missing the subtle logic of setting the 'found' bitmap flag in the second loop. This screw-up is my fault :-(.

Noel, I'm really glad your testing found this issue, we'd have shipped an incorrect fix without it. Thanks a *LOT* for you help.
Comment 46 Jeremy Allison 2021-03-10 20:15:03 UTC
Peter, I'm very sorry but I think we'll need one more round on testing on your site for this to make sure I haven't broken anything else.

Noel, I'm very sorry for the extra work, but if you can back-port this to your 4.9.x test-rig and try again I have hopes it will pass now.

Sorry for the problem.
Comment 47 Jeremy Allison 2021-03-10 20:49:38 UTC
Created attachment 16517 [details]
backport to 4.9.x

Noel, as an apology for obsoleting all your previous work, here's a back-port for 4.9.x of the master fix. It passes:

make test TESTS=posixacl

here. Let me know if it works for you. Thanks !
Comment 48 Peter Eriksson 2021-03-10 21:51:48 UTC
(In reply to Jeremy Allison from comment #46)

> Peter, I'm very sorry but I think we'll need one more round on testing on your site > for this to make sure I haven't broken anything else.

No problem. I'll start testing asap!

- Peter
Comment 49 Peter Eriksson 2021-03-10 22:37:04 UTC
(In reply to Jeremy Allison from comment #44)

> Wow. I don't know how I missed that. I'll update the patch with *lots* of comments
> to explain what is going on here. This stuff is horribly subtle :-(.

Suggestion: Add an SMB_ASSERT() to make extra sure any future changes to this code doesn't reintroduce the buffer overrun again. It's a simple check but atleast it makes me sleep better at night :-)


--- samba-4.14.0/source3/passdb/lookup_sid.c.ORIG     2021-03-10 22:56:43.829838000 +0100
+++ samba-4.14.0/source3/passdb/lookup_sid.c    2021-03-10 22:58:16.139302000 +0100
@@ -1268,7 +1268,7 @@
        struct wbcDomainSid *wbc_sids = NULL;
        struct wbcUnixId *wbc_ids = NULL;
        struct bitmap *found = NULL;
-       uint32_t i, num_not_cached;
+       uint32_t i, num_not_cached, num_not_cached_size;
        wbcErr err;
        bool ret = false;
 
@@ -1354,6 +1354,7 @@
         * being mapped.
         */
 
+       num_not_cached_size = num_not_cached;
        num_not_cached = 0;
 
        for (i=0; i<num_sids; i++) {
@@ -1361,6 +1362,7 @@
                        continue;
                }
 
+               SMB_ASSERT(num_not_cached < num_not_cached_size);
                switch (wbc_ids[num_not_cached].type) {
                case WBC_ID_TYPE_UID:
                        ids[i].type = ID_TYPE_UID;
Comment 50 Jeremy Allison 2021-03-10 22:58:37 UTC
(In reply to Peter Eriksson from comment #49)

The problem is I find that addition very confusing and makes the code less clear (to me, I'm sorry).
Comment 51 Peter Eriksson 2021-03-10 23:05:42 UTC
(In reply to Jeremy Allison from comment #50)

Would it be clearer if the 'num_not_cached_size' variable was
named 'wbc_ids_size' instead, and assigned at the allocation? Like this:

     wbc_ids = talloc_array(talloc_tos(), struct wbcUnixId, num_not_cached);
     wbc_ids_size = num_not_cached;

or is there a talloc_sizeof() function that could be used in the SMB_ASSERT()?

Ah well. It _shouldn't_ happen anyway. :-)
Comment 52 Jeremy Allison 2021-03-10 23:34:41 UTC
OK, I can live with an assert on 'wbc_ids_size' - that now makes sense :-). I'll add it an update to the master and 4.9.x backports.
Comment 53 Jeremy Allison 2021-03-10 23:50:24 UTC
Created attachment 16519 [details]
git-am fix for master.

Now includes Peter's

SMB_ASSERT(num_not_cached < wbc_ids_size)
Comment 54 Jeremy Allison 2021-03-10 23:53:01 UTC
Created attachment 16520 [details]
backport to 4.9.x

Added SMB_ASSERT.
Comment 55 Noel Power 2021-03-11 09:59:32 UTC
Comment on attachment 16519 [details]
git-am fix for master.

lgtm
Comment 56 Karolin Seeger 2021-03-11 12:46:07 UTC
(In reply to Jeremy Allison from comment #41)
I don't know. What else can we do?
Maybe when opening for vendors someone will take care.
Comment 57 Jeremy Allison 2021-03-11 17:24:46 UTC
Anoop, can you take a look at this one please ?
Comment 58 Jeremy Allison 2021-03-11 17:26:16 UTC
OK, great. Looks like we have a working for for master confirmed by Peter.

Noel, can you re-do the back-ports ? I'm really sorry for the mistake in the initial review, that was completely my fault.
Comment 59 Noel Power 2021-03-11 18:12:07 UTC
(In reply to Jeremy Allison from comment #58)
>Noel, can you re-do the back-ports ?
They are done, I'll upload them after the CI runs have completed
Comment 60 Peter Eriksson 2021-03-11 18:19:07 UTC
(In reply to Jeremy Allison from comment #58)

> OK, great. Looks like we have a working for for master confirmed by Peter.

Yep. Been running the updated code all day on multiple servers (FreeBSD 11.4 & 12.2) with live users (many). No core dumps, no funny business, no strange log output and best of all - no irate users (or at least no problem reports received).

I also took some time to read the patched code just to see if I could find any weak spots (or at least something potentially suspicious) - looking good as far as I can tell!
Comment 61 Jeremy Allison 2021-03-11 18:44:06 UTC
(In reply to Peter Eriksson from comment #60)

Thanks so much for looking it over. C is hard :-).
Comment 62 Noel Power 2021-03-12 11:27:49 UTC
(In reply to Noel Power from comment #59)
CI(s) or at least the release we can run that for have completed and all those that were previously failing now pass \o/

I'll attach the backports now
Comment 63 Noel Power 2021-03-12 11:28:42 UTC
Created attachment 16532 [details]
backport to 3.6
Comment 64 Noel Power 2021-03-12 11:29:20 UTC
Created attachment 16533 [details]
backport to 4.4
Comment 65 Noel Power 2021-03-12 11:29:53 UTC
Created attachment 16534 [details]
backport to 4.11
Comment 66 Noel Power 2021-03-12 11:30:22 UTC
Created attachment 16535 [details]
backport to 4.9
Comment 67 Noel Power 2021-03-12 11:31:07 UTC
Created attachment 16536 [details]
backport to 4.7
Comment 68 Noel Power 2021-03-12 11:31:26 UTC
Created attachment 16537 [details]
backport to 4.10
Comment 69 Noel Power 2021-03-12 11:31:57 UTC
Created attachment 16538 [details]
backport to 4.6
Comment 70 Noel Power 2021-03-12 11:33:03 UTC
Jeremy if you have a change a look in particular at the 3.6 and 4.4.2 versions would be great (as these are a little different, 3.6 especially) and they didn't get any CI love
Comment 71 Jeremy Allison 2021-03-12 19:30:16 UTC
Created attachment 16539 [details]
Updated CVE text with correct attribution and version numbers.
Comment 72 Jeremy Allison 2021-03-12 19:57:44 UTC
OK Karolin, this one looks good to go. I've updated the CVE text with the correct attributions and version numbers. Let me know if you need anything more from me. Thanks ! Jeremy.
Comment 73 Andrew Bartlett 2021-03-14 22:38:33 UTC
I'll work to get the CI and backports for 4.12, 4.13 and 4.14 prepared today so this can get pushed out.
Comment 74 Andrew Bartlett 2021-03-15 01:33:49 UTC
Created attachment 16542 [details]
patch for master (v2)

Patch for master with correct naming convention and CVE tags in commit message.
Comment 75 Andrew Bartlett 2021-03-15 01:34:25 UTC
Created attachment 16543 [details]
patch backported to 4.14 (v2)
Comment 76 Andrew Bartlett 2021-03-15 01:35:12 UTC
Created attachment 16544 [details]
patch backported to 4.13 (v2)
Comment 77 Andrew Bartlett 2021-03-15 01:35:54 UTC
Created attachment 16545 [details]
patch backported to 4.12 (v2)
Comment 78 Andrew Bartlett 2021-03-15 01:37:50 UTC
Per https://wiki.samba.org/index.php/Samba_Security_Process the patches need to be backported and run though CI individually, so there is no confusion.   There is also a naming convention to be aware of, to avoid issues when the patches are downloaded and applied.

This is stricter than on normal bugs, as the stakes are higher.
Comment 79 Andrew Bartlett 2021-03-15 01:51:47 UTC
Created attachment 16546 [details]
Advisory v3

I've updated the advisory with a clearer CVSS 3.1 calculation, which I only get to 6.8 (thankfully).  

Also, we don't use the 'modified' strings in samba, just the base score, to keep it manageable. 

I've also filled out the credits.
Comment 80 Andrew Bartlett 2021-03-15 02:16:35 UTC
Patch for 4.12 builds with -O3 ubuntu 16.04 and Debian 9.  On these platforms it fails with this:

==> /builds/samba/samba.org-security-patches/samba-o3.stderr <==
../../source3/passdb/lookup_sid.c: In function ‘sids_to_unixids’:
../../source3/passdb/lookup_sid.c:1246:6: error: assuming pointer wraparound does not occur when comparing P +- C1 with P +- C2 [-Werror=strict-overflow]
 bool sids_to_unixids(const struct dom_sid *sids, uint32_t num_sids,
      ^
../../source3/passdb/lookup_sid.c:1246:6: error: assuming pointer wraparound does not occur when comparing P +- C1 with P +- C2 [-Werror=strict-overflow]
cc1: all warnings being treated as errors
Comment 81 Andrew Bartlett 2021-03-15 02:17:52 UTC
Assigning to Jeremy to determine next steps.
Comment 82 Karolin Seeger 2021-03-15 08:02:04 UTC
Can we get the reviews here asap? If yes, we could add this one to the upcoming security release...
Comment 83 Karolin Seeger 2021-03-15 08:12:08 UTC
(In reply to Karolin Seeger from comment #82)
Sorry, just realizing the ci-passed -. So unfortunately no chance to be shipped on March 24.
Comment 84 Karolin Seeger 2021-03-15 08:28:55 UTC
Re-assigning to Jeremy again.
Comment 85 Volker Lendecke 2021-03-15 09:21:52 UTC
Created attachment 16548 [details]
patch for 4.12/debian9 failure

No clue why, but the attached patch fixes the 4.12 build failure for me. The effect is the same, WBC_ID_TYPE_NOT_SPECIFIED is the first definition in enum wbcIdType. After more than 25 years of full-time C programming, I still don't understand this language at all.
Comment 86 Volker Lendecke 2021-03-15 09:24:03 UTC
(In reply to Volker Lendecke from comment #85)
> No clue why, but the attached patch fixes the 4.12 build failure for me. The
> effect is the same, WBC_ID_TYPE_NOT_SPECIFIED is the first definition in
> enum wbcIdType. After more than 25 years of full-time C programming, I still
> don't understand this language at all.

By the way -- can someone with more knowledge in standard C explain this to me? I'm really curious.
Comment 87 Noel Power 2021-03-15 11:23:36 UTC
(In reply to Volker Lendecke from comment #86)
hah, I started looking at this too this morning (unfortunately missed the bug update till now) and no, I don't have any explanation (other than I'd guess this is a compiler bug) I did notice though that even without the CVE patch just



        for (i=0; i<num_not_cached; i++) {
                wbc_ids[i].type = WBC_ID_TYPE_NOT_SPECIFIED;
+               wbc_ids[i].id.gid = (uint32_t)-1;
        }

will trigger the error

but alternatively

        for (i=0; i<num_not_cached; i++) {
                wbc_ids[i].type = WBC_ID_TYPE_NOT_SPECIFIED;
        }
+       for (i=0; i<num_not_cached; i++) {
+               wbc_ids[i].id.gid = (uin32_t)-1;
+       }

also fixes the problem with/without the CVE patch (this is on ubuntu 1604, presumably debian 9 has the same gcc ver)
Comment 88 Volker Lendecke 2021-03-15 12:32:08 UTC
(In reply to Noel Power from comment #87)
> (In reply to Volker Lendecke from comment #86)
> hah, I started looking at this too this morning (unfortunately missed the
> bug update till now) and no, I don't have any explanation (other than I'd
> guess this is a compiler bug) 

Whenever in the past I suspected a compiler bug it *ALWAYS* was my lack of understanding of the subtleties in C, no exception. Thus a comment from someone who actually understands C would be highly appreciated.
Comment 89 Peter Eriksson 2021-03-15 12:50:42 UTC
(In reply to Volker Lendecke from comment #88)

> Whenever in the past I suspected a compiler bug it *ALWAYS* was my lack of 
> understanding of the subtleties in C, no exception. Thus a comment from someone who
> actually understands C would be highly appreciated.

I only have... uh... 35 years of experience writing C code, but I can't explain that error message either. So my guess is compiler bug too. Especially since it only pops up on those platforms with that compiler version.

(I wouldn't claim to actually fully understand C these days though. Back in the K&R C days life was much easier :-)
Comment 90 Noel Power 2021-03-15 13:59:15 UTC
(In reply to Volker Lendecke from comment #88)
>Thus a comment from someone who actually understands C would be highly appreciated.

Not that I would ever claim to actually understand C so I hope you don't mind me making another observation, but that a small test program as follows would *seem* to indicate that this could be a compiler bug

#include <stdio.h>
#include <unistd.h>

enum wbcIdType {
        WBC_ID_TYPE_NOT_SPECIFIED,
        WBC_ID_TYPE_UID,
        WBC_ID_TYPE_GID,
        WBC_ID_TYPE_BOTH
};

union wbcUnixId2Container {
	uid_t uid;
	gid_t gid;
};

struct wbcUnixId {
        enum wbcIdType type;
        union wbcUnixId2Container id;
};

static void make_error(void)
{
        struct wbcUnixId wbc_ids[10];
#ifdef TRIGGER
        struct wbcUnixId *wbc_ids_ptr = wbc_ids;
#endif
        unsigned int i, wbc_ids_size;
        wbc_ids_size = 10;
        for (i=0; i < wbc_ids_size; i++) {
                wbc_ids[i].type = WBC_ID_TYPE_NOT_SPECIFIED;
                wbc_ids[i].id.gid = (unsigned int)-1;
        }
#if TRIGGER
        for (i=0; i < wbc_ids_size; i++) {
                wbc_ids_ptr[i].type = WBC_ID_TYPE_NOT_SPECIFIED;
                wbc_ids_ptr[i].id.gid = (unsigned int)-1;
        }
#endif
        if (wbc_ids[0].type == WBC_ID_TYPE_NOT_SPECIFIED) {
                printf("test to satisfy compiler\n");
	}
}

int main(int c, char** argv)
{
	make_error();
	return 0;
}

to cause the error
gcc -DTRIGGER   -Wstrict-overflow=2 -O3 -o test test.c

or not...
gcc -Wstrict-overflow=2 -O3 -o test test.c
Comment 91 Volker Lendecke 2021-03-15 14:06:28 UTC
(In reply to Noel Power from comment #90)
> Not that I would ever claim to actually understand C so I hope you don't
> mind me making another observation, but that a small test program as follows
> would *seem* to indicate that this could be a compiler bug

Oh, great! Thanks, Noel! I wonder if it's worth filing that as a bug with debian.org or gcc. The gcc maintainers should be C-literate enough to point us at a gcc bug report or marking it as INVALID pointing at the C specification explaining why gcc is right.
Comment 92 Peter Eriksson 2021-03-15 14:45:45 UTC
(In reply to Noel Power from comment #90)

Hmm.. Which version of Gcc is triggering that bug? Could it be some #define in the stdio or unistd.h header files on that particular OS-release that is causing it?

Can it be reproduced with some more generic types instead of uid_t & gid_t so unistd.h doesn't have to be included?

I've tried to compile the test.c code on FreeBSD 12.2, Linux (Ubuntu 20) & OmniOS (and even some stone-age Solaris 10) systems, with various GCC (10.2.0, 9.3.0, 8.4.0, 7.5.0, 5.2.0, 4.9.0, 4.8.2) and CLang (10.0.1) compilers and I doesn't seem be be able to get it to complain... 

(Well, except for Gcc 3.4.4 which doesn't understand -Wstrict-overflow :-)
Comment 93 Noel Power 2021-03-15 15:03:29 UTC
(In reply to Volker Lendecke from comment #91)
>I wonder if it' I wonder if it's worth filing that as a bug with debian.org or gcc.
gcc 5.4.0 is pretty old I guess, not sure either if this is debian or gcc specific, it doesn't happen on here on my recent suse (e.g. gcc 7.5) 
(In reply to Peter Eriksson from comment #92)

>Hmm.. Which version of Gcc is triggering that bug? Could it be some #define in the stdio or unistd.h header files on that particular OS-release that is causing it?

gcc gcc 5.4.0

>Can it be reproduced with some more generic types instead of uid_t & gid_t so unistd.h doesn't have to be included?

pretty sure in my first iteration I tried this with pure unsigned int instead of uint_t (and without unist.h)

>I've tried to compile the test.c code on FreeBSD 12.2, Linux (Ubuntu 20) & OmniOS (and even some stone-age Solaris 10) systems, with various GCC (10.2.0, 9.3.0, 8.4.0, 7.5.0, 5.2.0, 4.9.0, 4.8.2) and CLang (10.0.1) compilers and I doesn't seem be be able to get it to complain... 

that's interesting (and more comforting)
Comment 94 Peter Eriksson 2021-03-15 15:17:59 UTC
(In reply to Noel Power from comment #93)

I got it to complain via a Github CI run (Ubuntu 16.04 with Gcc 5.5.0) - with a (modified version of test.c without any #includes) so it's probably a bug with Gcc 5.4 & 5.5, but probably not in Gcc 5.2 and earlier, or any recent one...
Comment 95 Noel Power 2021-03-15 15:55:20 UTC
(In reply to Peter Eriksson from comment #94)
are you sure about gcc 5.5 ? ubuntu 1604 has gcc 5.4.0 or at least the samba ubuntu 1604 CI docker image has that version (it's actually what I was using to test this myself)

gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Comment 96 Peter Eriksson 2021-03-15 15:59:10 UTC
(In reply to Noel Power from comment #95)

Yes, pretty sure. This is what "gcc -version" from the Github CI run says:

> gcc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
> Copyright (C) 2015 Free Software Foundation, Inc.

https://github.com/ptrrkssn/test
Comment 97 Jeremy Allison 2021-03-16 15:57:25 UTC
So just following up (I was on vacation yesterday) - this is almost certainly (nothing is 100% certain with compilers :-) a compiler bug in this specific gcc version, and the fix for that specific version to add in a second loop that does:

        for (i=0; i<num_not_cached; i++) {
                wbc_ids[i].type = WBC_ID_TYPE_NOT_SPECIFIED;
        }
+       for (i=0; i<num_not_cached; i++) {
+               wbc_ids[i].id.gid = (uin32_t)-1;
+       }

for this specific compiler version ?

Which versions of the patch need this horrid change ?
Comment 98 Noel Power 2021-03-16 16:07:17 UTC
(In reply to Jeremy Allison from comment #97)
>Which versions of the patch need this horrid change ?

In terms of CI afaict only 4.12, I guess potentially any of the versions could be built with an older compiler, e.g. I did see this also happen on a fully up to date debian9 container with gcc 6.3.0 so maybe this stretches over more gcc versions than we think

probably safest to modify all ?
Comment 99 Jeremy Allison 2021-03-16 16:10:05 UTC
(In reply to Noel Power from comment #98)

I don't want to modify all with something that's utterly make-work for a broken old compiler. Let's just put the nastyness in the 4.12 back-port with an added comment that this is for a broken gcc-5.x compiler.

I can do the change and re-upload the 4.12 version.
Comment 100 Jeremy Allison 2021-03-16 16:12:04 UTC
(In reply to Noel Power from comment #98)

Oh, I didn't read carefully enough - you're saying this also happens with gcc v.6 also. Do we know if it happens after that ? (My current gcc is version 10.2).
Comment 101 Noel Power 2021-03-16 16:17:35 UTC
(In reply to Jeremy Allison from comment #100)
unfortunately no idea at what version this dissappears, the earliest (since 6.x) that I have here is 7.5 which doesn't exhibit the problem
Comment 102 Noel Power 2021-03-16 16:23:25 UTC
(In reply to Noel Power from comment #101)

personally I think it is enough to have a documented version of the compiler version fix (e.g. 4.12) and leave the others well alone.
Comment 103 Jeremy Allison 2021-03-16 16:30:00 UTC
OK, I'll add that change to the 4.12 version and re-upload.
Comment 104 Jeremy Allison 2021-03-16 16:42:58 UTC
Created attachment 16551 [details]
patch backported to 4.12 (v3)

Added the comment:

+       /*
+        * gcc versions at least 5.x -> 6.x have a bug
+        * with initializing the wbc_ids[i].type and
+        * wbc_ids[i].id.gid to (uint32_t)-1 in the
+        * same loop when using [-Werror=strict-overflow].
+        * The error is:
+        *
+        * ../../source3/passdb/lookup_sid.c:1246:6: error: assuming
+        * pointer wraparound does not occur when comparing P +- C1
+        * with P +- C2 [-Werror=strict-overflow]
+        *
+        * Doing this loop twice fixes the compiler error.
+        * As wbc_ids_size isn't large this isn't too much
+        * of a performance burden.
+        */
Comment 105 Jeremy Allison 2021-03-16 16:45:04 UTC
Noel, can you check if this version fixes the -Werror=strict-overflow compiler bug ?
Comment 106 Noel Power 2021-03-16 17:50:23 UTC
Comment on attachment 16551 [details]
patch backported to 4.12 (v3)

I build this on a the samba ubuntu 16.04 docker image,
also ran 'make quicktest FAIL_IMMEDIATELY=1 TESTS='--include-env=ad_dc'

all working fine
Comment 107 Andrew Bartlett 2021-03-16 20:30:06 UTC
I'm assuming nobody has put the v3 patch for 4.12 under CI yet?  If you do please indicate with a ci? marker.

Also the commit message needs to be updated to explain the extra change.  I'll update the patch and re-upload.
Comment 108 Andrew Bartlett 2021-03-16 20:33:07 UTC
In terms of the versions to patch, Debian 9 and Ubuntu 16.04 were dropped because we dropped support for the python version they shipped with (4.13 needs Python 3.6)

While of course python and gcc versions can float freely of each other on user-built systems, at least this is unlikely to come up in distributions.
Comment 109 Andrew Bartlett 2021-03-16 20:43:21 UTC
Created attachment 16553 [details]
patch backported to 4.12 (v4)

A new version of the patch with the commit message updated per our guideline:

https://wiki.samba.org/index.php/Samba_Release_Planning#Patch_Process

This is now running in GitLab CI on the private CI system.
Comment 110 Andrew Bartlett 2021-03-16 20:56:24 UTC
Comment on attachment 16546 [details]
Advisory v3

We say there is no mitigating factors, but in situations where winbindd is operating correctly can this happen?

I'm just trying to rule out if this can happen on the AD DC in particular, but also a normal domain member with winbind-provided users/groups?

AD DC installations will be patching for another issue shortly and a clear direction on if they are bitten by this also would be helpful to avoid starting that process again when they might not be impacted.
Comment 111 Jeremy Allison 2021-03-16 21:00:27 UTC
(In reply to Andrew Bartlett from comment #110)

It's not a winbindd bug. It's a problem inside smbd that can add unknown memory to the gid list under very specific circumstances involving a cached negative mapping. I don't think this affects the AD-DC server code at all.

It could affect normal member servers though.
Comment 112 Jeremy Allison 2021-03-16 21:07:51 UTC
(In reply to Jeremy Allison from comment #111)

In case I'm not clear, this is a generic problem within smbd, not related to the AD-DC or winbindd sid-to-name lookups at all.
Comment 113 Andrew Bartlett 2021-03-16 22:22:00 UTC
(In reply to Jeremy Allison from comment #112)
I know the code is generic in smbd, but the 3 loops are:

 - well known
 - ask winbindd
 - fallback to passdb

I'm asking if the 'ask winbindd' step fills in enough of the data that the fallback to passdb stage, where we had the issue (if I understood) never happens.

This would reduce the scope significantly on the AD DC for example.
Comment 114 Jeremy Allison 2021-03-16 22:54:15 UTC
(In reply to Andrew Bartlett from comment #113)

The bug was in the processing of the second loop - the "ask winbindd" stage if you will.

The first loop mis-counts the number of elements being asked of winbindd (the number of elements being asked is OK, it's the processing of the reply that is in error), so the second loop would copy off the end of the array returned by winbindd, thus potentially putting a zero into the gid field.
Comment 115 Andrew Bartlett 2021-03-16 23:56:04 UTC
Comment on attachment 16553 [details]
patch backported to 4.12 (v4)

CI passed on the 4.12 patch!

That was a saga, thanks to everyone who pitched in to fix that.
Comment 116 Andrew Bartlett 2021-03-16 23:57:29 UTC
Assigning to Karolin for the next available security release.
Comment 117 Karolin Seeger 2021-04-13 11:30:21 UTC
Opening bug report for vendors.
Planned release date: Tuesday, April 29 2021
Comment 118 Erik Damrose 2021-04-13 12:35:00 UTC
Can you please clarify if the planned release date is Tuesday, _April 27_ 2021 or _Thursday_, April 29 2021?
Comment 119 Karolin Seeger 2021-04-14 06:31:40 UTC
(In reply to Erik Damrose from comment #118)
Thursday, April 29, sorry
Comment 120 Noel Power 2021-04-22 08:28:40 UTC
Created attachment 16595 [details]
backport to 3.6

Seems there is a typo in a variable name and a compiler error, worse I remember fixing these but must have uploaded the previous version
Comment 121 Jeremy Allison 2021-04-23 21:25:56 UTC
Comment on attachment 16595 [details]
backport to 3.6

OK, this looks OK to me. It doesn't apply to my local v3-6-test tree but I think that may have drifted from the one you are shipping, and the logic looks good to me.
Comment 122 Samba QA Contact 2021-04-29 08:09:37 UTC
This bug was referenced in samba v4-14-stable (Release samba-4.14.4):

55b8f31679b57545d7808cae8527663d770b10bc
Comment 123 Samba QA Contact 2021-04-29 08:11:00 UTC
This bug was referenced in samba v4-13-stable:

39d9e71cfcff17395ba26c076e2dc5fe0ddc1d65
Comment 124 Samba QA Contact 2021-04-29 08:13:06 UTC
This bug was referenced in samba v4-12-stable (Release samba-4.12.15):

6a6a33274c0829bb48c280f65c06213a185bee81
Comment 125 Karolin Seeger 2021-04-29 09:07:23 UTC
Samba 4.14.4, 4.13.8 and 4.12.15 have been shipped to address this defect.
Comment 126 Karolin Seeger 2021-04-29 09:07:48 UTC
Pushed to autobuild-master.
Comment 127 Samba QA Contact 2021-04-29 09:10:27 UTC
This bug was referenced in samba v4-14-test:

55b8f31679b57545d7808cae8527663d770b10bc
Comment 128 Samba QA Contact 2021-04-29 09:11:59 UTC
This bug was referenced in samba v4-13-test:

32c511d439b23d880133b8d9d32274eba3952a88
Comment 129 Samba QA Contact 2021-04-29 09:14:22 UTC
This bug was referenced in samba v4-12-test:

6a6a33274c0829bb48c280f65c06213a185bee81
Comment 130 Samba QA Contact 2021-04-29 09:56:02 UTC
This bug was referenced in samba master:

75ad84167f5d2379557ec078d17c9a1c244402fc
Comment 131 Andrew Bartlett 2021-05-03 00:51:27 UTC
Removing embargo and CC of samba-vendor.  If you wish to follow along and were on that CC please subscribe manually.
Comment 132 Karolin Seeger 2021-05-03 06:45:16 UTC
Closing out bug report.

Thanks!
Comment 133 Samba QA Contact 2021-05-11 09:47:41 UTC
This bug was referenced in samba v4-13-test:

39d9e71cfcff17395ba26c076e2dc5fe0ddc1d65
Comment 134 Samba QA Contact 2021-05-11 09:48:43 UTC
This bug was referenced in samba v4-13-stable:

32c511d439b23d880133b8d9d32274eba3952a88