Date: Thu, 7 Jan 2021 19:08:59 +0000 From: Nathaniel Turner <nturner@exagrid.com> To: "security@samba.org" <security@samba.org> Subject: Bad password count not incremented atomically [-- Attachment #1 [details] --] [-- Type: multipart/alternative, Encoding: 7bit, Size: 14K --] Hi folks, I was doing a little "defense in depth" work, with the goal of limiting an attacker's ability to brute force a samba u ser's password (should they get past other defenses, such as firewalls and "hosts allow" configurations). In the process of validating my new configuration, I found that samba's bad password count logic appears to have a sim ple race condition: Each connection appears to get separate process, and each process loads, increments, and saves the bad password count without any coordination. So with many clients attacking, the "bad lockout attempt" limit can be e xceeded by a significant margin. I did my simple tests with these settings, but the exact numbers don't really matter: pdbedit -P "bad lockout attempt" -C 10 pdbedit -P "lockout duration" -C 1 pdbedit -P "reset count minutes" -C 1 The attached "test-smb-auth-rate-limits" bash script is a trimmed version of a larger test suite, but hopefully gets the idea across. When run against a pretty low powered VM, I see the lockout attempt limit exceeded by about 10,000% (we get 1066 auth attempts within a minute, instead of 10): $ test-smb-auth-rate-limits nturner vm-000-4698.maas.local smb-0 2021-01-07 18:15:41+00:00 Test: 'attack_smb nturner vm-000-4698.maas.local smb-0 20 240' 2021-01-07 18:16:54+00:00 199 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_DEVICE_E RROR) 2021-01-07 18:16:54+00:00 9592 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_TIMEOUT) 2021-01-07 18:16:54+00:00 255 protocol negotiation failed: NT_STATUS_CONNECTION_RESET 2021-01-07 18:16:54+00:00 3665 protocol negotiation failed: NT_STATUS_IO_TIMEOUT 2021-01-07 18:16:54+00:00 5223 session setup failed: NT_STATUS_ACCOUNT_LOCKED_OUT 2021-01-07 18:16:54+00:00 1066 session setup failed: NT_STATUS_LOGON_FAILURE 2021-01-07 18:16:54+00:00 landed 1066 attempts out of 20000 (running total is now 1066/20) 2021-01-07 18:16:54+00:00 FAIL: 'attack_smb nturner vm-000-4698.maas.local smb-0 20 240' took 73 seconds (must take at least 120) 2021-01-07 18:16:54+00:00 FAIL: 1 tests failed When run against a much beefier (32 processor, fast RAID) machine, it's not as bad, but the limit is still exceeded by more than 100%: $ test-smb-auth-rate-limits nturner 063ts17-avta181002305.maas.local smb-0 2021-01-07 18:19:21+00:00 Test: 'attack_smb nturner 063ts17-avta181002305.maas.local smb-0 20 240' 2021-01-07 18:20:11+00:00 19977 session setup failed: NT_STATUS_ACCOUNT_LOCKED_OUT 2021-01-07 18:20:11+00:00 23 session setup failed: NT_STATUS_LOGON_FAILURE 2021-01-07 18:20:11+00:00 landed 23 attempts out of 20000 (running total is now 23/20) 2021-01-07 18:20:11+00:00 FAIL: 'attack_smb nturner 063ts17-avta181002305.maas.local smb-0 20 240' took 50 seconds (mu st take at least 120) 2021-01-07 18:20:11+00:00 FAIL: 1 tests failed The attached patch, which hacks check_sam_security to serialize all child processes via a lock file, while very silly, does prevent overshooting the configured limits. A couple questions: 1. My test hack just open codes a call to lockf. What's the "right" way to do interprocess coordination like this in samba? 2. Instead of serializing the entire authentication path, could we perhaps reduce the critical section to everything after the call to sam_password_ok(), and add a call to reload the sampass structure from the live DB as the first thing in that critical section? (I can try that, but I'm wondering if there are any known reasons why say, reloading sampass here would be a bad idea.) Nathaniel Turner
Created attachment 16385 [details] Test script
Created attachment 16386 [details] Initial patch
Created attachment 16390 [details] git-am fix for master. Nate, can you test this fix against your setup ? I'd like to see if I'm on the right track here. Thanks ! Jeremy.
Created attachment 16391 [details] git-am fix for master. Slightly better version that checks the error after the account reload.
Created attachment 16392 [details] git-am fix for master. Only re-load the user info if we successfully got the mutex.
Created attachment 16393 [details] git-am fix for master. Version #3. Improve the granularity of the mutex. We can make the mutex specific to the user name itself, not a global mutex for the sam account database.
Jeremy, I tested with the latest version (v3) and this shows much better results than without. I'm seeing far fewer NT_STATUS_LOGON_FAILURE returns from smbclient per minute. I am still seeing more than 10, but I suspect that part of my test is too simplistic. I would not be surprised to find that this status is returned in paths where the authentication did not make it as far as the password check. $ test-smb-auth-rate-limits nturner vm-000-4698.maas.local smb-0 2021-01-12 21:58:50-05:00 Test: 'attack_smb nturner vm-000-4698.maas.local smb-0 20 240' 2021-01-12 22:00:32-05:00 230 Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_DEVICE_ERROR) 2021-01-12 22:00:32-05:00 11819 Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_TIMEOUT) 2021-01-12 22:00:32-05:00 759 protocol negotiation failed: NT_STATUS_CONNECTION_RESET 2021-01-12 22:00:32-05:00 1163 protocol negotiation failed: NT_STATUS_IO_TIMEOUT 2021-01-12 22:00:32-05:00 3794 session setup failed: NT_STATUS_ACCOUNT_LOCKED_OUT 2021-01-12 22:00:32-05:00 1912 session setup failed: NT_STATUS_IO_TIMEOUT 2021-01-12 22:00:32-05:00 323 session setup failed: NT_STATUS_LOGON_FAILURE 2021-01-12 22:00:32-05:00 landed 323 attempts out of 20000 (running total is now 323/20) 2021-01-12 22:00:32-05:00 FAIL: 'attack_smb nturner vm-000-4698.maas.local smb-0 20 240' took 102 seconds (must take at least 120) 2021-01-12 22:00:32-05:00 FAIL: 1 tests failed $ test-smb-auth-rate-limits nturner 063ts17-avta181002305.maas.local smb-0 2021-01-12 22:03:44-05:00 Test: 'attack_smb nturner 063ts17-avta181002305.maas.local smb-0 20 240' 2021-01-12 22:04:48-05:00 19980 session setup failed: NT_STATUS_ACCOUNT_LOCKED_OUT 2021-01-12 22:04:48-05:00 20 session setup failed: NT_STATUS_LOGON_FAILURE 2021-01-12 22:04:48-05:00 landed 20 attempts out of 20000 (running total is now 20/20) 2021-01-12 22:04:48-05:00 FAIL: 'attack_smb nturner 063ts17-avta181002305.maas.local smb-0 20 240' took 64 seconds (must take at least 120) 2021-01-12 22:04:48-05:00 FAIL: 1 tests failed
Looking closer at the results with the patch on the faster machine, I think 20 attempts in 64 seconds is correct. If the first 10 failures occur within say, 1 second, the clock for ending the lockout starts then, which leaves a few seconds for 10 more attempts at the end of that time window. (I think I'd initially assumed the lockout clock didn't start until later, which in retrospect is silly.) So my test is now passing (with the proposed patch) on my faster machine. I'm still not sure why the slower machine is seeing 323 session setup failed: NT_STATUS_LOGON_FAILURE results within 102 seconds. (I'll see if I can hack smbd to return a more unique status when an actual password check has failed to convince myself that these are not in fact all landed password check attempts.)
OK, hacking nt_status_squash to be a no-op in smbd (and hacking the test to look for NT_STATUS_WRONG_PASSWORD instead of NT_STATUS_LOGON_FAILURE) shows that with the patch, account lockout appears to be working exactly as designed now: Test with nt_status_squash hack (but without fix): $ test-smb-auth-rate-limits nturner vm-000-4698.maas.local smb-0 2021-01-13 14:20:56+00:00 Test: 'attack_smb nturner vm-000-4698.maas.local smb-0 30 240' 2021-01-13 14:22:11+00:00 186 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_DEVICE_ERROR) 2021-01-13 14:22:11+00:00 10040 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_TIMEOUT) 2021-01-13 14:22:11+00:00 329 protocol negotiation failed: NT_STATUS_CONNECTION_RESET 2021-01-13 14:22:11+00:00 2555 protocol negotiation failed: NT_STATUS_IO_TIMEOUT 2021-01-13 14:22:11+00:00 5927 session setup failed: NT_STATUS_ACCOUNT_LOCKED_OUT 2021-01-13 14:22:11+00:00 371 session setup failed: NT_STATUS_LOGON_FAILURE 2021-01-13 14:22:11+00:00 592 session setup failed: NT_STATUS_WRONG_PASSWORD 2021-01-13 14:22:11+00:00 landed 592 attempts out of 20000 (running total is now 592/30) 2021-01-13 14:22:11+00:00 FAIL: 'attack_smb nturner vm-000-4698.maas.local smb-0 30 240' took 75 seconds (must take at least 120) 2021-01-13 14:22:11+00:00 FAIL: 1 tests failed Test with nt_status_squash hack AND fix: $ test-smb-auth-rate-limits nturner vm-000-4698.maas.local smb-0 2021-01-13 14:09:33+00:00 Test: 'attack_smb nturner vm-000-4698.maas.local smb-0 30 240' 2021-01-13 14:10:46+00:00 215 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_DEVICE_ERROR) 2021-01-13 14:10:46+00:00 10801 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_TIMEOUT) 2021-01-13 14:10:46+00:00 331 protocol negotiation failed: NT_STATUS_CONNECTION_RESET 2021-01-13 14:10:46+00:00 2482 protocol negotiation failed: NT_STATUS_IO_TIMEOUT 2021-01-13 14:10:46+00:00 4712 session setup failed: NT_STATUS_ACCOUNT_LOCKED_OUT 2021-01-13 14:10:46+00:00 1449 session setup failed: NT_STATUS_LOGON_FAILURE 2021-01-13 14:10:46+00:00 10 session setup failed: NT_STATUS_WRONG_PASSWORD 2021-01-13 14:10:47+00:00 landed 10 attempts out of 20000 (running total is now 10/30) 2021-01-13 14:12:04+00:00 189 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_DEVICE_ERROR) 2021-01-13 14:12:04+00:00 10568 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_TIMEOUT) 2021-01-13 14:12:04+00:00 298 protocol negotiation failed: NT_STATUS_CONNECTION_RESET 2021-01-13 14:12:04+00:00 1743 protocol negotiation failed: NT_STATUS_IO_TIMEOUT 2021-01-13 14:12:04+00:00 6817 session setup failed: NT_STATUS_ACCOUNT_LOCKED_OUT 2021-01-13 14:12:04+00:00 375 session setup failed: NT_STATUS_LOGON_FAILURE 2021-01-13 14:12:04+00:00 10 session setup failed: NT_STATUS_WRONG_PASSWORD 2021-01-13 14:12:04+00:00 landed 10 attempts out of 20000 (running total is now 20/30) 2021-01-13 14:13:19+00:00 197 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_DEVICE_ERROR) 2021-01-13 14:13:19+00:00 10792 do_connect: Connection to vm-000-4698.maas.local failed (Error NT_STATUS_IO_TIMEOUT) 2021-01-13 14:13:19+00:00 466 protocol negotiation failed: NT_STATUS_CONNECTION_RESET 2021-01-13 14:13:19+00:00 1546 protocol negotiation failed: NT_STATUS_IO_TIMEOUT 2021-01-13 14:13:19+00:00 6748 session setup failed: NT_STATUS_ACCOUNT_LOCKED_OUT 2021-01-13 14:13:19+00:00 224 session setup failed: NT_STATUS_LOGON_FAILURE 2021-01-13 14:13:19+00:00 27 session setup failed: NT_STATUS_WRONG_PASSWORD 2021-01-13 14:13:19+00:00 landed 27 attempts out of 20000 (running total is now 47/30) 2021-01-13 14:13:19+00:00 PASS: 'attack_smb nturner vm-000-4698.maas.local smb-0 30 240' took 226 seconds (more than 120; good) 2021-01-13 14:13:19+00:00 SUCCESS: all tests passed
Great ! Thanks for the update. Now I think I need a simple test I can add to our regression test suite so we can ensure this stays fixed. It'll be a little difficult as it's a race condition issue, which can mean flaky reproduction but I'm guessing if I set the limit low enough I should be able to get a reasonably reliable reproducer. As far as I can see this only affects smbd/Samba3/local pdb accounts, not Samba4-AD accounts as that correctly does the atomic updates on bad-password counts. Do you concur ?
By Samba4-AD, I think you are referring to the way samba manages its user DB when running as an AD domain controller. I'm not at all familiar with that code, but a quick look at source4/auth/ntlm/auth_sam.c makes me wonder if it has a similar problem: If the password is bad (and not a prior good password), we do this in authsam_password_check_and_record(): nt_status = authsam_update_bad_pwd_count(auth_context->sam_ctx, msg, domain_dn); This eventually calls into dsdb_update_bad_pwd_count(). It's not obvious to me how this is atomic, but even if it is, the code back out in the end of authsam_password_check_and_record() looks like it is going to return NT_STATUS_WRONG_PASSWORD even if we find we have exceeded the bad password count threshold. So if we have 1000 processes running through here at the same time, it looks like many of them could get past the samdb_result_passwords check at the beginning of that function before the lockout flag gets set. At that point, the caller will be able to tell if the password was good or not, because they'll see NT_STATUS_WRONG_PASSWORD (mapped to NT_STATUS_LOGON_FAILURE) instead of NT_STATUS_ACCOUNT_LOCKED_OUT (which would mask the fact that the password was checked and didn't match). Am I reading this right? As for a test case for the current patch, yeah, this is a tricky thing to test reliably. It will probably vary based on the hardware used, but for what it's worth, my test VM has 2 cores and a virtual disk carved out of a larger RAID array and shared with a couple dozen other VMs. Without the fix, I can easily reproduce the problem with the xargs and smbclient approach in the attached test script. However, per_attempt=20000 is too high, as it results in many NT_STATUS_LOGON_FAILURE returns that are not due to NT_STATUS_WRONG_PASSWORD. If I set per_attempt=2000, the defect still reproduces, and all NT_STATUS_LOGON_FAILURE returns I see are due to NT_STATUS_WRONG_PASSWORD. (Confirmed by temporarily removing the mapping logic in nt_status_squash and repeating the test several times with each code variant.)
Gary, any chance you could take a look at this issue for the Samba-AD/DC code ? I'm not as familiar with how it manages the password lockouts. Also, this could only happen in the prefork process model as it requires multiple simultaneous process access.
I'll need to take a more detailed look tomorrow, but it does look like it's an issue in the AD code.
Definitely looks to be an issue in the AD code. I'll spend some time looking into this in more detail.
Created attachment 16411 [details] Proposed patch for bad password path Patch covers that bad password count update. Still need to: Write a test Handle the good password case.
Comment on attachment 16411 [details] Proposed patch for bad password path This looks OK. It would be good if the new code was more explicit that it is operating on the DN by <GUID= and not the string DN, which might be in a race. With the explicit transaction the dsdb_autotransaction_request() should be ldb_request() to avoid a nested transaction.
Are we going to treat this as a security bug, and get a CVE assigned? I feel that it needs to be fixed but does not warrant a CVE.
Maybe we should ask Red Hat product security to see if they think this needs a CVE ?
I get a CVSSv3.1 of AV:N/AC:H/PR:N/UI:N/S:U/C:L/I:L/A:N (4.8) That assumes that a password bad enough to be brute forced can only provide limited access, eg some confidentiality or integrity breach. It also rates such an attack as hard. If so, we would stop here per: 4a.) Stop and ask for advise on security@samba.org if the CVSSv3 score is less than 5.0. This is a heavy-weight process with costs to many parties. I'll ask Red Hat for some advice.
Red Hat have assigned CVE-2021-20251 for this. I assume we need to roll Gary's and Jeremy's patches together.
It needs to be recognised that in a multi-DC situation (LDAP backend) for the 'classic' NT4-like DC the reload will still be racy, as different servers won't see each others locks. This is of course limited to the number of servers, so the race won't be as fast (eg 4x parallel, with some stepping on toes meaning the count may not increase in the way expected). We also don't have any integration tests in a format that we could actually include in our testsuite, or any unit tests of the source3 side.
Yep, putting them together would seem to be the thing to do. As this is a race condition it's difficult to test, and as the problem is worst (and tested :-) in the AD case (and Gary has a fix for that :-), I think we may be OK with the code we have (tested by the submitter).
Jeremy, your commit message says "WIP" and "prototype". Is it ready?
I only put that on it until the reporter confirmed it worked :-).
It's ready.
Douglas will drive this from here, but all assistance most valued.
Nate: how [if at all] do you want to be credited in the advisory and patches? Nathaniel Turner of Exagrid? houseofnate? Also, is there anything in the comments here that you would rather not have made public when the patch goes out? We can hide individual messages forever.
Created attachment 16474 [details] s3/s4 combined patch, v1
Douglas, you can credit me as Nathaniel Turner of ExaGrid. I don't see anything in the discussion history that needs to be redacted. Thank you.
Created attachment 16483 [details] patch for master (v1)
Created attachment 16484 [details] patch for 4.14 (v1)
Created attachment 16485 [details] patch for 4.13 (v1)
Created attachment 16486 [details] patch for 4.12 (v1)
Created attachment 16487 [details] patch for 4.11 (v1) I've backported the patches to all supported versions and have the result under CI.
Comment on attachment 16483 [details] patch for master (v1) At the very least we need to put "objectGUID" into dcesrv_samr_OemChangePasswordUser2(), but I'm also quite worried that we have not eliminated the race here. Also the KDC does not check the error returns from authsam_logon_success_accounting() so the attack just moves from the NTLM to the KDC codepath. Thankfully the KDC is not often multi-process, bit might be in the future. Likewise, the SAMR password change routines don't call authsam_logon_success_accounting(). The same will apply in source3, where the bad password mutex isn't taken out on the SAMR server. I've stopped the test pipelines for now. Sorry!
For the AD DC, a valid integration test would: - take out a transaction - lock out the account - connect to the server with a valid password - sleep() to ensure the server hits the transaction lock - commit the local transaction If the valid password is returned as LOCKED_OUT then the accounting is working. The same can be done with a bad password, set the value to X in the transaction and confirm it becomes X + 1 after. We can then do this for the 4 methods by which passwords can be checked: - KDC - NTLM - LDAP password change - SAMR
(In reply to Andrew Bartlett from comment #45) Good catch on the source3/rpc_server/samr_XXX.c code ! I can extend the source3/ part of the patch to do the same fix for source3/rpc_server/samr/srv_samr_chgpasswd.c:pass_oem_change() The code doing set_user_info_XXX() inside source3/rpc_server/samr/srv_samr_nt.c I think is Administrator level password changes, so I don't think we need the fix there, yes ? Or am I wrong ? :-) :-).
(In reply to Jeremy Allison from comment #47) Yes, the set_ routines should be the administrator password set.
Looking at the AD DC side of things, and the code that still needs to be changed: * The SAMR server change need careful work to ensure that we match Windows behaviour. Just plugging into the success accounting code might diverge behaviour. We might call that a Windows bug, but this isn't just a 'fix a race'. * The KDC is much harder to restructure to support this. In the MIT KDC, samba_bad_password_count() and the KDB functions kdb_samba_db_audit_as_req() return void. Thankfully only the Heimdal KDC is supported, and the HDB hook auth_status() -> hdb_samba4_auth_status() returns an error code. However the kerberos5.c code does not check that error code. Also, the success accounting isn't called. So while this patch will at least mean the counter will increment, if we have parallel KDC processing (a prefork KDC) it will be possible to check more passwords until the next DB read after the transaction finishes for a failure (and potential lockout). But fixing it properly isn't easy. I propose that we ignore the KDC for now, sadly, and focus on the NTLM case because due to the forking model of smbd much more parallel behaviour is possible. The KDC is preforked by default, so this might allow a few more guesses while bad passwords are waiting on the DB transaction lock. I also propose no behaviour change in the SAMR server to call the success accounting, but instead open a transaction around the read. Instead *if* we are preforking we might well allow extra guesses. I need to think as to if in these situations taking out a transaction around the initial read might provide some protection. Stepping back, all in all this is a lot of work to reduce the danger for which the standard mitigation is 'choose good passwords'.
Gary and I have done significant work to address a major concern of mine, which is that putting every good password behind the transaction lock would be a bad idea - these can be quite long-lived. We will only take out a transaction if a hint is found in a runtime tdb. Gary made a start and I've then refined those patches with an eye to being reasonable to review (carefully adapting changes so that they don't touch as much of the code). What remains is the adaption of the unit tests. We still also need the source3 code updated.
Created attachment 16573 [details] WIP patch for master The tests need to be updated as we no longer call dsdb_search_by_guid().
The status here is that while this is an important and reproducible issue, we have not been able to put the resources into finishing the AD patches once the need to avoid the transaction in the good password case was identified. I'll chip away at this when I get time, but there are higher priorities right now that can't be solved by simply not having really bad passwords. Sorry,
Created attachment 17410 [details] patch for master v2 An updated patchset that includes Python integration tests. One caveat is that the Python tests require an additional library (python3-crypto on Debian/Ubuntu), that the bootstrap system doesn't install at present, so we can use single-DES for testing SAMR password changes. The alternative is writing new bindings in C.
Comment on attachment 17410 [details] patch for master v2 You may be able to trigger the password change with py_net_change_password() (extended to specify SAMR). But the minimal change from here would be to expose single DES from GnuTLS rather than bringing back python-crypto, as it causes dependency issues. Some similar things are in pyglue.c, eg py_generate_random_bytes()
(In reply to Andrew Bartlett from comment #54) lib/crypto/py_crypto.c would be the right spot, we already do the same for RC4 for similar reasons.
Created attachment 17411 [details] patch for master v3 Thanks. The new patchset uses C bindings from py_crypto.c, so the dependency on python3-crypto is removed.
Comment on attachment 17411 [details] patch for master v3 Rather than wrap E_old_pw_hash(), which has a name which extends from days very long ago when we tried to pretend we didn't have full DES in Samba (pre 2000era crypto export rules), please wrap the equivilant des_crypt112_16() or des_crypt56_gnutls() and give a more expressive name. This will make it easier for others later. Down in the tests, it looks like there is some leftover from the previous modes where the GUID was put into the user message. I don't think we need add_object_guid() and maybe also add_sid() any more, eg in this repeated block: + msg = create_message(ctx); + guid = GUID_random(); + add_object_guid(msg, &guid); + add_sid(msg, "S-1-5-21-2470180966-3899876309-2637894779-1000"); We should however have a test that renames the user (eg in that transaction) and confirms the re-read still works (implicitly confirms the GUID in the DN is being used). This was part of the simplification I did between Gary's patch and my rework, as by removing the GUID handling we could simplify the cases and should be able to simplify the tests. This is looking good.
(In reply to Joseph Sutton from comment #56) I don't see any tests that the bad password indicator is set or used. The tests prepare for lockout by editing the DB directly, which is OK, but I wonder if we need some other test? Or is this covered implicitly somehow? I wonder if we should have a test that just sets up the race, eg 4 bad passwords all stuck behind the transaction lock with a threshold of 3. That would be the simple proof that we have fixed the issue, and should be able to be knownfailed fairly reliably. We also never, until restart, clean up an old bad password indicator if the password is never used again successfully, but I guess that's OK.
A note for later as I can't make this idea public until this is public. The bad password indicator DB could be used to implement a properly secure version of the SMB authentication rate limiter that MS is doing on their next windows version. If we do that we could just write in that TDB the time the password should be able to be checked (2sec more each failure), and initially just sleep - even better properly async wait - till that time, up to (say) 30sec.
(In reply to Andrew Bartlett from comment #60) In particular, a test that a successful correct login is possible on an account without a bad password indicator, but despite an outstanding transaction. Also a test that a successful correct login is possible on an account with a bad password indicator (but which is wrong - eg appears erroneously, forcing the re-read)
Created attachment 17413 [details] patch for master v4 This patchset should address the new comments. It also now ensures that we return a LOCKED_OUT error code whether the password is bad or not, and makes the bad password count update for an LDAP password change atomic (previously, there was a small gap between ending a transaction and immediately starting a new one, during which other processes could acquire the transaction lock and sneak an update in).
Comment on attachment 17413 [details] patch for master v4 My only comment is that the other tests seem to suggest that the logonCount is not updated for a simple bind (except to set to 0 on the first logon). (I was checking on this because simple binds, which follow the same internal pattern as NTLM authentication, are the high-volume auth path we don't want stuck behind a transaction queue).
Comment on attachment 17413 [details] patch for master v4 The hold-up on this patch is a race in socket_wrapper being triggered by the forked processes. There is a small stat()/[libc_]bind() race looking for a free 'port' in a loop within swrap_auto_bind(). Once this is worked around we should be almost ready to publish this.
I think we should get rid of the stat() in swrap_auto_bind(). src/socket_wrapper.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/socket_wrapper.c b/src/socket_wrapper.c index 8141b8b..68adcae 100644 --- a/src/socket_wrapper.c +++ b/src/socket_wrapper.c @@ -3916,10 +3916,12 @@ static int swrap_auto_bind(int fd, struct socket_info *si, int family) type, socket_wrapper_default_iface(), port); - if (stat(un_addr.sa.un.sun_path, &st) == 0) continue; ret = libc_bind(fd, &un_addr.sa.s, un_addr.sa_socklen); if (ret == -1) { + if (errno == EALREADY || errno == EADDRINUSE) { + continue; + } goto done; } Can you test if the above patch fixes it?
(In reply to Andreas Schneider from comment #66) I've tested your patch, and it appears to fix the 'Address already in use' error I was getting -- thanks! However, I had to make the following additional change: @@ -3815,7 +3815,6 @@ static int swrap_auto_bind(int fd, struct socket_info *si, int family) char type; int ret; int port; - struct stat st; char *swrap_dir = NULL; swrap_mutex_lock(&autobind_start_mutex); to avoid an 'unused variable' warning.
Created attachment 17436 [details] patch for master v5
socket_wrapper update to version 1.3.4 addressing the TOCTOU issue: https://gitlab.com/samba-team/samba/-/merge_requests/2627
I've tried to apply the patch on my asn-samr-aes branch. It fails to apply because of changes in master. There are no conflicts with the changes in my samr branch. commit 63b0861eaf1ef6852783127fb4a1803b8108ddd4 Author: Joseph Sutton <josephsutton@catalyst.net.nz> AuthorDate: Tue Jul 5 20:18:01 2022 +1200 Commit: Andreas Schneider <asn@samba.org> CommitDate: Thu Jul 28 13:26:03 2022 +0200 CVE-2021-20251 s4-rpc_server: Remove transaction logic from SAMR password change samdb_set_password() also starts a transaction, so doing it here is unecessary and only results in a nested transaction. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14611 Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> I think it should be the other way around. We need to remove the transaction from samdb_set_password(). 3.1.5.10.3 SamrUnicodeChangePasswordUser2 (Opnum 55) [..] 1. On a DC configuration, if Active Directory is not running, the server MUST abort the request and return an error status. 2. All database operations MUST occur in a single transaction. [..] 3.1.5.10.4 SamrUnicodeChangePasswordUser4 (Opnum 73) Same as above ... So we need to make sure we wrap all of them in a transaction.
(In reply to Andreas Schneider from comment #70) Ah, OK -- I'll see if everything can be wrapped in a transaction. But I'm not entirely sure how that can work, given that the search is performed with a system samdb context, and the actual password change is performed with a new context, having only user privileges. Unless DSDB_SESSION_INFO can be modified on the context to lower our privileges? I noticed that our ChangePasswordUser4 implementation seems to operate as system, without requiring any rights on the part of the user, unlike other password change methods. Is that really what Windows is doing?
(In reply to Joseph Sutton from comment #71) In the past, before the patches for NTLM removal, some measure of the 'transaction' was provided by the fact that the first SYSTEM read got the hashes, which the second USER operation had to have match byte-wise. But I changed that to just a flag structure, to avoid relying on the NT hash. The only way we can put this in a transaction (cleanly, without making assumptions about how TDB/LMDB work under the hood) is by changing the opaque, sort of like become_root() in the fileserver.
Created attachment 17452 [details] patch for master v7 New patch rebased on master, currently under CI.
Created attachment 17453 [details] patch for master v8
(In reply to Joseph Sutton from comment #71) Yes, the password should be changed as the user. Thanks for fixing it correctly in all functions.
Created attachment 17456 [details] patch for master v9 Well, almost correctly. This time I think the patch is basically ready, if there are no more major changes needed.
Removing embargo.
This bug was referenced in samba master: b27a67af0216811d330d8a4c52390cf4fc04b5fd 17b8d164f69a5ed79d9b7b7fc2f3f84f8ea534c8 4bb9d85fed8498566bdb87baa71a3147806baafc 91e2e5616ccd507fcaf097533c5fc25974119c1e 439f96a2cfe77f6cbf331d965a387512c2db91c6 2087b0cd986b8959b2a402b9a1891472e47ca0b0 408717242aad8adf4551f2394eee2d80a06c7e63 7b8e32efc336fb728e0c7e3dd6fbe2ed54122124 d6cf245b96fb02edb3bcc52733d040d5f03fb918 336e303cf1962b56b64c0d9d2b05ac15d00e8692 de4cc0a3dae89f3e51a099282615cf80c8539e11 4a9e0fdccfa218fbb2c3eb87e1a955ade0364b98 b954acfde258a1909ed60c1c3e1015701582719f 55147335aec8194b6439169b040556a96db22e95 712181032a47318576ef35f6a6cf0f958aa538fb b5f78b7b895a6b92cfdc9221b18d67ab18bc2a24 d8a862cb811489abb67d4cf3a7fbd83d05c7e5cb a65147a9e98ead70869cdfa20ffcc9c167dbf535 96479747bdb5bc5f33d903085f5f69793f369e3a 2b593c34c4f5cb82440b940766e53626c1cbec5b b1e740896ebae14ba64250da2f718e1d707e9eed bdfc9d96f8fe5070ab8a189bbf42ccb7e77afb73 a268a1a0e304d0702469e4ac146d8af5e7384c39 268ea7bef5af4b9c8a02f4f5856113ff0664d9e8 8587734bf989aeaafa9d09d78d0f381caf52d285 65c473d4a53fc8a22a0d531aff45203ea3a4d99b fabbea25310a31c0409b1c11eaced39bd8cde8dd f74f92aea164af40d9177b332778a76d7ecabcbd fcabcb326d385c1e1daaa8dae9820e33a3868f56 7981cba87e3a7256b12bfc5fdd89b136c12979ff 1d869a2a666cfada1495d891021de6c2b8567a96 8ae0c38d54f065915e927bbfe1b656400a79eb13
Congratulations on landing this ! Lots of hard work.
Created attachment 17513 [details] patch for 4.17 v9
Created attachment 17521 [details] proposed patch for 4.16 v9
Assigning to Jule for the next available releases.
Pushed to autobuild-v4-{17,16}-test.
This bug was referenced in samba v4-16-test: f0c44d9e53dbeb9a8ebc387635f6146cb292882f 3542483de3f07806d78ea018852092516582c71d b7351888e82c5d2ae6222aede762609c02da2905 63020bf13c04f7a8604b8f4a02dba48b8aeed769 0e3ac110df7e08823407006aa82e6300524212ee bc30ca2117c75aa57c8a8b11420678cc8eea294b 740c4c2b95367793c557085e3cc5b4e367e2e0bb 831335aaaad6eead7ebd7dc3ed89c5f33c2a3e35 9dcf447d82238816f6056d64c68e85ddd8808660 8580b90a87b0a18dad8d9808e6d85ccbe5ecf107 a1a440c10148d19d3b9ac47ade2fa36ab0e39e33 79f791ff0ebe0a8a6787c74ad20870ce41907844 a9aae34d5a97081dff9126328167678cfc4601c7 6b826a375a13f44a0486024ee09564cf3b1528ca 0d6da5250be44888b3eaeb94aad7cee77f44d095 dd38fae8c8d7d47c75f03b6bc94bed8d47012cb6 6a70d006917486e4f1bf9333ab8a4961e79ef51a 2fe2485b93d589620684f79b61d3da222accfb1e f725f2f2442b3cef0fb51ad2af5248147524873b f58d7e42009180dba67832149123a19237d388b2 29b31129fd372513ad24e56ec4caab6844e2ed72 5eb5daaa1521d424ebde5a4d06ad05c9cdfc7996 74d8c3d5843f2ba285bc1a6ace1eba5080c0e99c 96c24b58b8cc2301007f0bf586184fd2d264b028 05447dfb2016d32fdfffb42fdb02cc4db68b18a9 69abe0c2b0aeeb8ac1fb93baae41f1452ac4a5fc ae3b615236c4cc773be30f34ec9b794a25253449 f78ff75c51f6cd15b476de373f11b1c87f962970 6a0280d9553db535601c71c1be475f85949d5b83
This bug was referenced in samba v4-17-test: d4ae8610ea37bdd0ac2e6dcb59f858d9921f949b 518818b3c104b7d314265a6c631742384de30d76 0b3604e6e0de7c03eeebb81ef8611dd4b6905c6f b82543978d112fb31fba593bde16161eefbf8dbe 276d81368ecf057d462fca40c28cd68deb20b390 2dc965ad1d85b3f77e734a74b52ce9049b5df536 d57c4ea959927bf340c6326029db5a11941538a6 674dbeaca0710e4493d53b427903b2ef4ec59788 2e4c6196d88fe22d3666bcc07091d8b00f015a0b 180784c49b34a1cca30d2bfbef7397e9b6826d20 d07f34ec394c75f0530366fb010e3804c0570468 e0fdfce1327c8c68da99689fe3a3d6fc34f0dfca fa22c9bf2bee98245fdfecbeb360bc2f11d9de89 ffe43511bb9ed9e5bb91273d13b63e1655ee1240 1167352291264d5e1c4b1617603f77e902beb1c8 446cfe34523cbdd18b1407f0ed219996b7c0e1b1 3a96ccbb8414b2642836cf88ea5531040fc658b4 254e94892cd3b561f5d17e7008b792fa7948c68b 4d0cba69c8f0d2f4426f3889b40f2f6b161aef25 5f1bafdd3f042a42f0cb9f5dec3345d24c3586d9 4adcada4104294d2b7e30617fee2f959e93e908e 5befe31c651419cfae9ff31e25290e2952441670 b3f48fae13ed7538b5997b90682615f0cafb1b49 13efa6261883edefd80267db7063d138aa8b5554 5c8bbe3e74c04d44c8c118bf3a81306ead02ddb5 3e54aabd9e38a0f2020dd0359613244cf578c719 b8c123d02d0d0968bfc8548e57931342dd77e068 7b28bd1080383fba3abb54b3cbf03134cbefd7db 7fe10442b7670b35101218ee3e9a303cc573618c 619ffc2a2fb50d87d771ea316818c65139c79e5a 9aabf78216f91aee6abcd401b10f1ca01f544be0 bb86d2f3a10edbe27aa2edeafce0475b9cd79feb
Closing out bug report. Thanks!
This bug was referenced in samba v4-17-stable (Release samba-4.17.1): d4ae8610ea37bdd0ac2e6dcb59f858d9921f949b 518818b3c104b7d314265a6c631742384de30d76 0b3604e6e0de7c03eeebb81ef8611dd4b6905c6f b82543978d112fb31fba593bde16161eefbf8dbe 276d81368ecf057d462fca40c28cd68deb20b390 2dc965ad1d85b3f77e734a74b52ce9049b5df536 d57c4ea959927bf340c6326029db5a11941538a6 674dbeaca0710e4493d53b427903b2ef4ec59788 2e4c6196d88fe22d3666bcc07091d8b00f015a0b 180784c49b34a1cca30d2bfbef7397e9b6826d20 d07f34ec394c75f0530366fb010e3804c0570468 e0fdfce1327c8c68da99689fe3a3d6fc34f0dfca fa22c9bf2bee98245fdfecbeb360bc2f11d9de89 ffe43511bb9ed9e5bb91273d13b63e1655ee1240 1167352291264d5e1c4b1617603f77e902beb1c8 446cfe34523cbdd18b1407f0ed219996b7c0e1b1 3a96ccbb8414b2642836cf88ea5531040fc658b4 254e94892cd3b561f5d17e7008b792fa7948c68b 4d0cba69c8f0d2f4426f3889b40f2f6b161aef25 5f1bafdd3f042a42f0cb9f5dec3345d24c3586d9 4adcada4104294d2b7e30617fee2f959e93e908e 5befe31c651419cfae9ff31e25290e2952441670 b3f48fae13ed7538b5997b90682615f0cafb1b49 13efa6261883edefd80267db7063d138aa8b5554 5c8bbe3e74c04d44c8c118bf3a81306ead02ddb5 3e54aabd9e38a0f2020dd0359613244cf578c719 b8c123d02d0d0968bfc8548e57931342dd77e068 7b28bd1080383fba3abb54b3cbf03134cbefd7db 7fe10442b7670b35101218ee3e9a303cc573618c 619ffc2a2fb50d87d771ea316818c65139c79e5a 9aabf78216f91aee6abcd401b10f1ca01f544be0 bb86d2f3a10edbe27aa2edeafce0475b9cd79feb
This bug was referenced in samba master: 1414269dccfd7cb831889cc92df35920b034457c
This bug was referenced in samba v4-17-test: 8578a24c288a95619f1a74c4aecc8753b96e149b
This bug was referenced in samba v4-16-test: a1136ed2e05a2adca83a57a0402a165de631be58
This bug was referenced in samba v4-16-stable (Release samba-4.16.8): f0c44d9e53dbeb9a8ebc387635f6146cb292882f 3542483de3f07806d78ea018852092516582c71d b7351888e82c5d2ae6222aede762609c02da2905 63020bf13c04f7a8604b8f4a02dba48b8aeed769 0e3ac110df7e08823407006aa82e6300524212ee bc30ca2117c75aa57c8a8b11420678cc8eea294b 740c4c2b95367793c557085e3cc5b4e367e2e0bb 831335aaaad6eead7ebd7dc3ed89c5f33c2a3e35 9dcf447d82238816f6056d64c68e85ddd8808660 8580b90a87b0a18dad8d9808e6d85ccbe5ecf107 a1a440c10148d19d3b9ac47ade2fa36ab0e39e33 79f791ff0ebe0a8a6787c74ad20870ce41907844 a9aae34d5a97081dff9126328167678cfc4601c7 6b826a375a13f44a0486024ee09564cf3b1528ca 0d6da5250be44888b3eaeb94aad7cee77f44d095 dd38fae8c8d7d47c75f03b6bc94bed8d47012cb6 6a70d006917486e4f1bf9333ab8a4961e79ef51a 2fe2485b93d589620684f79b61d3da222accfb1e f725f2f2442b3cef0fb51ad2af5248147524873b f58d7e42009180dba67832149123a19237d388b2 29b31129fd372513ad24e56ec4caab6844e2ed72 5eb5daaa1521d424ebde5a4d06ad05c9cdfc7996 74d8c3d5843f2ba285bc1a6ace1eba5080c0e99c 96c24b58b8cc2301007f0bf586184fd2d264b028 05447dfb2016d32fdfffb42fdb02cc4db68b18a9 69abe0c2b0aeeb8ac1fb93baae41f1452ac4a5fc ae3b615236c4cc773be30f34ec9b794a25253449 f78ff75c51f6cd15b476de373f11b1c87f962970 6a0280d9553db535601c71c1be475f85949d5b83 a1136ed2e05a2adca83a57a0402a165de631be58
This bug was referenced in samba v4-17-stable (Release samba-4.17.4): 8578a24c288a95619f1a74c4aecc8753b96e149b
Does someone know why this issue never went through samba-vendors or any form of pre-notification? We were a bit surprised by the issue and only found it recently. Could we have noticed this somewhere else ?
(In reply to rfrohl from comment #93) Thanks rforhl for pointing out the gap in our process. Sorry for that. We eventually judged that this bug was not serious enough for an embargo and coordinated disclosure, but I think we we should have notified the vendor list when we did that. Also, since we have a CVE, we should have some text at https://www.samba.org/samba/security/CVE-2021-20251.html (and in an attachment here) even if said text is mostly saying "don't worry about this one". I think the contributing factors are: * this one was just so drawn out that without the focus of a coordinated release we lost sight of some of the things we want to do for a CVE. * We don't have a clear policy about notifying vendors in this case anyway.
Rating this is a bit touchy it would seem, but the rating of this bug with > Importance: P1 critical suggests at first glance that it might be more serious and should have gotten more attention. Not to sure what the guidance there would be.
(In reply to rfrohl from comment #93) This issue fell (no - was pushed, we modified the policy) below the level that requires a security release per: https://wiki.samba.org/index.php/Samba_Security_Process point 4b.
Created attachment 17733 [details] Proposed patch for v4-15-test The patch for 4.15 required some adaptation due to the older heimdal version and has a couple of extra patches, one to fix the lockout tests due to different heimdal KDC behavior and another one to forward the error returned from the hdb plugin. Pipeline https://gitlab.com/scabrero/samba/-/pipelines/745547640
Created attachment 17734 [details] Modified patch for v4-15-test I've attached a version of your backport with some suggested changes, if that's OK: * I added Patch 02/41 to fix a leak in Heimdal. * I also added Patch 03/41 to deal with errors returned from clientdb->hdb_auth_status(). * I changed Patch 25/41 (s4:kdc: Move logon success accounting) to operate as it did previously, by calling log_authentication_event() in success cases as well. * I adapted Patch 27/41 (heimdal:kdc: Forward KRB5KDC_ERR_CLIENT_REVOKED...) to the changes in Patch 03/41.
(In reply to Joseph Sutton from comment #98) Of course, I am fine with the changes. Thanks for having a look.