Just built new 4.8.0 from source to update from 4.7.5. In the test environment the upgrade went OK, so I shut down one of the production controllers, updated and restarted it. After the restart, sam.ldb appears to be completely broken:

1) log.samba from startup:
=============================
[2018/03/15 20:35:11.246214, 0, pid=16865, effective(0, 0), real(0, 0)] ../source4/smbd/server.c:466(binary_smbd_main)
  samba version 4.8.0-4.8.0SUSE-SLE_11-x86_64 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2018
[2018/03/15 20:35:12.608982, 0, pid=16866, effective(0, 0), real(0, 0)] ../source4/smbd/server.c:638(binary_smbd_main)
  binary_smbd_main: samba: using 'standard' process model
[2018/03/15 20:35:12.624095, 0, pid=16869, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1755(samdb_reference_dn_is_our_ntdsa)
  Failed to find object DC=ad,DC=maxidom,DC=ru for attribute fsmoRoleOwner - Cannot find DN DC=ad,DC=maxidom,DC=ru to get attribute fsmoRoleOwner for reference dn: No such Base DN: DC=ad,DC=maxidom,DC=ru
[2018/03/15 20:35:12.632163, 1, pid=16869, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1939(samdb_is_pdc)
  Failed to find if we are the PDC for this ldb: Searching for fSMORoleOwner in DC=ad,DC=maxidom,DC=ru failed: Cannot find DN DC=ad,DC=maxidom,DC=ru to get attribute fsmoRoleOwner for reference dn: No such Base DN: DC=ad,DC=maxidom,DC=ru
[2018/03/15 20:35:12.643361, 1, pid=16875, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1359(samdb_ntds_settings_dn)
  Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
[2018/03/15 20:35:12.643848, 1, pid=16875, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1379(samdb_ntds_settings_dn)
  Failed to find our own NTDS Settings DN in the ldb!
[2018/03/15 20:35:12.644456, 1, pid=16875, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1536(samdb_ntds_objectGUID)
  Failed to find our own NTDS Settings objectGUID in the ldb!
[2018/03/15 20:35:12.644833, 1, pid=16875, effective(0, 0), real(0, 0)] ../source4/kdc/kdc-heimdal.c:319(kdc_task_init)
  kdc_task_init: Cannot determine if we are an RODC: operations error at ../source4/dsdb/common/util.c:3470
[2018/03/15 20:35:12.645185, 0, pid=16875, effective(0, 0), real(0, 0)] ../source4/smbd/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [kdc: krb5_init_context samdb RODC connect failed]
[2018/03/15 20:35:12.648211, 1, pid=16876, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1359(samdb_ntds_settings_dn)
  Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
[2018/03/15 20:35:12.648692, 1, pid=16876, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1379(samdb_ntds_settings_dn)
  Failed to find our own NTDS Settings DN in the ldb!
[2018/03/15 20:35:12.649284, 1, pid=16876, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1536(samdb_ntds_objectGUID)
  Failed to find our own NTDS Settings objectGUID in the ldb!
[2018/03/15 20:35:12.649651, 0, pid=16876, effective(0, 0), real(0, 0)] ../source4/smbd/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [dreplsrv: Failed to connect to local samdb: WERR_DS_UNAVAILABLE ]
=============================

2) samba-tool dbcheck
=============================
Unable to determine the DomainSID, can not enforce uniqueness constraint on local domainSIDs
Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
Failed to find our own NTDS Settings DN in the ldb!
Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
Failed to find our own NTDS Settings DN in the ldb!
ERROR(ldb): uncaught exception - operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
  File "/usr/lib64/python2.6/site-packages/samba/netcmd/__init__.py", line 176, in _run
    return self.run(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/samba/netcmd/dbcheck.py", line 135, in run
    reset_well_known_acls=reset_well_known_acls)
  File "/usr/lib64/python2.6/site-packages/samba/dbchecker.py", line 95, in __init__
    self.ntds_dsa = ldb.Dn(samdb, samdb.get_dsServiceName())
  File "/usr/lib64/python2.6/site-packages/samba/samdb.py", line 943, in get_dsServiceName
    res = self.search(base="", scope=ldb.SCOPE_BASE, attrs=["dsServiceName"])
=============================
Can confirm; even though ldbsearch seems to show all entries as before, samba is unable to find them:

Mar 24 13:23:39 radius samba[3696]: [2018/03/24 13:23:39, 0] ../source4/smbd/server.c:638(binary_smbd_main)
Mar 24 13:23:39 radius samba[3696]: binary_smbd_main: samba: using 'standard' process model
Mar 24 13:23:39 radius samba[3760]: [2018/03/24 13:23:39, 1] ../source4/kdc/db-glue.c:2854(samba_kdc_setup_db_ctx)
Mar 24 13:23:39 radius samba[3760]: samba_kdc_fetch: could not find own KRBTGT in DB: dsdb_search at ../source4/dsdb/common/util.c:4641
Mar 24 13:23:39 radius samba[3760]: [2018/03/24 13:23:39, 0] ../source4/smbd/service_task.c:36(task_server_terminate)
Mar 24 13:23:39 radius samba[3760]: task_server_terminate: task_server_terminate: [kdc: hdb_samba4_create_kdc (setup KDC database) failed]
Mar 24 13:23:39 radius samba[3696]: [2018/03/24 13:23:39, 0] ../lib/util/become_daemon.c:138(daemon_ready)
Mar 24 13:23:39 radius samba[3696]: daemon_ready: STATUS=daemon 'samba' finished starting up and ready to serve connections
I have developed patches to prevent the index corruption. Combined with commit 5c1504b94d1417894176811f18c5d450de22cfd2, we should be able to cope with this and other index issues. I'll continue to work on this.
(In reply to Andrew Bartlett from comment #2) Just to update those following this bug, I'm continuing to work on this (mostly on the safety checks to ensure it can't happen again). I've got patches into master and now I'm just backporting them with tests.
Created attachment 14114: patch for 4.8 cherry-picked and adapted from master

Proposed patch for Samba 4.8
Comment on attachment 14114: patch for 4.8 cherry-picked and adapted from master

Andrew, is this everything ldb-related that needs backporting? If so, we need to change the ldb version and create an ldb 1.3.3 before the next 4.8 release.
(In reply to Andrew Bartlett from comment #4) Hi Andrew, a week ago you mentioned a "serious issue breaking upgrades to samba 4.8" on Samba-Technical and provided a "Refuse to commit a faulty reindex" patch. Is this issue also covered by these patches? If not, shouldn't we add it here?
(In reply to Björn Baumbach from comment #6) That is the bulk of this patch. The fix for the 'actual' issue is thought to be:

[PATCH 5/5] ldb_tdb: Do not fail in GUID index mode if there is a duplicate attribute

I would like feedback on the patch from a user with a DB impacted by this issue, as I didn't have one to hand (instead, the tests create what I thought was the situation). Finally, in retrospect that commit should probably be first in the series, but I'll wait until I get feedback before I re-do the series, as metze also wanted a release commit made.
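For readers following along, here is a toy sketch of the kind of failure the patch title describes. It assumes that a record carrying the same attribute value twice makes the indexer try to add the same GUID twice under one index key, and that a strict sorted index treats that as an error. This is not the ldb_tdb code; `add_to_index`, `index_record`, the GUID string and the attribute values are all made up for illustration.

```python
import bisect


def add_to_index(index, key, guid, tolerate_duplicates=True):
    """Insert guid into the sorted GUID list stored under key.

    Returns True if the GUID was inserted, False if it was already
    present and duplicates are tolerated. Raises ValueError in strict mode.
    """
    guids = index.setdefault(key, [])
    pos = bisect.bisect_left(guids, guid)
    if pos < len(guids) and guids[pos] == guid:
        if tolerate_duplicates:
            return False  # same record already indexed under this key
        raise ValueError("duplicate GUID %s for index key %r" % (guid, key))
    guids.insert(pos, guid)
    return True


def index_record(index, guid, attributes, tolerate_duplicates=True):
    """Index every (attribute, value) pair of one record under its GUID."""
    for name, values in attributes.items():
        for value in values:
            add_to_index(index, (name, value), guid, tolerate_duplicates)


# Hypothetical record with the same attribute value stored twice.
record = {"servicePrincipalName": ["host/dc1", "host/dc1"]}

tolerant_index = {}
index_record(tolerant_index, "guid-0001", record)
print(tolerant_index)

try:
    index_record({}, "guid-0001", record, tolerate_duplicates=False)
except ValueError as exc:
    print("strict mode failed:", exc)
```

In the tolerant mode the record ends up indexed once; in the strict mode the whole reindex of that record fails, which is the behaviour the sketch assumes the patch removes.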
(In reply to Andrew Bartlett from comment #7) Initial tests with v4-8-stable + the attached patch seem to be successful – Samba starts up, Bind9 DLZ works, Kerberos works, LDAP search returns the expected number of entries.
Marking as regression/blocker. We shouldn't ship another 4.8.x without the fix.
Comment on attachment 14114: patch for 4.8 cherry-picked and adapted from master

We need a patchset that lets us do a new ldb release; see comments 5, 6 and 7. I also want to understand what's going on here: there seem to be a lot of patches that look related but are not included.
(In reply to Stefan Metzmacher from comment #10) The patch in attachment 14114 is the set required to fix the issue. Indeed, only one patch is really needed: the last patch mentioned above. Can you give more detail on what you would like to understand further about this issue? The intention here is to have the minimum set of changes for the urgent backport, without bringing in the whole re-work for LMDB. What other commits from master do you want?

Thanks,
Andrew Bartlett
Created attachment 14166: backported patch for 4.8

This patch adds the release files and does the re-order as requested.
(In reply to Andrew Bartlett from comment #12) Thanks! Pushed to autobuild-v4-8-test
(In reply to Stefan Metzmacher from comment #13) Pushed to v4-8-test. Closing out bug report. Thanks!
(In reply to Alexey Vekshin from comment #0) Can you run the new Samba 4.8.2 release and see how it goes for you now? You may need to run the sambaundoguididx script to get it back before you upgrade. Thanks, Andrew Bartlett
(In reply to Andrew Bartlett from comment #15) Thank you for your efforts! I'll re-try the in-place upgrade with 4.8.2 and report on the results.

> You may need to run the sambaundoguididx script to get it back before you upgrade.

To get back _what_? :) I now have a domain with 9 DCs on 4.7.1 (self-built RPMs, mostly based on SLES specs). Do I need to upgrade libtalloc (2.11), libevent (0.9.36), tdb (1.3.15) and ldb (1.3.2) to current versions too?
(In reply to Alexey Vekshin from comment #16) To get the DB back into the pre-upgrade state you will need to run sambaundoguididx, so that the upgrade can be tried again. However, as you have 9 DCs, I would suggest joining the domain rather than upgrading; it works better (the DC gains features like sortedLinks and encryptedSecrets during the join). Samba requires versions of talloc, tdb, et al. that exactly match those shipped with the release, so yes.
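As a rough aid for self-built packages, the exact-match rule can be checked mechanically. The sketch below is only an illustration: `find_mismatches` is a made-up helper, the installed versions are the tdb and ldb numbers quoted earlier in this thread, and the required table is a placeholder (ldb 1.3.3 is the version proposed earlier in this thread; the authoritative numbers are whatever is bundled with the release you are building).

```python
def parse_version(text):
    """Turn a dotted version such as '1.3.15' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def find_mismatches(installed, required):
    """Return {library: (installed, required)} for every library whose
    installed version is missing or not exactly the required one."""
    problems = {}
    for lib, wanted in required.items():
        have = installed.get(lib)
        if have is None or parse_version(have) != parse_version(wanted):
            problems[lib] = (have, wanted)
    return problems


# Installed versions as reported in this thread.
installed = {"tdb": "1.3.15", "ldb": "1.3.2"}

# Placeholder requirements for illustration only; take the real values
# from the release being built.
required = {"tdb": "1.3.15", "ldb": "1.3.3"}

for lib, (have, wanted) in sorted(find_mismatches(installed, required).items()):
    print("%s: installed %s, release expects %s" % (lib, have, wanted))
```

The comparison is deliberately strict equality rather than "at least", to mirror the exact-match requirement stated above.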
(In reply to Andrew Bartlett from comment #17) I've tested both a new join and an in-place upgrade from 4.7.6 to 4.8.2, and both worked flawlessly. The new replication visualisation also works with python 2.6 :)

NTDS Connections known to each destination DC

              destination
              ,---------- *,CN=AD-BDC+
              |,--------- *,CN=AD-PDC+
              ||,-------- *,CN=AD-TDC+
              |||,------- *,CN=DC1-E1+
              ||||,------ *,CN=DC1-K1+
              |||||,----- *,CN=DC1-M1+
              ||||||,---- *,CN=DC1-MF1+
              |||||||,--- *,CN=DC1-N1+
              ||||||||,-- *,CN=DC1-S1+
       source |||||||||,- *,CN=DC1-U1+
 *,CN=AD-BDC+ 0111221221
 *,CN=AD-PDC+ 1022211111
 *,CN=AD-TDC+ 2202233121
 *,CN=DC1-E1+ 1220232221
 *,CN=DC1-K1+ 2232011223
 *,CN=DC1-M1+ 2131101212
*,CN=DC1-MF1+ 1122110122
 *,CN=DC1-N1+ 2112222021
 *,CN=DC1-S1+ 2132212202
 *,CN=DC1-U1+ 1111122110

Thank you for all your assistance.
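The digits in the matrix read naturally as hop counts between DCs, with 0 on the diagonal. Assuming that interpretation, a minimal sketch of how such a matrix can be derived from a list of directed connections using breadth-first search follows. It is not the samba-tool implementation; `hop_matrix` and the three-DC ring are hypothetical and unrelated to the topology reported here.

```python
from collections import deque


def hop_matrix(names, edges):
    """Return {source: {destination: hops}} computed by breadth-first search.

    edges is a set of (source, destination) pairs. Pairs with no path
    between them map to None.
    """
    neighbours = {name: [] for name in names}
    for src, dst in edges:
        neighbours[src].append(dst)

    result = {}
    for start in names:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        result[start] = {name: dist.get(name) for name in names}
    return result


# Hypothetical three-DC ring: A -> B -> C -> A.
names = ["DC-A", "DC-B", "DC-C"]
edges = {("DC-A", "DC-B"), ("DC-B", "DC-C"), ("DC-C", "DC-A")}

matrix = hop_matrix(names, edges)
for src in names:
    print(src, "".join(str(matrix[src][dst]) for dst in names))
```

Each printed row lists the hop count from one source to every destination, in the same source-by-destination layout as the output above.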