Bug 13335 - after update to 4.8.0 DC failed with "Failed to find our own NTDS Settings objectGUID"
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.8.0
Hardware: All Linux
Importance: P1 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-15 16:20 UTC by Alexey Vekshin
Modified: 2018-06-07 14:03 UTC

See Also:


Attachments
patch for 4.8 cherry-picked and adapted from master (75.35 KB, patch)
2018-04-09 08:42 UTC, Andrew Bartlett
gary: review+
metze: review-
backported patch for 4.8 (98.61 KB, patch)
2018-04-29 23:19 UTC, Andrew Bartlett
abartlet: review? (gary)
metze: review+

Description Alexey Vekshin 2018-03-15 16:20:19 UTC
Just built the new 4.8.0 from source to update from 4.7.5. In the test environment the upgrade went OK, so I shut down one of the production controllers, updated and restarted it, and after the restart sam.ldb appears to be completely broken:

1) log.samba from startup:
=============================
[2018/03/15 20:35:11.246214,  0, pid=16865, effective(0, 0), real(0, 0)] ../source4/smbd/server.c:466(binary_smbd_main)
  samba version 4.8.0-4.8.0SUSE-SLE_11-x86_64 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2018
[2018/03/15 20:35:12.608982,  0, pid=16866, effective(0, 0), real(0, 0)] ../source4/smbd/server.c:638(binary_smbd_main)
  binary_smbd_main: samba: using 'standard' process model
[2018/03/15 20:35:12.624095,  0, pid=16869, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1755(samdb_reference_dn_is_our_ntdsa)
  Failed to find object DC=ad,DC=maxidom,DC=ru for attribute fsmoRoleOwner - Cannot find DN DC=ad,DC=maxidom,DC=ru to get attribute fsmoRoleOwner for reference dn: No such Base DN: DC=ad,DC=maxidom,DC=ru
[2018/03/15 20:35:12.632163,  1, pid=16869, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1939(samdb_is_pdc)
  Failed to find if we are the PDC for this ldb: Searching for fSMORoleOwner in DC=ad,DC=maxidom,DC=ru failed: Cannot find DN DC=ad,DC=maxidom,DC=ru to get attribute fsmoRoleOwner for reference dn: No such Base DN: DC=ad,DC=maxidom,DC=ru
[2018/03/15 20:35:12.643361,  1, pid=16875, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1359(samdb_ntds_settings_dn)
  Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
[2018/03/15 20:35:12.643848,  1, pid=16875, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1379(samdb_ntds_settings_dn)
  Failed to find our own NTDS Settings DN in the ldb!
[2018/03/15 20:35:12.644456,  1, pid=16875, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1536(samdb_ntds_objectGUID)
  Failed to find our own NTDS Settings objectGUID in the ldb!
[2018/03/15 20:35:12.644833,  1, pid=16875, effective(0, 0), real(0, 0)] ../source4/kdc/kdc-heimdal.c:319(kdc_task_init)
  kdc_task_init: Cannot determine if we are an RODC: operations error at ../source4/dsdb/common/util.c:3470
[2018/03/15 20:35:12.645185,  0, pid=16875, effective(0, 0), real(0, 0)] ../source4/smbd/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [kdc: krb5_init_context samdb RODC connect failed]
[2018/03/15 20:35:12.648211,  1, pid=16876, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1359(samdb_ntds_settings_dn)
  Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
[2018/03/15 20:35:12.648692,  1, pid=16876, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1379(samdb_ntds_settings_dn)
  Failed to find our own NTDS Settings DN in the ldb!
[2018/03/15 20:35:12.649284,  1, pid=16876, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1536(samdb_ntds_objectGUID)
  Failed to find our own NTDS Settings objectGUID in the ldb!
[2018/03/15 20:35:12.649651,  0, pid=16876, effective(0, 0), real(0, 0)] ../source4/smbd/service_task.c:36(task_server_terminate)
  task_server_terminate: task_server_terminate: [dreplsrv: Failed to connect to local samdb: WERR_DS_UNAVAILABLE
  ]
=============================
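
For triage, a startup log like the one above can be grouped into records, each made of a header line plus its indented message lines. The following is only a minimal sketch, assuming the file-based log.samba header layout shown above (timestamp, level, optional pid, then file:line(function)); it does not handle syslog-prefixed lines. The helper names `parse_samba_log` and `first_record_per_pid` are hypothetical and not part of Samba.

```python
import re

# Header layout as it appears in the log excerpt above:
# [timestamp,  level, pid=N, effective(..), real(..)] path:line(function)
HEADER = re.compile(
    r"^\[(?P<ts>[^,]+),\s*(?P<level>\d+)(?:,\s*pid=(?P<pid>\d+))?[^\]]*\]\s+"
    r"(?P<file>\S+):(?P<line>\d+)\((?P<func>\w+)\)\s*$"
)

# Two records copied from the log in this report.
SAMPLE = """\
[2018/03/15 20:35:12.643361,  1, pid=16875, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1359(samdb_ntds_settings_dn)
  Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
[2018/03/15 20:35:12.643848,  1, pid=16875, effective(0, 0), real(0, 0)] ../source4/dsdb/common/util.c:1379(samdb_ntds_settings_dn)
  Failed to find our own NTDS Settings DN in the ldb!
"""


def parse_samba_log(text):
    """Attach each indented message line to the header line that precedes it."""
    records = []
    for raw in text.splitlines():
        match = HEADER.match(raw)
        if match:
            record = match.groupdict()
            record["message"] = []
            records.append(record)
        elif records and raw.startswith("  "):
            records[-1]["message"].append(raw.strip())
    return records


def first_record_per_pid(records):
    """Return the earliest record seen for each pid (hypothetical helper)."""
    first = {}
    for record in records:
        first.setdefault(record["pid"], record)
    return first


if __name__ == "__main__":
    for pid, rec in first_record_per_pid(parse_samba_log(SAMPLE)).items():
        print(pid, rec["func"], rec["message"])
```

Looking at the first record per pid is one way to spot the earliest error each task hit, which in this log is the failed rootDSE search for dsServiceName.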

2) samba-tool dbcheck
=============================
Unable to determine the DomainSID, can not enforce uniqueness constraint on local domainSIDs

Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
Failed to find our own NTDS Settings DN in the ldb!
Searching for dsServiceName in rootDSE failed: operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
Failed to find our own NTDS Settings DN in the ldb!
ERROR(ldb): uncaught exception - operations error at ../source4/dsdb/samdb/ldb_modules/rootdse.c:516
  File "/usr/lib64/python2.6/site-packages/samba/netcmd/__init__.py", line 176, in _run
    return self.run(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/samba/netcmd/dbcheck.py", line 135, in run
    reset_well_known_acls=reset_well_known_acls)
  File "/usr/lib64/python2.6/site-packages/samba/dbchecker.py", line 95, in __init__
    self.ntds_dsa = ldb.Dn(samdb, samdb.get_dsServiceName())
  File "/usr/lib64/python2.6/site-packages/samba/samdb.py", line 943, in get_dsServiceName
    res = self.search(base="", scope=ldb.SCOPE_BASE, attrs=["dsServiceName"])
=============================
Comment 1 Mantas Mikulėnas (grawity) 2018-03-24 11:36:04 UTC
Can confirm; even though ldbsearch seems to show all entries as before, samba is unable to find them:

Mar 24 13:23:39 radius samba[3696]: [2018/03/24 13:23:39,  0] ../source4/smbd/server.c:638(binary_smbd_main)
Mar 24 13:23:39 radius samba[3696]:   binary_smbd_main: samba: using 'standard' process model
Mar 24 13:23:39 radius samba[3760]: [2018/03/24 13:23:39,  1] ../source4/kdc/db-glue.c:2854(samba_kdc_setup_db_ctx)
Mar 24 13:23:39 radius samba[3760]:   samba_kdc_fetch: could not find own KRBTGT in DB: dsdb_search at ../source4/dsdb/common/util.c:4641
Mar 24 13:23:39 radius samba[3760]: [2018/03/24 13:23:39,  0] ../source4/smbd/service_task.c:36(task_server_terminate)
Mar 24 13:23:39 radius samba[3760]:   task_server_terminate: task_server_terminate: [kdc: hdb_samba4_create_kdc (setup KDC database) failed]
Mar 24 13:23:39 radius samba[3696]: [2018/03/24 13:23:39,  0] ../lib/util/become_daemon.c:138(daemon_ready)
Mar 24 13:23:39 radius samba[3696]:   daemon_ready: STATUS=daemon 'samba' finished starting up and ready to serve connections
Comment 2 Andrew Bartlett 2018-03-26 04:23:17 UTC
I have developed patches to prevent the index corruption, and combined with 5c1504b94d1417894176811f18c5d450de22cfd2 we should be able to cope with this and other index issues.

I'll continue to work on this.
Comment 3 Andrew Bartlett 2018-04-05 18:54:20 UTC
(In reply to Andrew Bartlett from comment #2)
Just to update those following this bug, I'm continuing to work on this (mostly the safety to ensure it can't happen again).  I've got patches into master and now I'm just backporting them with tests.
Comment 4 Andrew Bartlett 2018-04-09 08:42:59 UTC
Created attachment 14114
patch for 4.8 cherry-picked and adapted from master

Proposed patch for Samba 4.8
Comment 5 Stefan Metzmacher 2018-04-10 07:59:47 UTC
Comment on attachment 14114
patch for 4.8 cherry-picked and adapted from master

Andrew, is this everything ldb related that needs backporting?
If so we need to change the ldb version and create an ldb 1.3.3
before the next 4.8 release.
Comment 6 Björn Baumbach 2018-04-10 13:30:54 UTC
(In reply to Andrew Bartlett from comment #4)

Hi Andrew,
a week ago you mentioned a "serious issue breaking upgrades to samba
4.8" on Samba-Technical and provided a "Refuse to commit a faulty reindex" patch. Is this issue also covered by these patches?
If not, shouldn't we add this here?
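
To illustrate what "Refuse to commit a faulty reindex" could mean in principle, here is a toy model only. It is not ldb's code and is based on nothing but the patch title: the class `ToyDB`, the exception `ReindexError` and the record layout are all invented for illustration. The new index is built in a staging dict and swapped in only if the whole build succeeded.

```python
class ReindexError(Exception):
    """Raised when a rebuilt index would be inconsistent (toy model)."""


def build_index(records, index_key):
    """Build {key value -> record id}; raise on a missing or duplicate key."""
    new_index = {}
    for rec_id, rec in records.items():
        key = rec.get(index_key)
        if key is None:
            raise ReindexError(f"record {rec_id!r} has no {index_key!r}")
        if key in new_index:
            raise ReindexError(f"duplicate {index_key!r} value {key!r}")
        new_index[key] = rec_id
    return new_index


class ToyDB:
    """Hypothetical stand-in for a database with one unique index."""

    def __init__(self, records):
        self.records = records
        self.index = {}

    def commit_reindex(self, index_key):
        # Build into a staging dict first. If build_index raises, the
        # assignment below never runs and the previous index is kept.
        staged = build_index(self.records, index_key)
        self.index = staged


if __name__ == "__main__":
    db = ToyDB({"a": {"guid": "g1"}, "b": {"guid": "g2"}})
    db.commit_reindex("guid")
    print(db.index)
```

The design point is simply that a failed rebuild leaves the old, working index in place instead of a half-written one.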
Comment 7 Andrew Bartlett 2018-04-10 18:46:56 UTC
(In reply to Björn Baumbach from comment #6)
That is the bulk of this patch.

The fix for the 'actual' issue is thought to be 

[PATCH 5/5] ldb_tdb: Do not fail in GUID index mode if there is a
 duplicate attribute

I would like feedback on the patch from a user with a DB impacted by this issue, as I didn't have one to hand (instead the tests create what I thought was the situation). 

Finally, in retrospect that commit should probably be first in the series, but I'll wait until I get feedback before I re-do the series, as metze also wanted a release commit made.
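
As a rough illustration of the patch title "Do not fail in GUID index mode if there is a duplicate attribute", here is a toy model. It assumes the failure had this shape: an entry carrying the same attribute value twice makes the indexer try to add the entry's GUID to the same index key twice. This is an assumption, not a description of ldb_tdb; the function `add_to_index` and its `strict` flag are invented for the sketch.

```python
def add_to_index(index, attr, values, guid, strict=False):
    """Add guid under every (attr, value) key.

    strict=True models an indexer that treats "GUID already listed under
    this key" as fatal; strict=False models one that tolerates it.
    """
    for value in values:
        guids = index.setdefault((attr, value), [])
        if guid in guids:
            if strict:
                raise ValueError(
                    f"{guid} already indexed under {attr}={value}"
                )
            continue  # duplicate attribute value on the same entry: skip
        guids.append(guid)
    return index


if __name__ == "__main__":
    # One entry whose attribute value appears twice.
    print(add_to_index({}, "cn", ["alice", "alice"], "guid-1"))
```

In the tolerant mode the duplicate is skipped and the index ends up with a single GUID under the key, so startup would not abort on such an entry.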
Comment 8 Mantas Mikulėnas (grawity) 2018-04-12 18:23:49 UTC
(In reply to Andrew Bartlett from comment #7)

Initial tests with v4-8-stable + the attached patch seem to be successful – Samba starts up, Bind9 DLZ works, Kerberos works, LDAP search returns the expected number of entries.
Comment 9 Jeremy Allison 2018-04-26 16:26:45 UTC
Marking as regression/blocker. We shouldn't ship another 4.8.x without the fix.
Comment 10 Stefan Metzmacher 2018-04-26 19:15:53 UTC
Comment on attachment 14114
patch for 4.8 cherry-picked and adapted from master

We need a patchset that lets us do a new ldb release (see comments 5, 6 and 7),
and I actually want to understand what's going on here: there seem to be
a lot of patches which are related but not included.
Comment 11 Andrew Bartlett 2018-04-29 23:13:26 UTC
(In reply to Stefan Metzmacher from comment #10)
The patch in attachment 14114 is the set required to fix the issue. Indeed, only one patch is really needed: the last patch mentioned in comment 7.

Can you give more detail on what you would like to understand further about this issue?

The intention here is to have the minimum set of changes for the urgent backport, without bringing in the whole re-work for LMDB.  What other commits from master do you want?

Thanks,

Andrew Bartlett
Comment 12 Andrew Bartlett 2018-04-29 23:19:19 UTC
Created attachment 14166
backported patch for 4.8

This patch adds the release files and does the re-order as requested.
Comment 13 Stefan Metzmacher 2018-05-02 08:57:12 UTC
(In reply to Andrew Bartlett from comment #12)

Thanks! Pushed to autobuild-v4-8-test
Comment 14 Karolin Seeger 2018-05-07 07:04:30 UTC
(In reply to Stefan Metzmacher from comment #13)
Pushed to v4-8-test.
Closing out bug report.

Thanks!
Comment 15 Andrew Bartlett 2018-05-17 01:32:08 UTC
(In reply to Alexey Vekshin from comment #0)
Can you run the new Samba 4.8.2 release and see how it goes for you now?

You may need to run the sambaundoguididx script to get it back before you upgrade.

Thanks,

Andrew Bartlett
Comment 16 Alexey Vekshin 2018-05-21 12:43:55 UTC
(In reply to Andrew Bartlett from comment #15)

Thank you for your efforts! 
I'll re-try in-place upgrade with 4.8.2 and report on results. 

> You may need to run the sambaundoguididx script to get it back before you upgrade.

To get back _what_? :)

Right now I have a domain with 9 DCs on 4.7.1 (self-built RPMs, mostly based on SLES specs). Do I need to upgrade libtalloc (2.11), libevent (0.9.36), tdb (1.3.15) and ldb (1.3.2) to current versions too?
Comment 17 Andrew Bartlett 2018-05-21 22:43:57 UTC
(In reply to Alexey Vekshin from comment #16)
To get the DB back into the pre-upgrade state you will need to run sambaundoguididx, so it can try the upgrade again. 

However, as you have 9 DCs, I would suggest joining the domain rather than upgrading; it works better (it gains features like sortedLinks and encryptedSecrets during the join).

Yes: Samba requires exactly the versions of talloc, tdb, et al. that shipped with the release.
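
For self-built packages, an exact-match check of library versions could be scripted along these lines. This is a sketch only: the helper names are hypothetical, the installed values are the tdb and ldb versions quoted in comment 16, and the required values are illustrative placeholders (ldb 1.3.3 is the release asked for in comment 5; the tdb requirement is assumed unchanged). The real requirements should be taken from the Samba release being built.

```python
def parse_version(text):
    """Turn '1.3.15' into (1, 3, 15) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def mismatches(installed, required):
    """Return {name: (installed, required)} for every non-exact match."""
    result = {}
    for name, want in required.items():
        have = installed.get(name)
        if have is None or parse_version(have) != parse_version(want):
            result[name] = (have, want)
    return result


# Installed versions as quoted in comment 16.
INSTALLED = {"tdb": "1.3.15", "ldb": "1.3.2"}
# Illustrative placeholders, not an authoritative requirement list.
REQUIRED = {"tdb": "1.3.15", "ldb": "1.3.3"}

if __name__ == "__main__":
    for name, (have, want) in mismatches(INSTALLED, REQUIRED).items():
        print(f"{name}: installed {have}, release expects {want}")
```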
Comment 18 Alexey Vekshin 2018-06-07 14:03:47 UTC
(In reply to Andrew Bartlett from comment #17)

I've tested both a new join and an in-place upgrade from 4.7.6 to 4.8.2, and both worked flawlessly. The new replication visualisation also works with Python 2.6 :)

NTDS Connections known to each destination DC
                          destination
              ,---------- *,CN=AD-BDC+
              |,--------- *,CN=AD-PDC+
              ||,-------- *,CN=AD-TDC+
              |||,------- *,CN=DC1-E1+
              ||||,------ *,CN=DC1-K1+
              |||||,----- *,CN=DC1-M1+
              ||||||,---- *,CN=DC1-MF1+
              |||||||,--- *,CN=DC1-N1+
              ||||||||,-- *,CN=DC1-S1+
       source |||||||||,- *,CN=DC1-U1+
 *,CN=AD-BDC+ 0111221221
 *,CN=AD-PDC+ 1022211111
 *,CN=AD-TDC+ 2202233121
 *,CN=DC1-E1+ 1220232221
 *,CN=DC1-K1+ 2232011223
 *,CN=DC1-M1+ 2131101212
*,CN=DC1-MF1+ 1122110122
 *,CN=DC1-N1+ 2112222021
 *,CN=DC1-S1+ 2132212202
 *,CN=DC1-U1+ 1111122110

Thank you for all your assistance.
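
The connection matrix above can also be loaded for scripted checks. A minimal sketch, assuming one digit per cell and that the columns appear in the same order as the rows (as they do in this output); the function `parse_matrix` is hypothetical, and reading the digits as distances between DCs is an assumption as well.

```python
import re

# A data row looks like: " *,CN=AD-BDC+ 0111221221"
ROW = re.compile(r"^\s*\*,CN=(?P<name>[\w-]+)\+\s+(?P<cells>\d+)\s*$")

MATRIX = """\
 *,CN=AD-BDC+ 0111221221
 *,CN=AD-PDC+ 1022211111
 *,CN=AD-TDC+ 2202233121
 *,CN=DC1-E1+ 1220232221
 *,CN=DC1-K1+ 2232011223
 *,CN=DC1-M1+ 2131101212
*,CN=DC1-MF1+ 1122110122
 *,CN=DC1-N1+ 2112222021
 *,CN=DC1-S1+ 2132212202
 *,CN=DC1-U1+ 1111122110
"""


def parse_matrix(text):
    """Return {source: {destination: digit}} from the rows of the matrix."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[match.group("name")] = [int(c) for c in match.group("cells")]
    names = list(rows)  # column order assumed equal to row order
    return {src: dict(zip(names, cells)) for src, cells in rows.items()}


if __name__ == "__main__":
    matrix = parse_matrix(MATRIX)
    worst = max(max(row.values()) for row in matrix.values())
    print("largest value in the matrix:", worst)
```

Every diagonal cell in this output is 0, which is consistent with the distance reading.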