Bug 12972 - Failed to find account dn (serverReference) for DC=...
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.7.0rc4
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assigned To: Tim Beale
QA Contact: Samba QA Contact
Depends on:
Blocks:
Reported: 2017-08-17 08:20 UTC by Stefan Metzmacher
Modified: 2017-09-18 21:02 UTC (History)

See Also:


Attachments
Full logs (990.38 KB, application/octet-stream)
2017-08-17 08:20 UTC, Stefan Metzmacher
no flags Details
Patches for v4-7-test (135.42 KB, patch)
2017-09-12 10:20 UTC, Stefan Metzmacher
metze: review? (abartlet)
metze: review? (garming)
Details
Proposed change-set to fix the bug on 4.7 (24.95 KB, patch)
2017-09-14 00:31 UTC, Tim Beale
no flags Details
Patch-set to fix the problem (24.83 KB, patch)
2017-09-18 21:02 UTC, Tim Beale
no flags Details

Description Stefan Metzmacher 2017-08-17 08:20:05 UTC
Created attachment 13480 [details]
Full logs

I just found this in samba.stderr of a private autobuild.
I'm not sure if this is one problem or several, but I recently
saw the "Failed to find account dn (serverReference) for..."
messages in a customer environment with 14 DCs, and the
serverReference attribute was in fact missing on about 7 of the servers.


Failed to convert objects after retry: WERR_DS_DRA_SCHEMA_MISMATCH/NT_STATUS_INVALID_NETWORK_RESPONSE
Failed to convert objects after retry: WERR_DS_DRA_SCHEMA_MISMATCH/NT_STATUS_INVALID_NETWORK_RESPONSE
Failed to convert objects after retry: WERR_DS_DRA_SCHEMA_MISMATCH/NT_STATUS_INVALID_NETWORK_RESPONSE
../source4/dsdb/common/util.c:4807: Failed to find account dn (serverReference) for CN=PROMOTEDVDC,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=example,DC=com, parent of DSA with objectGUID bb2cb692-653e-4ab9-a04c-6abe24ff673e, sid S-1-5-21-4106630654-3360935162-1398896351-2062
../source4/rpc_server/drsuapi/updaterefs.c:371: Refusing DsReplicaUpdateRefs for sid S-1-5-21-4106630654-3360935162-1398896351-2062 with GUID bb2cb692-653e-4ab9-a04c-6abe24ff673e
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105 for bb2cb692-653e-4ab9-a04c-6abe24ff673e._msdcs.samba.example.com CN=Configuration,DC=samba,DC=example,DC=com
../source4/dsdb/common/util.c:4807: Failed to find account dn (serverReference) for CN=PROMOTEDVDC,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=example,DC=com, parent of DSA with objectGUID bb2cb692-653e-4ab9-a04c-6abe24ff673e, sid S-1-5-21-4106630654-3360935162-1398896351-2062
../source4/rpc_server/drsuapi/updaterefs.c:371: Refusing DsReplicaUpdateRefs for sid S-1-5-21-4106630654-3360935162-1398896351-2062 with GUID bb2cb692-653e-4ab9-a04c-6abe24ff673e
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105 for bb2cb692-653e-4ab9-a04c-6abe24ff673e._msdcs.samba.example.com CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
Unable to convert 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1 to an attid, and can_change_pfm=false!
../source4/dsdb/schema/schema_init.c:669: 'testAttr150293171935425dup': unable to map attributeID 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1: WERR_NOT_FOUND
../source4/dsdb/schema/schema_init.c:916: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-Attr1502931719-35425-dup,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
ldb: dsdb_schema_from_db() failed: 19:Constraint violation: dsdb_schema load failed: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-Attr1502931719-35425-dup,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
dsdb_replicated_objects_commit: Failed to re-load schema after commit of transaction (working: 0x2abce9e8eef0/0, new: 0x2abce9e8eef0/0)
Failed to commit objects: WERR_INTERNAL_ERROR/NT_STATUS_INVALID_NETWORK_RESPONSE
Unable to convert 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1 to an attid, and can_change_pfm=false!
../source4/dsdb/schema/schema_init.c:669: 'testAttr150293171935425dup': unable to map attributeID 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1: WERR_NOT_FOUND
../source4/dsdb/schema/schema_init.c:916: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-Attr1502931719-35425-dup,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
...


The autobuild used this code:
https://git.samba.org/?p=metze/samba/wip.git;a=shortlog;h=e96478178c618b437a6ba69103f36bbe72ab261e
with "autobuild-private.sh samba samba-systemkrb5"
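
To help judge whether the excerpt above shows one problem or several, the messages can be grouped by type. The following is only a rough Python sketch: the message fragments are copied from the log above, but the labels and the function name summarize_log are made up for illustration and are not part of Samba.

```python
import re
from collections import Counter

# Hypothetical labels; the regex fragments are taken from the log excerpt
# in this report. Each pattern captures one identifying "key" per message.
PATTERNS = {
    "missing_server_reference": re.compile(
        r"Failed to find account dn \(serverReference\) for (?P<key>.+?), parent of DSA"
    ),
    "updaterefs_refused": re.compile(
        r"Refusing DsReplicaUpdateRefs for sid (?P<key>S-[\d-]+)"
    ),
    "unmapped_attribute_oid": re.compile(
        r"Unable to convert (?P<key>[\d.]+) to an attid"
    ),
}


def summarize_log(lines):
    """Count matching messages per label, keyed by server DN, SID or OID."""
    summary = {label: Counter() for label in PATTERNS}
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[label][match.group("key")] += 1
    return summary
```

Feeding it the lines of samba.stderr (for example summarize_log(open("samba.stderr")), assuming that file name) gives one counter per message type, which makes it easier to see that the serverReference failures and the attid conversion failures involve different objects.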
Comment 1 Andrew Bartlett 2017-08-17 08:36:55 UTC
Assigning to Tim as the GET_TGT work should help with this.
Comment 2 Stefan Metzmacher 2017-08-18 05:07:09 UTC
(In reply to Andrew Bartlett from comment #1)

Andrew, do you think the schema corruption is based on the same problem?
I'm seeing a lot of messages like this lately:

/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: Unable to convert 1.3.6.1.4.1.7165.4.6.1.6.20.40583 to an attid, and can_change_pfm=false!
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: ../source4/dsdb/schema/schema_init.c:669: 'testgeneratedlinkIDbacklink2150302021340583': unable to map attributeID 1.3.6.1.4.1.7165.4.6.1.6.20.40583: WERR_NOT_FOUND
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: ../source4/dsdb/schema/schema_init.c:916: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
ldb: dsdb_schema_from_db() failed: 19:Constraint violation: dsdb_schema load failed: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: dsdb_schema_from_db() failed: 19:Constraint violation: dsdb_schema load failed: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: dsdb_get_schema: refresh_fn() failed
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: schema_load_init: dsdb_get_schema failed
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: module schema_load initialization failed : Operations error
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: module dsdb_notification initialization failed : Operations error
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: module rootdse initialization failed : Operations error
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: module samba_dsdb initialization failed : Operations error
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: Unable to load modules for tdb:///memdisk/metze/W/b548000/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb: schema_load_init: dsdb_get_schema failed
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: Traceback (most recent call last):
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc:   File "/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc", line 322, in <module>
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc:     kcc.load_samdb(opts.dburl, lp, creds, force=False)
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc:   File "bin/python/samba/kcc/__init__.py", line 2485, in load_samdb
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc:     (dburl, msg))
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: samba.kcc.kcc_utils.KCCError: Unable to open sam database tdb:///memdisk/metze/W/b548000/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb : schema_load_init: dsdb_get_schema failed
../source4/dsdb/kcc/kcc_periodic.c:693: Failed samba_kcc - NT_STATUS_ACCESS_DENIED
Unable to convert 1.3.6.1.4.1.7165.4.6.1.6.20.40583 to an attid, and can_change_pfm=false!
../source4/dsdb/schema/schema_init.c:669: 'testgeneratedlinkIDbacklink2150302021340583': unable to map attributeID 1.3.6.1.4.1.7165.4.6.1.6.20.40583: WERR_NOT_FOUND
../source4/dsdb/schema/schema_init.c:916: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
ldb: dsdb_schema_from_db() failed: 19:Constraint violation: dsdb_schema load failed: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
Comment 3 Andrew Bartlett 2017-08-18 05:27:50 UTC
I think Garming said that was a different bug. (Tim and Garming have seen and thought about a lot of DRS bugs lately...)

I've CC'ed Garming for further comment.
Comment 4 Garming Sam 2017-08-20 21:32:44 UTC
That looks like this bug:

https://bugzilla.samba.org/show_bug.cgi?id=12889

There are probably some patches written by Bob to address at least one particular cause of the issue (i.e. the prefixmap and schema are not fetched together).
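
As a rough illustration of why fetching the prefixmap and the schema separately can produce the "Unable to convert ... to an attid, and can_change_pfm=false!" message, here is a toy Python model. It is a simplification under stated assumptions, not Samba's implementation: the class ToyPrefixMap and its method are hypothetical, and the way the index and the last OID arc are packed into one integer is deliberately simplified compared with the real encoding.

```python
class ToyPrefixMap:
    """Toy prefix map: OID prefix (all arcs but the last) -> small index."""

    def __init__(self, prefixes):
        self._index = {prefix: i for i, prefix in enumerate(prefixes)}

    def attid_from_oid(self, oid, can_change_pfm=False):
        # Split "1.2.3.4" into the prefix "1.2.3" and the last arc "4".
        prefix, _, last = oid.rpartition(".")
        if prefix not in self._index:
            if not can_change_pfm:
                # Read-only map: an unknown prefix cannot be resolved.
                raise LookupError(
                    "Unable to convert %s to an attid, and can_change_pfm=false!" % oid
                )
            # Writable map: allocate the next free index for the new prefix.
            self._index[prefix] = len(self._index)
        # Simplified packing (assumption): index in the high 16 bits,
        # last arc in the low 16 bits.
        return (self._index[prefix] << 16) | (int(last) & 0xFFFF)
```

In this model, a schema snapshot that already contains the attribute 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1, combined with an older prefix map snapshot that lacks the prefix 1.3.6.1.4.1.7165.4.6.1.6.9.35425, fails the lookup in the same way as the log above, while a prefix map taken together with the schema resolves it.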
Comment 5 Tim Beale 2017-08-31 23:13:36 UTC
We noticed this problem, and it should already be fixed as part of the GET_TGT work we've done. The following patch fixes it (but the change is dependent on the other GET_TGT client patches).
https://git.samba.org/?p=samba.git;a=commit;h=fae5df891c11f642cbede9e4e3d845c49c5f86b8

What's happening in this case is the serverReference linked attribute spans 2 different partitions. Depending on the order in which the partitions get replicated, the DC can receive the linked attribute before it receives the target object. When this happens, it can't resolve the target so it ends up dropping/ignoring the linked attribute.

Unfortunately, adding GET_TGT support wasn't enough to fix the problem completely (requesting that the link target gets re-sent doesn't help because the target object is in a different partition). The code change we made was to continue to add the forward-link in the case where a cross-partition target was unknown. Note that the backlink will still be missing, but the linked attribute is no longer dropped completely. Running 'samba-tool dbcheck' can then detect and fix the missing backlink. 

The missing serverReference problem should only occur when installing a new DC, although the same problem could potentially occur if other linked attributes span partitions.
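
The behaviour described in this comment can be summarised as a small decision table. The sketch below is a toy Python model only, not the repl_meta_data.c code: the names LinkAction and handle_incoming_link and the cross_partition_fix flag are hypothetical, and it assumes the receiving DC can tell which partition the target belongs to.

```python
from enum import Enum


class LinkAction(Enum):
    ADD_FORWARD_AND_BACKLINK = "add forward link and backlink"
    RETRY_WITH_GET_TGT = "re-request target using GET_TGT"
    ADD_FORWARD_ONLY = "add forward link, backlink left for dbcheck"
    DROP = "drop the linked attribute"


def handle_incoming_link(known_guids_by_nc, source_nc, target_nc,
                         target_guid, cross_partition_fix=True):
    """Decide what to do with one replicated linked attribute value.

    known_guids_by_nc maps a partition (NC) name to the set of object
    GUIDs the receiving DC already holds in that partition.
    """
    if target_guid in known_guids_by_nc.get(target_nc, set()):
        # Target already replicated: the link can be fully resolved.
        return LinkAction.ADD_FORWARD_AND_BACKLINK
    if target_nc == source_nc:
        # Same partition: asking the source to re-send the target can help.
        return LinkAction.RETRY_WITH_GET_TGT
    if cross_partition_fix:
        # Cross-partition target not yet known: keep the forward link.
        return LinkAction.ADD_FORWARD_ONLY
    # Behaviour before the fix: the link is silently lost.
    return LinkAction.DROP
```

For the serverReference case, the source object lives in CN=Configuration and the target (the DC's account) lives in the domain partition. If the Configuration partition is replicated first, the model returns DROP without the fix and ADD_FORWARD_ONLY with it, after which 'samba-tool dbcheck' is what repairs the missing backlink.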
Comment 6 Stefan Metzmacher 2017-09-12 10:20:14 UTC
Created attachment 13581 [details]
Patches for v4-7-test
Comment 7 Tim Beale 2017-09-12 21:28:51 UTC
Comment on attachment 13581 [details]
Patches for v4-7-test

I'm not sure that the full set of GET_TGT client patches is needed to fix this bug in 4.7. I can look into whether we can come up with a smaller patch-set.

Also note there's another client-side patch that's still pending review/delivery.
http://git.catalyst.net.nz/gw?p=samba.git;a=commit;h=2833cf21af72cd6d18488cbb08c736e7c57a7d3a
Without this, I think replication could potentially break when upgrading from an older Samba release.
Comment 8 Tim Beale 2017-09-14 00:31:00 UTC
Created attachment 13595 [details]
Proposed change-set to fix the bug on 4.7

The attached patches should hopefully fix the bug on 4.7. I've backported the test case to highlight the problem, and reworked the 4.7 repl_meta_data.c code so we can just fix the bug without backporting all the GET_TGT client changes. Just running the changes through autobuild now.
Comment 9 Tim Beale 2017-09-18 21:02:47 UTC
Created attachment 13605 [details]
Patch-set to fix the problem

The patches ran through autobuild OK. I've updated the attachment (just to tweak the commit description). I'm happy with them, and in my opinion they're ready to go in.

Note that most of these patches aren't exact cherry-picks of the commits in master. I've tried to backport the logic to match master as closely as I can. In theory, the fix for cross-partition links should be independent of the GET_TGT changes. However, in master the bug-fix was interleaved with all the GET_TGT refactors. The alternative option is to cherry-pick ~30 patches that add GET_TGT client-side support into 4.7.