Created attachment 13480 [details]
Full logs

I just found this in samba.stderr of a private autobuild.

I'm not sure whether this is one problem or several, but I recently saw the "Failed to find account dn (serverReference) for..." messages in a customer environment with 14 DCs, and the serverReference attribute was in fact missing on about 7 of the servers.

Failed to convert objects after retry: WERR_DS_DRA_SCHEMA_MISMATCH/NT_STATUS_INVALID_NETWORK_RESPONSE
Failed to convert objects after retry: WERR_DS_DRA_SCHEMA_MISMATCH/NT_STATUS_INVALID_NETWORK_RESPONSE
Failed to convert objects after retry: WERR_DS_DRA_SCHEMA_MISMATCH/NT_STATUS_INVALID_NETWORK_RESPONSE
../source4/dsdb/common/util.c:4807: Failed to find account dn (serverReference) for CN=PROMOTEDVDC,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=example,DC=com, parent of DSA with objectGUID bb2cb692-653e-4ab9-a04c-6abe24ff673e, sid S-1-5-21-4106630654-3360935162-1398896351-2062
../source4/rpc_server/drsuapi/updaterefs.c:371: Refusing DsReplicaUpdateRefs for sid S-1-5-21-4106630654-3360935162-1398896351-2062 with GUID bb2cb692-653e-4ab9-a04c-6abe24ff673e
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105 for bb2cb692-653e-4ab9-a04c-6abe24ff673e._msdcs.samba.example.com CN=Configuration,DC=samba,DC=example,DC=com
../source4/dsdb/common/util.c:4807: Failed to find account dn (serverReference) for CN=PROMOTEDVDC,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=samba,DC=example,DC=com, parent of DSA with objectGUID bb2cb692-653e-4ab9-a04c-6abe24ff673e, sid S-1-5-21-4106630654-3360935162-1398896351-2062
../source4/rpc_server/drsuapi/updaterefs.c:371: Refusing DsReplicaUpdateRefs for sid S-1-5-21-4106630654-3360935162-1398896351-2062 with GUID bb2cb692-653e-4ab9-a04c-6abe24ff673e
UpdateRefs failed with WERR_DS_DRA_ACCESS_DENIED/NT code 0xc0002105 for bb2cb692-653e-4ab9-a04c-6abe24ff673e._msdcs.samba.example.com CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com
Unable to convert 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1 to an attid, and can_change_pfm=false!
../source4/dsdb/schema/schema_init.c:669: 'testAttr150293171935425dup': unable to map attributeID 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1: WERR_NOT_FOUND
../source4/dsdb/schema/schema_init.c:916: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-Attr1502931719-35425-dup,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
ldb: dsdb_schema_from_db() failed: 19:Constraint violation: dsdb_schema load failed: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-Attr1502931719-35425-dup,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
dsdb_replicated_objects_commit: Failed to re-load schema after commit of transaction (working: 0x2abce9e8eef0/0, new: 0x2abce9e8eef0/0)
Failed to commit objects: WERR_INTERNAL_ERROR/NT_STATUS_INVALID_NETWORK_RESPONSE
Unable to convert 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1 to an attid, and can_change_pfm=false!
../source4/dsdb/schema/schema_init.c:669: 'testAttr150293171935425dup': unable to map attributeID 1.3.6.1.4.1.7165.4.6.1.6.9.35425.1: WERR_NOT_FOUND
../source4/dsdb/schema/schema_init.c:916: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-Attr1502931719-35425-dup,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
...

The autobuild used this code:
https://git.samba.org/?p=metze/samba/wip.git;a=shortlog;h=e96478178c618b437a6ba69103f36bbe72ab261e
with "autobuild-private.sh samba samba-systemkrb5"
Assigning to Tim as the GET_TGT work should help with this.
(In reply to Andrew Bartlett from comment #1)
Andrew, do you think the schema corruption is caused by the same problem?

I'm seeing a lot of messages like this lately:

/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: Unable to convert 1.3.6.1.4.1.7165.4.6.1.6.20.40583 to an attid, and can_change_pfm=false!
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: ../source4/dsdb/schema/schema_init.c:669: 'testgeneratedlinkIDbacklink2150302021340583': unable to map attributeID 1.3.6.1.4.1.7165.4.6.1.6.20.40583: WERR_NOT_FOUND
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: ../source4/dsdb/schema/schema_init.c:916: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
ldb: dsdb_schema_from_db() failed: 19:Constraint violation: dsdb_schema load failed: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: dsdb_schema_from_db() failed: 19:Constraint violation: dsdb_schema load failed: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: dsdb_get_schema: refresh_fn() failed
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: schema_load_init: dsdb_get_schema failed
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: module schema_load initialization failed : Operations error
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: module dsdb_notification initialization failed : Operations error
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: module rootdse initialization failed : Operations error
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: module samba_dsdb initialization failed : Operations error
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: Unable to load modules for tdb:///memdisk/metze/W/b548000/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb: schema_load_init: dsdb_get_schema failed
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: Traceback (most recent call last):
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc:   File "/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc", line 322, in <module>
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc:     kcc.load_samdb(opts.dburl, lp, creds, force=False)
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc:   File "bin/python/samba/kcc/__init__.py", line 2485, in load_samdb
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc:     (dburl, msg))
/memdisk/metze/W/b548000/samba/source4/scripting/bin/samba_kcc: samba.kcc.kcc_utils.KCCError: Unable to open sam database tdb:///memdisk/metze/W/b548000/samba/bin/ab/ad_dc_ntvfs/private/sam.ldb : schema_load_init: dsdb_get_schema failed
../source4/dsdb/kcc/kcc_periodic.c:693: Failed samba_kcc - NT_STATUS_ACCESS_DENIED
Unable to convert 1.3.6.1.4.1.7165.4.6.1.6.20.40583 to an attid, and can_change_pfm=false!
../source4/dsdb/schema/schema_init.c:669: 'testgeneratedlinkIDbacklink2150302021340583': unable to map attributeID 1.3.6.1.4.1.7165.4.6.1.6.20.40583: WERR_NOT_FOUND
../source4/dsdb/schema/schema_init.c:916: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
ldb: dsdb_schema_from_db() failed: 19:Constraint violation: dsdb_schema load failed: dsdb_load_ldb_results_into_schema: failed to load attribute or class definition: CN=test-generated-linkID-backlink-21503020213-40583,CN=Schema,CN=Configuration,DC=samba,DC=example,DC=com:WERR_NOT_FOUND
I think Garming said that was a different bug. (Tim and Garming saw and thought about a lot of DRS bugs lately...). I've CC'ed Garming for further comment.
That looks like this bug:
https://bugzilla.samba.org/show_bug.cgi?id=12889
There are probably some patches written by Bob to address at least one particular cause of the issue (i.e. the prefixmap and schema are not fetched together).
We noticed this problem and should have fixed this as part of the GET_TGT work we've done. The following patch fixes it (but the change is dependent on the other GET_TGT client patches).
https://git.samba.org/?p=samba.git;a=commit;h=fae5df891c11f642cbede9e4e3d845c49c5f86b8

What's happening in this case is that the serverReference linked attribute spans 2 different partitions. Depending on the order in which the partitions get replicated, the DC can receive the linked attribute before it receives the target object. When this happens, it can't resolve the target, so it ends up dropping/ignoring the linked attribute.

Unfortunately, adding GET_TGT support wasn't enough to fix the problem completely (requesting that the link target gets re-sent doesn't help because the target object is in a different partition). The code change we made was to continue to add the forward-link in the case where a cross-partition target was unknown. Note that the backlink will still be missing, but the linked attribute is no longer dropped completely. Running 'samba-tool dbcheck' can then detect and fix the missing backlink.

The missing serverReference problem should only potentially occur when installing a new DC, although the same problem could potentially occur if other linked attributes span partitions.
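The ordering problem described in the comment above can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions: the Replica class, its methods and the keep_cross_partition flag are hypothetical names made up for illustration and do not correspond to the real repl_meta_data.c code, and ACCOUNT_DN is an assumed DN for the DC's machine account (the logs only show the server object's DN).

```python
# Toy model of a DC applying a replicated linked attribute.
# All names are hypothetical; this is not Samba code.

CONFIG_NC = "CN=Configuration,DC=samba,DC=example,DC=com"
DOMAIN_NC = "DC=samba,DC=example,DC=com"
PARTITIONS = [CONFIG_NC, DOMAIN_NC]  # most specific naming context first

SERVER_DN = ("CN=PROMOTEDVDC,CN=Servers,CN=Default-First-Site-Name,"
             "CN=Sites," + CONFIG_NC)
# Assumed location of the machine account (not taken from the logs).
ACCOUNT_DN = "CN=PROMOTEDVDC,OU=Domain Controllers," + DOMAIN_NC


def partition_of(dn):
    """Return the naming context that contains dn."""
    for nc in PARTITIONS:
        if dn == nc or dn.endswith("," + nc):
            return nc
    raise ValueError("no partition holds " + dn)


class Replica:
    def __init__(self, keep_cross_partition):
        self.keep_cross_partition = keep_cross_partition
        self.objects = set()      # DNs replicated so far
        self.forward_links = {}   # (source dn, attribute) -> target dn
        self.backlinks = set()    # (target dn, source dn)

    def add_object(self, dn):
        self.objects.add(dn)

    def apply_link(self, source_dn, attr, target_dn):
        if target_dn in self.objects:
            # Normal case: the target is known, so both directions are set.
            self.forward_links[(source_dn, attr)] = target_dn
            self.backlinks.add((target_dn, source_dn))
            return "applied"
        cross = partition_of(source_dn) != partition_of(target_dn)
        if self.keep_cross_partition and cross:
            # Behaviour described for the fix: keep the forward link even
            # though the target (in another partition) has not arrived yet.
            self.forward_links[(source_dn, attr)] = target_dn
            return "forward-link only"
        # Behaviour described for the bug: the link is silently lost.
        return "dropped"

    def repair_backlinks(self):
        """Add backlinks that are missing for resolvable forward links."""
        fixed = 0
        for (source_dn, _attr), target_dn in self.forward_links.items():
            if (target_dn in self.objects
                    and (target_dn, source_dn) not in self.backlinks):
                self.backlinks.add((target_dn, source_dn))
                fixed += 1
        return fixed


if __name__ == "__main__":
    for keep in (False, True):
        dc = Replica(keep_cross_partition=keep)
        dc.add_object(SERVER_DN)   # Configuration partition arrives first
        result = dc.apply_link(SERVER_DN, "serverReference", ACCOUNT_DN)
        dc.add_object(ACCOUNT_DN)  # domain partition arrives later
        print(keep, result, dc.repair_backlinks())
```

In this model the first run loses serverReference for good, while the second keeps the forward link and only needs the backlink repaired afterwards, which is roughly the role the comment above gives to 'samba-tool dbcheck'.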
Created attachment 13581 [details] Patches for v4-7-test
Comment on attachment 13581 [details]
Patches for v4-7-test

I'm not sure that the full set of GET_TGT client patches is needed to fix this bug in 4.7. I can take a look at whether we can come up with a smaller patch-set.

Also note there's another client-side patch that's still pending review/delivery.
http://git.catalyst.net.nz/gw?p=samba.git;a=commit;h=2833cf21af72cd6d18488cbb08c736e7c57a7d3a
Without this, I think replication could potentially break when upgrading from an older Samba release.
Created attachment 13595 [details]
Proposed change-set to fix the bug on 4.7

The attached patches should hopefully fix the bug on 4.7. I've backported the test case to highlight the problem, and reworked the 4.7 repl_meta_data.c code so we can just fix the bug without backporting all the GET_TGT client changes. Just running the changes through autobuild now.
Created attachment 13605 [details]
Patch-set to fix the problem

The patches ran through autobuild OK. I've updated the attachment (just to tweak the commit description). I'm happy with them, and in my opinion they're ready to go in.

Note that most of these patches aren't exact cherry-picks of the commits in master. I've tried to backport the logic to match master as closely as I can. In theory, the fix for cross-partition links should be independent of the GET_TGT changes. However, in master the bug-fix was interleaved with all the GET_TGT refactors. The alternative option is to cherry-pick ~30 patches that add GET_TGT client-side support into 4.7.
Closing this now, as it should be fixed in v4.8 onwards and v4.7 is EOL'd.