Bug 10973 - No objectClass found in replPropertyMetaData on ordinary objects (non-deleted)
Status: RESOLVED FIXED
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.2.0rc2
Hardware: All All
Importance: P5 normal
Target Milestone: 4.3
Assigned To: Andrew Bartlett
QA Contact: Samba QA Contact
Depends on: 11429 11483
Blocks:
Reported: 2014-11-28 11:32 UTC by piotrb
Modified: 2015-11-16 08:54 UTC (History)
Description piotrb 2014-11-28 11:32:00 UTC
Hello,


When joining as a DC (4.11) to 4.2.0rc2, the following output appears:


Replicating critical objects from the base DN of the domain
Partition[DC=example,DC=de] objects[101/101] linked_values[33/0]
Partition[DC=example,DC=de] objects[503/1084] linked_values[0/0]
Partition[DC=example,DC=de] objects[905/1084] linked_values[0/0]
Partition[DC=example,DC=de] objects[1185/1084] linked_values[1043/0]
No objectClass found in replPropertyMetaData for CN=HS_Mail_Einkauf,CN=Groups,DC=example,DC=de!

Failed to apply records: replmd_replicated_apply_add: error during DRS repl ADD: No objectClass found in replPropertyMetaData for CN=examplegroup,CN=Groups,DC=example,DC=de!
: Object class violation
Failed to commit objects: WERR_GENERAL_FAILURE
Join failed - cleaning up
checking sAMAccountName
Deleted CN=AD01,OU=Domain Controllers,DC=example,DC=de
Deleted CN=NTDS Settings,CN=AD01,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=de
Deleted CN=AD01,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=de
ERROR(<type 'exceptions.TypeError'>): uncaught exception - Failed to process chunk: NT_STATUS_UNSUCCESSFUL
  File "/usr/lib/python2.7/dist-packages/samba/netcmd/__init__.py", line 175, in _run
    return self.run(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/samba/netcmd/domain.py", line 555, in run
    machinepass=machinepass, use_ntvfs=use_ntvfs, dns_backend=dns_backend)
  File "/usr/lib/python2.7/dist-packages/samba/join.py", line 1172, in join_DC
    ctx.do_join()
  File "/usr/lib/python2.7/dist-packages/samba/join.py", line 1077, in do_join
    ctx.join_replicate()
  File "/usr/lib/python2.7/dist-packages/samba/join.py", line 817, in join_replicate
    replica_flags=ctx.domain_replica_flags)
  File "/usr/lib/python2.7/dist-packages/samba/drs_utils.py", line 256, in replicate
    schema=schema, req_level=req_level, req=req)


Any idea?

P.S. There are no deleted objects in AD (tombstone content is deleted within one day).

Greetings
Piotr
Comment 1 Peter Gerritsen 2014-12-17 15:35:41 UTC
We are experiencing the same problem while joining an eighth DC to a domain. This happens when using 4.1.11 for the join; the rest of the DCs are running 4.1.4. The error is exactly the same, except that it happens on a computer account in our case. We do have custom objectclasses on the computer accounts, but this has never stopped us from joining a new DC to this domain before.
Comment 2 Peter Gerritsen 2014-12-23 00:09:22 UTC
Tried upgrading all DCs to 4.1.11 (wheezy-backports), to no avail. I have changed properties on all users/groups/computers (by adding info to a certain field), but the problem returns. Any suggestions?

Output from join:
root@ad-custloc:~# samba-tool domain join customer.andolan DC --realm=customer.andolan --dns-backend=BIND9_DLZ -U adminuser
Finding a writeable DC for domain 'customer.andolan'
Found DC gc.customer.andolan
Password for [customer\adminuser]:
workgroup is customer
realm is customer.andolan
checking sAMAccountName
Adding CN=AD-custloc,OU=Domain Controllers,DC=customer,DC=andolan
Adding CN=AD-custloc,CN=Servers,CN=Customercustloc,CN=Sites,CN=Configuration,DC=customer,DC=andolan
Adding CN=NTDS Settings,CN=AD-custloc,CN=Servers,CN=Customercustloc,CN=Sites,CN=Configuration,DC=customer,DC=andolan
Adding SPNs to CN=AD-custloc,OU=Domain Controllers,DC=customer,DC=andolan
Setting account password for AD-custloc$
Enabling account
Adding DNS account CN=dns-AD-custloc,CN=Users,DC=customer,DC=andolan with dns/ SPN
Setting account password for dns-AD-custloc
Calling bare provision
No IPv6 address will be assigned
Provision OK for domain DN DC=customer,DC=andolan
Starting replication
Schema-DN[CN=Schema,CN=Configuration,DC=customer,DC=andolan] objects[402/1603] linked_values[0/0]
Schema-DN[CN=Schema,CN=Configuration,DC=customer,DC=andolan] objects[804/1603] linked_values[0/0]
Schema-DN[CN=Schema,CN=Configuration,DC=customer,DC=andolan] objects[1206/1603] linked_values[0/0]
Schema-DN[CN=Schema,CN=Configuration,DC=customer,DC=andolan] objects[1603/1603] linked_values[0/0]
Analyze and apply schema objects
Partition[CN=Configuration,DC=customer,DC=andolan] objects[402/2084] linked_values[0/0]
Partition[CN=Configuration,DC=customer,DC=andolan] objects[804/2084] linked_values[0/0]
Partition[CN=Configuration,DC=customer,DC=andolan] objects[1206/2084] linked_values[0/0]
Partition[CN=Configuration,DC=customer,DC=andolan] objects[1608/2084] linked_values[0/0]
Partition[CN=Configuration,DC=customer,DC=andolan] objects[2010/2084] linked_values[0/0]
Partition[CN=Configuration,DC=customer,DC=andolan] objects[2084/2084] linked_values[104/0]
Replicating critical objects from the base DN of the domain
Partition[DC=customer,DC=andolan] objects[104/104] linked_values[794/0]
Partition[DC=customer,DC=andolan] objects[506/1961] linked_values[0/0]
Partition[DC=customer,DC=andolan] objects[908/1961] linked_values[0/0]
No objectClass found in replPropertyMetaData for CN=Marloes,OU=Users,OU=custloc3,OU=Locations,DC=customer,DC=andolan!

Failed to apply records: replmd_replicated_apply_add: error during DRS repl ADD: No objectClass found in replPropertyMetaData for CN=Marloes,OU=Users,OU=custloc3,OU=Locations,DC=customer,DC=andolan!
: Object class violation
Failed to commit objects: WERR_GENERAL_FAILURE
Join failed - cleaning up
checking sAMAccountName
Deleted CN=AD-custloc,OU=Domain Controllers,DC=customer,DC=andolan
Deleted CN=dns-AD-custloc,CN=Users,DC=customer,DC=andolan
Deleted CN=NTDS Settings,CN=AD-custloc,CN=Servers,CN=Customercustloc,CN=Sites,CN=Configuration,DC=customer,DC=andolan
Deleted CN=AD-custloc,CN=Servers,CN=Customercustloc,CN=Sites,CN=Configuration,DC=customer,DC=andolan
ERROR(<type 'exceptions.TypeError'>): uncaught exception - Failed to process chunk: NT_STATUS_UNSUCCESSFUL
  File "/usr/lib/python2.7/dist-packages/samba/netcmd/__init__.py", line 175, in _run
    return self.run(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/samba/netcmd/domain.py", line 555, in run
    machinepass=machinepass, use_ntvfs=use_ntvfs, dns_backend=dns_backend)
  File "/usr/lib/python2.7/dist-packages/samba/join.py", line 1172, in join_DC
    ctx.do_join()
  File "/usr/lib/python2.7/dist-packages/samba/join.py", line 1077, in do_join
    ctx.join_replicate()
  File "/usr/lib/python2.7/dist-packages/samba/join.py", line 817, in join_replicate
    replica_flags=ctx.domain_replica_flags)
  File "/usr/lib/python2.7/dist-packages/samba/drs_utils.py", line 256, in replicate
    schema=schema, req_level=req_level, req=req)
Comment 3 Peter Gerritsen 2014-12-28 12:17:55 UTC
OK, this seems to be a bug in 4.1.11 or maybe 4.1.10. I actually downgraded the Samba version on the newly joining DC to 4.1.9, and then it successfully joined the domain with no problems at all. After everything was live I upgraded 4.1.9 to 4.1.11, and it's still fine now. Dbcheck on the 4.1.11 versions doesn't show any problems.

Regards,
Peter
Comment 4 Peter Gerritsen 2014-12-28 12:19:57 UTC
(In reply to Peter Gerritsen from comment #3)

By the way, the problem did not exist in a cleanly created test environment with all 4.1.11 versions; I could join as many servers as I wanted without any problems. The directory was empty except for default accounts and OUs, sites and the servers that joined...
Comment 5 Andrew Bartlett 2014-12-29 07:48:42 UTC
This error is deliberately placed to prevent further replication of corrupt databases. Using an earlier version, from before the check was put in place, just hides the error, causing it to happen later.

You have a partially replicated object.  Newer Samba versions have dbcheck and replication checks to ensure that this doesn't get propagated further. 

There are hints in the dbcheck error (when detecting this object) on how to proceed, but in short we just hope you have another DC with the full object, and suggest you force re-replication from the DC.
Comment 6 Peter Gerritsen 2015-01-03 10:17:58 UTC
Hi Andrew, thanks for the information. Everything is working fine now, though: all replications look good, the KCC is successful on all DCs, showrepl on all DCs shows no errors, and dbcheck doesn't see any problems on the items the join stopped at. I do see normalisation errors, but when run with --fix they do not go away. It seems samba-tool dbcheck --fix can't solve this issue:

ERROR: Normalisation error for attribute 'objectClass' in 'CN=PC000-105,OU=Workstations,OU=Computers,OU=Locationname,OU=Locations,DC=Client,DC=andolan'
Values/Order of values do/does not match: ['top', 'ieee802Device', 'andolanMachine', 'person', 'organizationalPerson', 'user', 'computer']/['top', 'andolanMachine', 'ieee802Device', 'person', 'organizationalPerson', 'user', 'computer']!
Not fixing attribute 'objectClass'

This error started to show up after adding ieee802Device to the machine accounts; could this be the culprit? I can't imagine so, since the object works fine across all DCs and it's not an object the join stopped on.

I would like to fix the database problem if at all possible! By the way, joining to any of the other DCs didn't work either, with the same error!

Thanks,
Peter
Comment 7 Andrew Bartlett 2015-01-03 23:19:42 UTC
The objectclass normalisation error is a different issue, which is only in the dbcheck tool, not in the data.  Our objectclass sorting isn't deterministic with auxiliary classes, and we don't recognise that.
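
To illustrate the point about ordering: the two lists in the dbcheck output from comment 6 contain the same classes and differ only in the position of the two auxiliary classes. A minimal Python sketch shows this. The helper names `same_classes` and `differing_positions` are hypothetical, and this is not how dbcheck itself compares values.

```python
# objectClass values copied from the dbcheck output in comment 6,
# in the order they appear in the error message.
first = ['top', 'ieee802Device', 'andolanMachine', 'person',
         'organizationalPerson', 'user', 'computer']
second = ['top', 'andolanMachine', 'ieee802Device', 'person',
          'organizationalPerson', 'user', 'computer']


def same_classes(a, b):
    """Hypothetical helper: compare two lists, ignoring order and case."""
    return sorted(v.lower() for v in a) == sorted(v.lower() for v in b)


def differing_positions(a, b):
    """Return the indexes at which the two lists disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(same_classes(first, second))         # True
print(differing_positions(first, second))  # [1, 2]
```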
Comment 8 Arvid Requate 2015-07-15 16:01:06 UTC
I had a look at this again in two cases. In both cases the objectClass attribute itself was present, only the corresponding entry in replPropertyMetaData was not sorted to be at position #0 in the replPropertyMetaDataCtr1 array.

In one of the environments where this was observed, there were two Samba DCs, and the first one initially was "provisioned" by joining it into a native AD (SBS). After that the native AD server was decommissioned, and then the second Samba DC was joined. This join probably worked because it was done with a Samba version which did not yet contain commit 9c9df40220234cba973e84b4985d90da1334a1d1.

I looked at one of the "broken" objects and the "originating_invocation_id" indicated that the last change came from the native AD (with high probability... I could not verify the invocationID but the timestamps also point in that direction). So I cannot verify whether replPropertyMetadata was sorted properly on the AD server and the order was changed during initial replication to Samba, or whether it was unsorted in the AD already. Anyway, my comment is not about the source of this sorting issue. It's about the effect of commit 9c9df40220234cba973e84b4985d90da1334a1d1, which causes samba-tool domain join to fail in some situations, as reported in this original bug report.

To wrap up the user story: Then the customer updated to a Samba version containing the git commit that introduced this check and this message "replmd_replicated_apply_add: error during DRS repl ADD: No objectClass found in replPropertyMetaData". Then, due to some other reason the customer wanted to throw away the local sam.ldb and join again, which failed due to this sort order check.


So, the situation is: Samba domains updated from earlier versions might have replPropertyMetadata attributes that do not pass this "new" check. An update to a recent Samba version then breaks replication on these objects, causing samba-tool domain join to fail unexpectedly with a message that claims that objectClass is missing, which is actually not the case.


IMHO the check should be adjusted in two points:

* It should check if objectClass is really missing. This is what it claims. Instead it currently checks if DRSUAPI_ATTID_objectClass is element #0 in the replPropertyMetaDataCtr1 array.

* If objectClass is present but replPropertyMetaData is not sorted properly it should not break replication. It should issue a warning message instead (and maybe simply sort the stuff in a locally self-healing fashion).


Additionally I think samba-tool dbcheck should check and --fix the sort order if that's relevant/critical to the operation.
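
The distinction proposed above (objectClass really missing versus merely not at position #0) can be sketched in Python. This is only an illustration under stated assumptions, not the Samba implementation: `MetaEntry`, `classify` and `resort` are hypothetical names, objectClass is assumed to have attribute ID 0, the other attribute IDs are made up, and sorting by unsigned attribute ID is an assumed ordering.

```python
from dataclasses import dataclass

# Assumption: objectClass maps to attribute ID 0. All other IDs are made up.
ATTID_OBJECT_CLASS = 0x00000000


@dataclass
class MetaEntry:
    """Hypothetical stand-in for one replPropertyMetaData entry."""
    attid: int
    version: int = 1


def classify(entries):
    """Return 'missing', 'misplaced' or 'ok' for objectClass in entries."""
    positions = [i for i, e in enumerate(entries)
                 if e.attid == ATTID_OBJECT_CLASS]
    if not positions:
        return "missing"      # objectClass has no metadata entry at all
    return "ok" if positions[0] == 0 else "misplaced"


def resort(entries):
    """Return a copy sorted by attribute ID read as unsigned 32-bit."""
    return sorted(entries, key=lambda e: e.attid & 0xFFFFFFFF)


custom_first = [MetaEntry(0x9A2B0001), MetaEntry(ATTID_OBJECT_CLASS),
                MetaEntry(0x00090001)]
print(classify(custom_first))             # misplaced
print(classify(resort(custom_first)))     # ok
print(classify([MetaEntry(0x00090001)]))  # missing
```

With a split like this, only the "missing" case would need to stop replication, while "misplaced" could be reported as a warning and repaired locally.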
Comment 9 Andrew Bartlett 2015-07-15 21:03:18 UTC
Thanks for re-raising this, as I actually have a client issue with the same problem, and it helped me understand what is really going on.  

Indeed, a database created with custom schema prior to 61b978872fe86906611f64430b2608f5e7ea7ad8 (or a backported version of that patch) would have custom schema sorted in front of objectclass.

I agree, we need to write a dbcheck rule to find and fix this.
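
One way custom schema entries could end up in front of objectClass is a signed comparison of attribute IDs. This is an assumption offered for illustration only, since this report does not say how the old ordering arose. The sketch assumes that objectClass has attribute ID 0 and that a custom attribute has an ID with the high bit set (the value below is made up), which is negative when read as a signed 32-bit integer.

```python
import ctypes

ATTID_OBJECT_CLASS = 0x00000000  # assumed ID for objectClass
CUSTOM_ATTID = 0x9A2B0001        # made-up ID with the high bit set


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


attids = [ATTID_OBJECT_CLASS, 0x00090001, CUSTOM_ATTID]

# Signed comparison: the high-bit ID is negative, so it sorts first.
signed_order = sorted(attids, key=as_signed32)
# Unsigned comparison: objectClass (0) stays at position 0.
unsigned_order = sorted(attids, key=lambda v: v & 0xFFFFFFFF)

print([hex(v) for v in signed_order])    # ['0x9a2b0001', '0x0', '0x90001']
print([hex(v) for v in unsigned_order])  # ['0x0', '0x90001', '0x9a2b0001']
```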
Comment 10 Andrew Bartlett 2015-08-14 03:04:46 UTC
By the way, patches for this are on the mailing list.  I have dbcheck patches to re-sort the replPropertyMetaData, which will fix the old entries that were written to the database before we started storing them sorted correctly.
Comment 11 Stefan Metzmacher 2015-09-04 10:47:52 UTC
(In reply to Andrew Bartlett from comment #10)

Fixed in v4-3-test
Comment 12 Narendra Kumar 2015-11-02 12:28:02 UTC
(In reply to Peter Gerritsen from comment #4)
Hi Andrew,
 
I am facing the same issue. Can you please provide a link to the patch so that I can fix this bug?
Comment 13 Andrew Bartlett 2015-11-16 08:54:59 UTC
No standalone patch is available for this (due to the scope of the change). 

You need to upgrade to Samba 4.3, then run dbcheck to fix the corrupt database.  Then you can join additional Samba 4.3 servers to the domain.