The symptoms of this issue include:
 - Replication failures, with this error showing in the client-side logs:
     error during DRS repl ADD: No objectClass found in replPropertyMetaData for
     Failed to commit objects:
 - A crash of the server, in particular the rpc_server process, with:
     INTERNAL ERROR: Signal 11
The most common situation in which this bug manifests is when an object is created and then deleted or renamed at any time during the server-side search in which it would be replicated out for the first time.
However, any delete or rename may trigger the issue. In those cases the consequences are less obvious: instead of a clear failure, some change to the object is simply not replicated.
Finally, a client reading LDAP while a rename or delete is being processed may not be returned the object being renamed or deleted, but would be returned that object if it asked again.
The root cause is a lack of read locking in ldb_tdb, due to a missing decrement of a reference counter there. As a result, an fcntl() lock was not held, and so the connection between the index and the main DB record was not enforced.
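
To illustrate how a missing decrement could leave an fcntl() lock unheld, here is a minimal sketch in Python. It is not the ldb_tdb code: the class NestedReadLock, its buggy flag and the helper os_lock_held_on_second_read() are hypothetical names made up for this illustration. It assumes a nested lock in which only the outermost acquire and release touch the real fcntl() lock.

```python
import fcntl
import tempfile


class NestedReadLock:
    """Hypothetical nested read lock (illustration only, not ldb_tdb)."""

    def __init__(self, fd, buggy=False):
        self.fd = fd
        self.count = 0             # nesting depth (the reference counter)
        self.os_lock_held = False  # whether the fcntl() lock is really held
        self.buggy = buggy         # assumption: models the missing decrement

    def acquire(self):
        # Only the outermost acquire takes the real lock.
        if self.count == 0:
            fcntl.lockf(self.fd, fcntl.LOCK_SH)
            self.os_lock_held = True
        self.count += 1

    def release(self):
        # Only the outermost release drops the real lock.
        if self.count == 1:
            fcntl.lockf(self.fd, fcntl.LOCK_UN)
            self.os_lock_held = False
        if not self.buggy:
            self.count -= 1  # the step that is skipped in the buggy variant


def os_lock_held_on_second_read(buggy):
    """Do one locked read, then report whether a second read is really locked."""
    with tempfile.TemporaryFile() as f:
        lock = NestedReadLock(f.fileno(), buggy=buggy)
        lock.acquire()
        lock.release()
        lock.acquire()  # the caller now believes it holds a read lock
        return lock.os_lock_held
```

In the buggy variant the counter never returns to zero, so the second acquire() sees a non-zero count and skips the fcntl() call, even though the first release() already dropped the real lock.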
Additionally, it was noticed that a read lock is required over the entire ldb_search() operation, including the subsequent searches in the module stack. This has required that new lock and unlock operations be added to ldb.
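
The need for a lock spanning the whole search can be sketched with a toy model. This is only an illustration under stated assumptions: ToyDB, try_delete() and the between_steps hook are hypothetical and are not part of ldb, and a plain mutex stands in for the database lock. The search first consults an index and then fetches the records. A writer that runs between those two steps makes the result disagree with the index unless the lock is held throughout.

```python
import threading


class ToyDB:
    """Toy index + record store (illustration only, not ldb)."""

    def __init__(self):
        self.records = {"cn=a": {"objectClass": "user"}}
        self.index = {"user": ["cn=a"]}
        self.lock = threading.Lock()  # stands in for the database lock

    def try_delete(self, dn):
        # A writer needs the lock; it is refused while a search holds it.
        if not self.lock.acquire(blocking=False):
            return False
        try:
            rec = self.records.pop(dn)
            self.index[rec["objectClass"]].remove(dn)
            return True
        finally:
            self.lock.release()

    def search(self, object_class, hold_lock, between_steps):
        if hold_lock:
            self.lock.acquire()
        try:
            # Step 1: consult the index.
            dns = list(self.index.get(object_class, []))
            # A concurrent writer gets a chance to run here.
            between_steps()
            # Step 2: fetch the records the index pointed at.
            return [self.records[dn] for dn in dns if dn in self.records]
        finally:
            if hold_lock:
                self.lock.release()


db = ToyDB()
locked_result = db.search("user", True, lambda: db.try_delete("cn=a"))

db2 = ToyDB()
unlocked_result = db2.search("user", False, lambda: db2.try_delete("cn=a"))
```

With the lock held across both steps the delete is refused and the object is returned. Without it, the index lookup finds the object but the record has gone by the time it is fetched, so the search silently returns nothing.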
This issue will be fixed in ldb 1.2.0 and Samba 4.7.
*** Bug 12754 has been marked as a duplicate of this bug. ***
There appears to be a regression in failure cases, see https://bugzilla.samba.org/show_bug.cgi?id=12904
Andrew, can this be closed?
Fixed in master with 9063669a05a261657d5b9a60254bd1b9065e6423 for Samba 4.7
I'm running 4.6.7 and I believe that I'm hitting this bug when trying to join a 4.7 or 4.8 DC.
The wiki says "See BUG #12858 for more details and updated advise on database recovery for affected installations." but there are no details on how to fix/workaround this issue here. Please advise.
If you are hitting an issue consistently on a not-moving database then it won't be this issue, as you don't need read locks on a DB that isn't changing.
However you certainly could be seeing the impact of one of many DRS bugs we have had over the years, possibly leaving your existing DB in a not-happy state.
It is probably best to describe your full issue on the samba mailing list and I'll pick it up from there.