Bug 12858 - Read corruption in the AD DC: Objects renamed can vanish from LDAP and the replication state due to lack of read locks
Status: RESOLVED FIXED
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.6.5
Hardware: All All
Importance: P5 regression
Target Milestone: 4.7
Assigned To: Andrew Bartlett
QA Contact: Samba QA Contact
Duplicates: 12754
Depends on: 12904
Blocks:
Reported: 2017-06-23 02:39 UTC by Andrew Bartlett
Modified: 2017-08-28 07:06 UTC

Description Andrew Bartlett 2017-06-23 02:39:51 UTC
The symptoms of this issue include:

Replication failures, with this error showing in the client-side logs:
 error during DRS repl ADD: No objectClass found in replPropertyMetaData for
 Failed to commit objects:
 WERR_GEN_FAILURE/NT_STATUS_INVALID_NETWORK_RESPONSE

A crash of the server, in particular of the rpc_server process, with:
 INTERNAL ERROR: Signal 11

The bug most commonly manifests when an object is created and then deleted or renamed at any time during the server-side search in which it would be replicated out for the first time.

However, any delete or rename may trigger the issue. In those cases the consequences are less obvious: instead of a clear failure, some change to the object would simply not be replicated.

Finally, a client reading LDAP while a rename or delete is being processed may not receive the object being renamed or deleted, but would receive it on a repeated request.

The root cause is a lack of read locking in ldb_tdb, due to a missing decrement of a reference counter. As a result, an fcntl() lock was not held, and so the connection between the index and the main DB record was not enforced.
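
To illustrate how a missed decrement can leave readers unprotected, here is a minimal sketch of a reference-counted read lock built on fcntl(). It is not the ldb_tdb code: struct kv_handle, kv_lock_read(), kv_unlock_read() and the forget_decrement flag are hypothetical names invented for this example, and the exact shape of the original bug is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical handle; the field names are illustrative only. */
struct kv_handle {
	int fd;              /* database file descriptor */
	int read_lock_count; /* nesting depth of read-lock requests */
	bool os_lock_held;   /* whether the fcntl() lock is currently held */
};

/* Apply F_RDLCK or F_UNLCK to the whole file, waiting if needed. */
static int set_fcntl_lock(int fd, short type)
{
	struct flock fl = {0};
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0; /* 0 means "to end of file" */
	return fcntl(fd, F_SETLKW, &fl);
}

/* Take the OS lock only on the outermost request (count 0 -> 1). */
int kv_lock_read(struct kv_handle *h)
{
	if (h->read_lock_count == 0) {
		if (set_fcntl_lock(h->fd, F_RDLCK) != 0) {
			return -1;
		}
		h->os_lock_held = true;
	}
	h->read_lock_count++;
	return 0;
}

/*
 * Release the OS lock when the outermost holder unlocks.
 * forget_decrement=true simulates the assumed bug: the lock is
 * dropped but the counter is left at 1.
 */
int kv_unlock_read(struct kv_handle *h, bool forget_decrement)
{
	assert(h->read_lock_count > 0);
	if (h->read_lock_count == 1) {
		if (set_fcntl_lock(h->fd, F_UNLCK) != 0) {
			return -1;
		}
		h->os_lock_held = false;
	}
	if (!forget_decrement) {
		h->read_lock_count--;
	}
	return 0;
}
```

In this sketch, once an unlock skips the decrement, the counter never returns to zero, so every later kv_lock_read() believes an outer caller already holds the lock and skips the fcntl() call.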

Additionally, it was noticed that a read lock is required over the entire ldb_search() operation, including the subsequent searches in the module stack.  This has required that new lock and unlock operations be added to ldb.
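
As a rough sketch of that idea, the following holds one read lock across a top-level search and any nested search issued by a module. All names here (struct db_ctx, db_read_lock(), db_read_unlock(), db_search(), nested_module()) are hypothetical and are not the operations actually added to ldb; the underlying lock is simulated by a counter.

```c
#include <stddef.h>

/* Hypothetical database context; not the real ldb structures. */
struct db_ctx {
	int lock_depth;      /* nested read-lock requests outstanding */
	int os_acquisitions; /* times the underlying lock was really taken */
};

typedef int (*module_fn)(struct db_ctx *db, int depth);

static void db_read_lock(struct db_ctx *db)
{
	if (db->lock_depth++ == 0) {
		db->os_acquisitions++; /* stand-in for taking the real lock */
	}
}

static void db_read_unlock(struct db_ctx *db)
{
	db->lock_depth--; /* the real lock would be dropped at depth 0 */
}

/*
 * Hold the read lock over the whole search, including any
 * follow-up searches that a module performs.
 */
int db_search(struct db_ctx *db, module_fn module, int depth)
{
	int ret = 0;

	db_read_lock(db);
	/* ... index lookup and record fetch would happen here ... */
	if (module != NULL) {
		ret = module(db, depth);
	}
	db_read_unlock(db);
	return ret;
}

/* Example module: issues nested searches until depth reaches 0,
 * then reports the lock nesting depth seen at the innermost level. */
int nested_module(struct db_ctx *db, int depth)
{
	if (depth > 0) {
		return db_search(db, nested_module, depth - 1);
	}
	return db->lock_depth;
}
```

Because the nested search only increments the depth, the underlying lock is taken once and is never dropped between the outer and inner search, so a concurrent rename or delete cannot be observed halfway through.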

This issue will be fixed in ldb 1.2.0 and Samba 4.7.
Comment 1 Andrew Bartlett 2017-06-23 02:46:01 UTC
*** Bug 12754 has been marked as a duplicate of this bug. ***
Comment 2 Garming Sam 2017-07-13 10:42:46 UTC
There appears to be a regression in failure cases, see https://bugzilla.samba.org/show_bug.cgi?id=12904
Comment 3 Stefan Metzmacher 2017-08-11 08:35:46 UTC
Andrew, can this be closed?
Comment 4 Andrew Bartlett 2017-08-28 07:06:52 UTC
Fixed in master with 9063669a05a261657d5b9a60254bd1b9065e6423 for Samba 4.7