Bug 13816 - dbcheck in the middle of the tombstone garbage collection causes replication failures
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.10.0rc2
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
 
Reported: 2019-03-01 14:22 UTC by Stefan Metzmacher
Modified: 2019-04-03 10:26 UTC

Attachments
Testing patches for master (15.01 KB, patch)
2019-03-01 15:53 UTC, Stefan Metzmacher
Updated patches for master (15.32 KB, patch)
2019-03-07 14:03 UTC, Stefan Metzmacher
Patch for v4-10-test (97.23 KB, patch)
2019-03-19 11:00 UTC, Stefan Metzmacher
Patches for v4-10-test (100.70 KB, patch)
2019-03-27 08:56 UTC, Stefan Metzmacher
abartlet: review+
Patches for v4-9-test (89.07 KB, patch)
2019-03-27 08:56 UTC, Stefan Metzmacher
abartlet: review+
Patches for v4-8-test (89.13 KB, patch)
2019-03-27 08:57 UTC, Stefan Metzmacher
abartlet: review+

Description Stefan Metzmacher 2019-03-01 14:22:20 UTC
When the (deleted) parent of a deleted object
(with the DISALLOW_MOVE_ON_DELETE bit set in systemFlags)
is removed before the object itself, dbcheck moved
the object into the LostAndFound[Config] subtree of the partition
as an originating change. That means the object
stays in tombstone state for another 180 days on the local
DC. Other DCs then fail to replicate the object, because
it has already been removed completely there and the replication
only sends the name and lastKnownParent attributes, since
all other attributes should already be known to the other DC.
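
To illustrate the condition described above, here is a minimal Python sketch. It is not the code from the attached patches. The flag value 0x02000000 is the one MS-ADTS gives for FLAG_DISALLOW_MOVE_ON_DELETE; the function names and the guard in should_move_to_lost_and_found() are hypothetical and only show one plausible way to keep dbcheck from re-parenting tombstones.

```python
# Illustrative sketch only; names and policy are assumptions, not Samba code.

# systemFlags bit as documented in MS-ADTS (FLAG_DISALLOW_MOVE_ON_DELETE).
SYSTEM_FLAG_DISALLOW_MOVE_ON_DELETE = 0x02000000


def has_disallow_move_on_delete(system_flags: str) -> bool:
    """Check the bit in a systemFlags value as returned by LDAP.

    LDAP returns systemFlags as a signed 32-bit decimal string, so the
    value is masked to 32 bits before testing the flag.
    """
    value = int(system_flags) & 0xFFFFFFFF
    return bool(value & SYSTEM_FLAG_DISALLOW_MOVE_ON_DELETE)


def should_move_to_lost_and_found(is_deleted: bool, parent_exists: bool) -> bool:
    """Hypothetical guard: only re-parent live objects whose parent is gone.

    A deleted object with a missing parent is left alone, so that no new
    originating change extends its tombstone lifetime.
    """
    return not parent_exists and not is_deleted
```

With such a guard, a tombstone whose parent has already been garbage collected would simply be skipped and later removed by the normal tombstone expiry.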

This race is unlikely in practice, but it can happen
if Samba is stopped/restarted by a cron job while dbcheck
runs in fix mode from another cron job at the same time.

The result is a log message on the destination DSA saying that
a replicated object doesn't have an objectClass attribute.
Comment 1 Stefan Metzmacher 2019-03-01 15:53:01 UTC
Created attachment 14887
Testing patches for master
Comment 2 Stefan Metzmacher 2019-03-01 17:56:49 UTC
The message in the log is:

No objectClass found in replPropertyMetaData
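
To check whether a destination DC has hit this problem, its log can be searched for that message. A minimal Python sketch follows; the helper name find_marker_lines is hypothetical, and the log file location is deliberately left to the caller because it depends on the local Samba configuration.

```python
# Illustrative sketch only; the helper name is an assumption.

MARKER = "No objectClass found in replPropertyMetaData"


def find_marker_lines(lines):
    """Return (line_number, text) pairs for log lines containing MARKER."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MARKER in line
    ]
```

The function accepts any iterable of lines, so an open log file object can be passed to it directly.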
Comment 3 Stefan Metzmacher 2019-03-07 14:03:58 UTC
Created attachment 14909
Updated patches for master

The change compared to the first patchset is that we
no longer treat the RDN attribute (cn in most cases) as unexpected.
Comment 4 Stefan Metzmacher 2019-03-19 11:00:45 UTC
Created attachment 14946
Patch for v4-10-test
Comment 5 Stefan Metzmacher 2019-03-19 15:16:17 UTC
Comment on attachment 14946
Patch for v4-10-test

First we need to merge https://gitlab.com/samba-team/samba/merge_requests/311
and include the patches for backports
Comment 6 Stefan Metzmacher 2019-03-27 08:56:09 UTC
Created attachment 15005
Patches for v4-10-test
Comment 7 Stefan Metzmacher 2019-03-27 08:56:40 UTC
Created attachment 15006
Patches for v4-9-test
Comment 8 Stefan Metzmacher 2019-03-27 08:57:09 UTC
Created attachment 15007
Patches for v4-8-test
Comment 9 Karolin Seeger 2019-03-28 08:17:08 UTC
Pushed to autobuild-v4-{10,9,8}-test.
Comment 10 Karolin Seeger 2019-04-02 07:52:12 UTC
(In reply to Karolin Seeger from comment #9)
Pushed to v4-8-test and v4-9-test, pushed again to autobuild-v4-10-test.
Comment 11 Karolin Seeger 2019-04-03 10:26:30 UTC
(In reply to Karolin Seeger from comment #10)
Pushed to v4-10-test.
Closing out bug report.

Thanks!