The Samba-Bugzilla – Bug 12754
DRS replication can miss entries (locking up replication)
Last modified: 2017-06-29 06:44:51 UTC
Due to the way the uptodateness vector is calculated, if more than a 'page' of objects is replicated during a cycle, objects can be missed if earlier pages contain objects updated since the start of the cycle.
This is one (possible) cause of WRITE_FAULT errors during 'make test' as well as real-world replication issues on larger data sets.
A WIP patch by Garming and myself is attached, but it needs tests.
I wrote the patch up with Garming earlier this week based on his analysis of our flapping tests over Easter.
If we use the USN of an object at the time we fetch the full object to calculate the up-to-dateness vector, we risk ignoring objects that should appear later in the replication cycle.
This can happen if objects A, B and C have USN:
but during replication of 3 pages of results, B is modified, getting USN 400
Then we send:
This is because the server sets an uptodateness vector of 400 at B, and the client sends it back, causing the server to ignore C at 300, even though the USN check alone would have sent it.
Instead, only send an uptodateness vector matching the USN seen at the time the cycle starts, and re-send the object later for the higher USN.
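The scenario described above can be sketched as a toy model. This is not Samba code. It makes several assumptions: the uptodateness vector is reduced to a single integer high-water mark, each page carries exactly one object, and A and B start at the made-up USNs 100 and 200 (only C at 300 and B's new USN of 400 come from the description). The function `run_cycle` and its parameters are hypothetical names used only for illustration.

```python
def run_cycle(db, client_utd, modify_after_page, use_snapshot_usn):
    """Simulate one replication cycle, one object per page.

    db                 -- dict of object name -> current USN (mutated in place)
    client_utd         -- high-water USN the client sends back (simplified vector)
    modify_after_page  -- dict of page number -> [(name, new_usn)] applied after that page
    use_snapshot_usn   -- False: vector uses the USN at fetch time (behaviour described as buggy)
                          True:  vector uses the USN seen when the cycle started (proposed fix)
    """
    # Snapshot the (name, USN) pairs at the start of the cycle, ordered by USN.
    snapshot = sorted(db.items(), key=lambda item: item[1])
    sent = []
    for page_no, (name, usn_at_start) in enumerate(snapshot, start=1):
        usn_now = db[name]  # the full object is fetched now, with its current USN
        if usn_now > client_utd:
            sent.append((name, usn_now))
            vector_usn = usn_at_start if use_snapshot_usn else usn_now
            client_utd = max(client_utd, vector_usn)
        # Concurrent writes that land between this page and the next one.
        for mod_name, new_usn in modify_after_page.get(page_no, []):
            db[mod_name] = new_usn
    return sent, client_utd


def scenario(use_snapshot_usn):
    """A, B, C replicated over 3 pages; B is modified to USN 400 after page 1."""
    db = {"A": 100, "B": 200, "C": 300}  # A and B values are assumed
    first, utd = run_cycle(db, 0, {1: [("B", 400)]}, use_snapshot_usn)
    second, utd = run_cycle(db, utd, {}, use_snapshot_usn)
    return first, second, utd
```

With `use_snapshot_usn=False`, the first cycle sends A at 100 and B at 400, the vector jumps to 400, and C at 300 is skipped in this and every later cycle. With `use_snapshot_usn=True`, the vector only advances to 200 at B, so C is still sent, and B is sent again in the next cycle because its USN of 400 is above the final vector of 300.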
The conclusion of this bug is correct, but the mechanism is not.
Entries are missed if they are modified (such as by a rename or delete) during replication.
Patches are under development.
Closing in favour of a new bug that is much clearer.
*** This bug has been marked as a duplicate of bug 12858 ***