Bug 2094 - Keep the last-sync time for better two-way synchronization
Summary: Keep the last-sync time for better two-way synchronization
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.3
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Duplicates: 7565
Depends on:
Reported: 2004-11-27 12:42 UTC by Matt McCutchen
Modified: 2010-07-14 06:16 UTC

Original design document (7.64 KB, text/plain)
2010-07-13 23:17 UTC, Matt McCutchen

Description Matt McCutchen 2004-11-27 12:42:58 UTC
Add three options to rsync: `--last-sync-time', `--smart-orphans', and
`--careful-update'.  With these options, rsync could handle full two-way
synchronization of two folders in which files may be created, modified, and
deleted between synchronizations.  The file at:
describes my suggested changes in detail.
Comment 1 Wayne Davison 2005-02-25 08:55:53 UTC
The timestamp scheme works only if no updates are made during the transfer.
Otherwise it becomes more problematic as the file set grows (because we
can't scan all the dirs at the exact same point in time).  Another failure case
is modifying the same file on both sides, but in slightly different ways --
whichever file was modified the most recently will overwrite the other side's
changes.  I really don't think that this set of options will take rsync close
enough to a two-way transfer utility to make it worthwhile.
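
To make the second failure case concrete, here is a minimal Python sketch of a pure "newest timestamp wins" rule. It is not rsync code: the function name `newest_wins` and the integer timestamps are assumptions made only for illustration.

```python
def newest_wins(mtime_a, mtime_b, last_sync):
    """Pick the side to copy from using timestamps alone.

    Returns (winner, lost_edits): winner is "a", "b", or None, and
    lost_edits is True when both sides changed since last_sync, in which
    case the losing side's changes are overwritten.
    """
    changed_a = mtime_a > last_sync
    changed_b = mtime_b > last_sync
    if changed_a and changed_b:
        # Both sides were edited: the older edit is silently discarded.
        return ("a" if mtime_a >= mtime_b else "b"), True
    if changed_a:
        return "a", False
    if changed_b:
        return "b", False
    return None, False
```

With a last sync at 100 and edits at 110 (side a) and 120 (side b), side b wins and the edit made on side a is lost, which is the situation described above.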

The utility named "unison" is built to do a two-way synchronization of files,
and handles all these issues (and uses the rsync algorithm to update changed
files).  I recommend checking it out.
Comment 2 Matt McCutchen 2010-07-12 15:02:10 UTC
*** Bug 7565 has been marked as a duplicate of this bug. ***
Comment 3 baya 2010-07-13 08:26:37 UTC
(In reply to comment #2)
> *** Bug 7565 has been marked as a duplicate of this bug. ***

Sorry, I missed this request ((

As for unison: at least "the Unix owner and group ids are not propagated" by Unison, so it can't be used by root for master-master replication of whole slices or directories.

As for Wayne Davison's comment: of course, full real-time replication driven by events between two servers would be the best choice (maybe it will become possible with rsync in the future ;) ).
But when we set the last-sync-time (i.e. the check-point) just before the synchronization process starts, we do not depend on the "updates made during the transfer time". Those updates will be synchronized on the next run.
Modifying the same file on both _master_ sides should occur very rarely, and even then we mostly need only the newest version, so simply overwriting will be enough (of course this should be noted for users).
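
The check-point argument can be illustrated with a small Python sketch. The timeline values and the name `picked_up_next_run` are hypothetical; the only rule assumed is that a file counts as changed when its modification time is later than the check-point.

```python
def picked_up_next_run(mtime, check_point):
    """A file counts as changed if it was modified after the check-point."""
    return mtime > check_point

# Hypothetical timeline: the run starts at t=100 and ends at t=130,
# and a file is modified at t=115, after it was already scanned.
RUN_START, RUN_END, MODIFIED_DURING_RUN = 100, 130, 115

# Check-point recorded before the run: the change is seen next time.
seen_if_before = picked_up_next_run(MODIFIED_DURING_RUN, RUN_START)
# Check-point recorded after the run: the change is silently skipped.
seen_if_after = picked_up_next_run(MODIFIED_DURING_RUN, RUN_END)
```

Under these assumptions only the check-point taken before the run leaves the mid-transfer update pending for the next synchronization.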

Currently, using just --check-point and a small bash script (I will attach it to bug 7565), I do bidirectional synchronization with one small disadvantage: old directories are deleted only after 2 or 3 runs (depending on which side they were deleted on initially).

PS. Of course, comparing each item with the check-point directly in rsync's file-list checking code would avoid running rsync twice. It is not so easy for me to write that right now, since it requires changing the direction of transfer during the file-list check.
Comment 4 Matt McCutchen 2010-07-13 23:17:23 UTC
Created attachment 5844
Original design document

The link in comment #0 is long since broken.  For reference, I'm attaching the design document that was previously there.
Comment 5 baya 2010-07-14 06:16:17 UTC
(In reply to comment #4)
> Created an attachment (id=5844)
> Original design document
> The link in comment #0 is long since broken.  For reference, I'm attaching the
> design document that was previously there.

Thank you, Matt, it goes even further than my proposal )))
To the point:
1) The main point is that the last-sync-time (i.e. the check-point) must be recorded BEFORE the synchronization process starts, so that a deletion made during the update is synchronized next time. Otherwise we get deletions that "mysteriously come back" (those that occur during the update).
2) The --smart-orphans option is ambiguous and even dangerous:
 a) last-sync-time is equivalent to check-point + dry-run
 b) last-sync-time + smart-orphans is equivalent to check-point
VERY IMPORTANT NOTE: never increase the check-point value during a dry run, and never increase it unless the synchronization finished without errors; otherwise many new files will be deleted instead of being added. For details, see the bash wrapper attached to bug 7565.
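
The rule in the note above might be sketched in Python as follows. This is not the bash wrapper from bug 7565; `next_check_point` and its parameters are hypothetical names, and the only assumption about rsync is that an exit status of 0 means the run finished without errors.

```python
def next_check_point(current, run_started_at, exit_code, dry_run):
    """Return the check-point to store after a run.

    The value only moves forward after a real run that exited with 0;
    it then becomes the time recorded BEFORE that run started.
    """
    if dry_run or exit_code != 0:
        # Nothing was reliably synchronized, so keep the old value.
        return current
    return max(current, run_started_at)
```

For example, a dry run or a failed run that started at t=200 leaves a check-point of 100 unchanged, while a successful real run moves it to 200.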

Also see the notes in my PS about avoiding running rsync twice. In my situation, an update with small changes takes 3 - 5 minutes for 300MB. And this is just a test case; I'm going to mirror ~10G.

An additional check must be added before moving the tmp file to its final position, because the file at that position can be updated or deleted during a long receive, so it may be newer than the received one, or the received file may no longer be needed.
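
One possible shape for such a check, as a Python sketch rather than a patch to rsync's receiver: the helper names `snapshot` and `commit_received` are hypothetical, and the sketch assumes the destination's modification time was recorded when the receive started. A small window between the check and the rename remains, so this narrows the race but does not close it.

```python
import os

def snapshot(path):
    """Return the file's mtime in nanoseconds, or None if it is missing."""
    try:
        return os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return None

def commit_received(tmp_path, dest_path, seen_at_start):
    """Move tmp_path over dest_path only if dest_path is unchanged.

    seen_at_start is the value snapshot(dest_path) returned when the
    receive began. If the destination was updated or deleted in the
    meantime, the received copy is discarded and False is returned.
    """
    if snapshot(dest_path) != seen_at_start:
        os.remove(tmp_path)
        return False
    os.replace(tmp_path, dest_path)
    return True
```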

By the way, on the first run check_point can be 0 (zero), which means that no deletions occur: files that are missing on one side are simply added to it.
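
As a rough Python sketch of how a file that exists on only one side could be handled under this scheme (this is one reading of the thread, not the actual patch; `orphan_action` is a hypothetical name):

```python
def orphan_action(mtime, check_point):
    """Decide what to do with a file that exists on one side only.

    A file modified after the check-point is treated as new and copied
    to the other side; an older one is assumed to have been deleted on
    the other side since the last synchronization, so it is deleted.
    """
    return "copy" if mtime > check_point else "delete"
```

With check_point set to 0 every positive modification time is later than the check-point, so every orphan is copied and nothing is deleted, which matches the first-run behaviour described above.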