===============================================================
``Keep the last-sync time for better two-way synchronization'':
Several proposed changes to rsync
By Matt McCutchen
===============================================================

My story
========

I would like to synchronize a folder on my computer and a folder on my
school's server. Between synchronizations, files may be created,
modified or deleted in both places, and I would like any combination of
these changes made in one place to be made in the other place. In its
present form, rsync can almost do this.

I originally used this bash script:

    # sync
    rsync -rt --update (remote) (local)
    rsync -rt --update (local) (remote)

This handles creation and modification well, but to get rid of a file,
one must manually delete it from both places.

I next tried two scripts, one to get and one to put:

    # sync-get
    rsync -rt --update --delete (remote) (local)

    # sync-put
    rsync -rt --update --delete (local) (remote)

This worked, as long as I only modified one place between
synchronizations. However, too often I lost changes by accidentally
synchronizing the wrong way.

Next I tried this kludge, which has been pretty successful so far:

    # sync
    rsync -rt --update (remote) (local)
    find (local) -size 0c -ok rm -R \{\} \;
    rsync -rt --update --delete (local) (remote)

    # sync-rm
    while [ "$1" != "" ] ; do
        cp (a zero-length file) $1
        shift
    done

Running sync-rm on a file ``deletion-marks'' it, which is implemented by
truncating it to zero length. This has the advantage that
deletion-marking is an ordinary change that competes with modification
for ``most recent'' status. However, zero-length files that I want get
deleted, and programs that are unaware of ``sync-rm'' will find files
they delete mysteriously coming back.

How to perform intelligent two-way synchronization?
===================================================

The root of the problem is, if rsync sees a file in one place that does
not exist in the other place, how does it know whether to copy or delete
the file? Knowing the time of the last synchronization makes this
possible. Suppose that today is Thursday, rsync last synchronized on
Tuesday, and it finds a file `X' in place 1 but not in place 2.

* If X has a Wednesday timestamp, then X was created since the last
  synchronization, so X should be copied to place 2.

* If X has a Monday timestamp, then X was synchronized on Tuesday and
  existed in both places at that time. The copy of X in place 2 must
  have been deleted since then, so X should be deleted from place 1.

Proposed changes to rsync
=========================

To facilitate this kind of synchronization, I propose that several
options be added to rsync:

If given `--last-sync-time[=FILE]', rsync should maintain the time of
last synchronization in the last-modified timestamp of `FILE'. `FILE'
defaults to `.rsync-time' or something like that. rsync `touch'es `FILE'
when synchronization finishes (unless `--dry-run' is given). `FILE' must
be given as a path relative to the receiving directory (because the
sending directory should never be modified), and rsync does not
synchronize it like an ordinary file. I'm not sure what, if anything,
should happen to `FILE' if synchronization fails.

Both options described below need to know the time of last
synchronization and thus require `--last-sync-time'. However,
`--last-sync-time' can be used alone if the user wants to track the time
of last synchronization for some other reason, perhaps if (s)he is about
to start using one of these options.

A second option, `--smart-orphans', should cause rsync to consider the
time of last synchronization when it finds a file in one place and not
in the other.
During a `--smart-orphans' run, if the last synchronization occurred at
time T, then rsync creates only files newer than T and deletes only
files older than T. In particular, this means:

* If rsync sees a file X on the sender but not on the receiver, it
  copies X if X is newer than T. Otherwise it does nothing.

* If rsync sees a file X on the receiver but not on the sender, it
  deletes X if X is older than T and `--delete' was given. Otherwise it
  does nothing.

If the time of last synchronization is kept, rsync can detect a
potential problem: a file modified (or created) in both places since the
last synchronization. Ordinarily, it copies the sender's version if that
one is newer. If given a third option, `--careful-update', rsync should
stop and ask the user what to do. `--careful-update' and `--update' are
mutually exclusive because they specify two different behaviors on files
that exist in both places. This option is useful for some applications
of two-way synchronization but not for others; it would be nice, but it
is not terribly important.

Where are we now?
=================

It seems to me that there are three main uses of rsync. They differ in
who (`rsync' and/or `users') is allowed to write to each place and which
way(s) synchronization is performed. With the options proposed above,
uses 2 and 3 can be handled better.

1. Sender written by users ==> receiver written by rsync. Used for
   backup and mirroring. The basic command is:

       rsync (extra-options) --delete (sender) (receiver)

   `-t --update' makes it go much faster but has no effect on the end
   result. Use 1 is really a special case of use 2, so the command for
   use 2 also works.

2. Sender written by users ==> receiver written by rsync and users;
   rsync should not disturb user changes to the receiver. Used (for
   example) to distribute software updates to many computers on an
   organization's network without machine-specific customizations being
   lost.
   The basic command is:

       rsync (extra-options) -t --last-sync-time --update --smart-orphans --delete (sender) (receiver)

   Note that `--smart-orphans' is also meaningful in use 2. In
   combination with `--update', it makes rsync keep user changes to the
   receiver rather than bringing the receiver back to the state of the
   sender.

   In uses 1 and 2, the same sender can be used with multiple receivers.

3. Full two-way synchronization between two places written by rsync and
   users. Used to keep one's personal files updated between two
   computers, like the Windows Briefcase. The basic script is:

       rsync (extra-options) -t --last-sync-time --update --smart-orphans --delete (place2) (place1)
       rsync (extra-options) -t --last-sync-time --update --smart-orphans --delete (place1) (place2)

   Why this ordering of the places? I tend to think of place 1 as local
   and place 2 as remote, and it makes sense to use the local place for
   scratchwork.

User-friendliness
=================

Two suggestions to make rsync more usable to non-wizards:

* Create a shell script for each use above so users can decide on a use
  and don't have to figure out exactly which options they need.

* Integrate rsync with Nautilus and/or Konqueror by adding
  `Synchronize...' to the context menus for folders. Choosing this
  option opens a dialog box, and the user selects the type of
  synchronization and the other folder. When the user is ready, rsync is
  invoked with the appropriate options. (If someone decides to actually
  do this, I have a Glade design for this dialog box.)

  * Also add `Synchronize...' to the menu that appears when a folder is
    dragged with the middle mouse button. When the user selects it, the
    same dialog box appears, but both folders involved can be filled in
    automatically.

  * Even better: Save a folder's synchronization options in a file
    `.rsync-settings' inside. (And then use
    `--last-sync-time=.rsync-settings'.) When the user chooses
    `Synchronize...', the options can be filled in automatically. Tell
    Nautilus or Konqueror to show a special icon for folders containing
    a `.rsync-settings' file.
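
To make the `--smart-orphans' rule proposed earlier concrete, the
decision for a file found on only one side can be sketched as a small
shell function. This is only an illustration: the name `orphan_action',
its argument order, and the use of plain epoch seconds are assumptions
made for this sketch, and none of it is an existing rsync feature.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): decide what to do with a file
# that exists in only one place.
#   $1 = where the file was found: "sender" or "receiver"
#   $2 = the file's modification time, in seconds since the epoch
#   $3 = time T of the last synchronization, in seconds since the epoch
#   $4 = "yes" if --delete was given, anything else otherwise
# Prints one of: copy, delete, skip.
orphan_action() {
    case "$1" in
        sender)
            # Newer than T: created since the last sync, so copy it.
            if [ "$2" -gt "$3" ]; then
                echo copy
            else
                echo skip
            fi
            ;;
        receiver)
            # Older than T: it was synchronized before and has since been
            # deleted on the sender, so delete it (only with --delete).
            if [ "$2" -lt "$3" ] && [ "$4" = yes ]; then
                echo delete
            else
                echo skip
            fi
            ;;
        *)
            echo "unknown side: $1" >&2
            return 2
            ;;
    esac
}
```

With T = 200, `orphan_action sender 300 200 yes' prints `copy' (the
Wednesday case) and `orphan_action receiver 100 200 yes' prints `delete'
(the Monday case). A file whose timestamp equals T is left alone on
either side, which matches the ``otherwise it does nothing'' wording.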