Bug 5954 - Use entire destination as basis data for every transfer
Status: RESOLVED WONTFIX
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: Other Linux
Importance: P3 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2008-12-08 15:17 UTC by Jerome Haltom
Modified: 2008-12-23 14:34 UTC
Description Jerome Haltom 2008-12-08 15:17:29 UTC
I'd like rsync to be able to compare all files on both sides to each other. All of them.

Use case:  I have a 70GB music directory. I sync it between home and work. At home I run a retagger which tags and renames pretty much every file and directory. So they've all moved, they're all named differently, AND their contents have changed. This basically would result in 70GB of transfers.

If rsync were to calculate block hashes for EVERY FILE on BOTH SIDES, and then use those hashes as the sources of new files, this would not take the 2 weeks it would take otherwise. It might take an hour to calculate hashes. But that'd be fine.

So maybe --very-fuzzy, --really-fuzzy, or even --fuzzy --fuzzy --fuzzy.
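
For illustration only, here is a minimal Python sketch of the idea requested above: index the blocks of every destination file, then build each source file from matching blocks wherever they live. This is not rsync code. The names `build_index` and `plan_transfer`, the 8-byte block size, and the use of SHA-256 at every byte offset (rather than a rolling checksum) are all assumptions made to keep the example short.

```python
import hashlib

BLOCK = 8  # assumed, tiny block size so the demo stays readable


def build_index(dest_files):
    """Map block digest -> (file name, offset) for each aligned block of every destination file."""
    index = {}
    for name, data in dest_files.items():
        for off in range(0, len(data) - BLOCK + 1, BLOCK):
            digest = hashlib.sha256(data[off:off + BLOCK]).digest()
            index.setdefault(digest, (name, off))  # keep the first location seen
    return index


def plan_transfer(src, index):
    """Return ('copy', name, offset) and ('literal', bytes) instructions for one source file."""
    ops, literal, i = [], bytearray(), 0
    while i < len(src):
        window = src[i:i + BLOCK]
        hit = None
        if len(window) == BLOCK:
            hit = index.get(hashlib.sha256(window).digest())
        if hit:
            if literal:  # flush unmatched bytes collected so far
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", hit[0], hit[1]))
            i += BLOCK
        else:
            literal.append(src[i])  # no match here: this byte must be sent
            i += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


# Hypothetical data: the file was renamed and its leading tag rewritten.
dest_files = {"old/track01.mp3": b"TAG-OLD!AAAAAAAABBBBBBBBCCCCCCCC"}
ops = plan_transfer(b"NEWTAGAAAAAAAABBBBBBBBCCCCCCCC", build_index(dest_files))
sent = sum(len(op[1]) for op in ops if op[0] == "literal")
print(ops, sent)
```

In this toy case only the 6 bytes of the new tag are sent as literal data; the rest is copied from a destination file with a different name. A real implementation would need a cheap rolling checksum and an index that fits in memory for 70GB of data, which is where the complexity lies.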
Comment 1 Matt McCutchen 2008-12-08 16:40:11 UTC
(In reply to comment #0)
> If rsync were to calculate block hashes for EVERY FILE on BOTH SIDES, and then
> use those hashes as the sources of new files, this would not take the 2 weeks
> it would take otherwise. It might take an hour to calculate hashes. But that'd
> be fine.

--fuzzy chooses one destination file whose name looks the most similar to the source name and uses that as a basis, so it fits nicely into rsync's existing workflow.  Your suggestion of using the entire destination as a basis for every transfer is fundamentally different, so I wouldn't call it --very-fuzzy.  Implementing that in rsync would be a pain.  A technique that accomplishes essentially the same thing is to tar up the source and destination and then delta-transfer the tar file, assuming you have enough disk space.
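
As a rough sketch of the tar workaround described above, the following Python packs a whole tree into a single uncompressed archive in a stable order, so that a later delta-transfer of the archive can match blocks regardless of which file they came from. The function name `pack_tree` is hypothetical, and writing the archive uncompressed is an assumption made here because compression would tend to hide the unchanged blocks.

```python
import os
import tarfile


def pack_tree(root, out_path):
    """Write every file under root into one uncompressed tar, in sorted order."""
    with tarfile.open(out_path, "w") as tar:  # mode "w" means no compression
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make os.walk descend in a stable order
            for fname in sorted(filenames):
                full = os.path.join(dirpath, fname)
                # Store paths relative to root so both sides produce comparable archives.
                tar.add(full, arcname=os.path.relpath(full, root), recursive=False)
    return out_path
```

Each side would run this on its own copy (with `out_path` outside `root`), the source archive would then be transferred with rsync onto the destination archive so that the latter serves as the basis, and the result would be unpacked. As noted above, this needs enough disk space for the archives.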
Comment 2 Matt McCutchen 2008-12-23 12:48:07 UTC
I don't see an argument for implementing the proposal in rsync.
Comment 3 Jerome Haltom 2008-12-23 14:09:12 UTC
I suppose if you want to ignore my argument, then yes, there is no argument. But your response to my argument focused on the technical merits more than the practical. It would be one thing to say "I am not doing this." It's quite another to say "there is no argument at all."

If you want people to use rsync on common directory workloads, then such a proposal is not far fetched. If you want people to have to disable rsync on their directories and go out of band when making certain types of file changes, then it is. That's your call though.

I would appreciate something more to the point, though. "rsync just shouldn't handle this" would have been a nice response.
Comment 4 Matt McCutchen 2008-12-23 14:32:44 UTC
I'm sorry, I didn't mean to be rude.

It's not unusual that someone proposes a new option and I suggest an alternative approach; the argument they then have to make is that the implementation in rsync would be superior enough to the alternative to justify the added complexity.  You haven't made such an argument, aside from "it would be convenient if rsync handled this so I don't have to use a separate tool for that part of the job", which applies to every feature request.

> If you want people to use rsync on common directory workloads, then such a
> proposal is not far fetched.

I don't recall anyone else raising the case of renames at the same time as small data changes.  If you have evidence that this is common, please present it.  Otherwise, this is just one of several use cases that rsync could optimize but doesn't; for example, see bug 5482.