The --fuzzy argument does an incredible job at syncing large files when it chooses the correct fuzzy basis.
However, the default "fuzzy-basis-destination-file-selection algorithm" is not correct for every situation, so I propose the ability to pass an argument to the fuzzy parameter that specifies which "fuzzy-basis-destination-file-selection algorithm" to use.
I've posted a question detailing my needs here:
In short, some of the files in my source-folder are 200GB in size. When rsync chooses the correct existing-destination-file for its "fuzzy basis", my synchronization (of these files) seems magical in terms of the data that gets transferred over the wire.
However, when it chooses the wrong existing-destination-file as the source file's fuzzy basis, the data transfer can take days.
Look at the filenames in both my source-folder and destination-folder (below):
# Source Folder's new files (from today's on-site backup):
# Destination-Folder's old files (from yesterday's off-site backup):
In my case, the fuzzy-basis-selection-algorithm needs to select the existing destination-file that:
1) Has the same file extension as the source file
2) Begins with the most consecutively identical characters as the source file
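To make the two rules concrete, here is a rough Python sketch of the selection I have in mind. It is only an illustration of the rules above, not how rsync works internally; the function names and the sample filenames in the usage note are made up.

```python
import os


def common_prefix_len(a, b):
    """Count how many leading characters two names share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def pick_fuzzy_basis(source_name, dest_names):
    """Pick a basis file for source_name from the existing destination names.

    Rule 1: only consider names with the same file extension.
    Rule 2: among those, take the one sharing the longest run of
            identical leading characters with the source name.
    Returns None when no destination name has a matching extension.
    """
    src_ext = os.path.splitext(source_name)[1]
    candidates = [
        name for name in dest_names
        if os.path.splitext(name)[1] == src_ext
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: common_prefix_len(source_name, name))
```

With hypothetical names, `pick_fuzzy_basis("db_full_2024-02-01.bak", ["db_full_2024-01-31.bak", "db_diff_2024-02-01.bak"])` would pick `db_full_2024-01-31.bak`, because it shares the longer leading run with the source name even though the other file has the same date.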
The default algorithm does not meet these requirements.
Therefore, I propose the ability to pass an argument that allows the user to specify non-default fuzzy basis selection algorithms.
There should probably be a few common, baked-in ones (added as time goes on) that you can choose from by name. It would be even more flexible if rsync also let the user pass a file into the command that specifies a custom "fuzzy-basis-destination-file-selection algorithm".
Naturally, if these features are granted, the documentation would also need to be updated to give guidance on specifying these things.
If these things are already implemented, and I have somehow overlooked them, would you kindly post an answer to my question here?:
Just a quick thought on a workaround...
It would be trivial to figure out the new name and best old file in a script. So, you could hard link the best old file to the new file name. Then rsync wouldn't even need --fuzzy to find it.
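Something along these lines, as a minimal Python sketch. It assumes the script runs on the destination machine before rsync does, that "best old file" means same extension plus longest shared leading characters, and that the function name `link_best_old_file` is just a placeholder.

```python
import os


def shared_prefix_len(a, b):
    """Length of the leading run of characters that a and b share."""
    return len(os.path.commonprefix([a, b]))


def link_best_old_file(dest_dir, new_name):
    """Hard link the best-matching old file in dest_dir to new_name.

    Returns the name of the old file that was linked, or None if
    nothing was done (new_name already exists, or no candidate has
    the same extension).
    """
    new_path = os.path.join(dest_dir, new_name)
    if os.path.lexists(new_path):
        return None  # something already has that name; leave it alone

    ext = os.path.splitext(new_name)[1]
    candidates = [
        name for name in os.listdir(dest_dir)
        if os.path.splitext(name)[1] == ext
        and os.path.isfile(os.path.join(dest_dir, name))
    ]
    if not candidates:
        return None

    best = max(candidates, key=lambda name: shared_prefix_len(new_name, name))
    # The new name now points at the old file's data, so rsync sees an
    # existing destination file and can use it as the basis directly.
    os.link(os.path.join(dest_dir, best), new_path)
    return best
```

One caveat: as far as I know this is only safe when rsync is not run with --inplace. By default rsync builds the updated file as a temporary copy and renames it into place, which leaves the old file untouched. With --inplace the old file would be modified too, since both names share the same data.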
Thanks. Yeah, that's probably what I'll do. I may even write the script so that it does some tasks in parallel (running multiple rsync commands at the same time).
The current default "fuzzy-basis-destination-file-selection algorithm" selects the correct file most of the time. Maybe the reason it didn't today is that it is the first day of a new month, which made the file names too different. I'm not sure.
The --fuzzy argument is really awesome, and it is just a hair away from being exactly what I need for handling things with one command at the folder level. If I could only modify the file-selection algorithm, it would be perfect.
Until then, I just have to write a script instead of being able to handle this within the command.