The --fuzzy argument does an incredible job of syncing large files when it chooses the correct fuzzy basis. However, the default "fuzzy-basis-destination-file-selection algorithm" is not correct for every situation, so I propose the ability to pass an argument to the --fuzzy option that specifies which "fuzzy-basis-destination-file-selection algorithm" to use.

I've posted a question detailing my needs here: https://unix.stackexchange.com/questions/538548/

In short, some of the files in my source folder are 200GB in size. When rsync chooses the correct existing destination file as its "fuzzy basis", the synchronization of these files seems magical in terms of the data that gets transferred over the wire. However, when it chooses the wrong existing destination file as the source file's fuzzy basis, the data transfer can take days.

Look at the filenames in both my source folder and destination folder (below):

# Source Folder's new files (from today's on-site backup):
file100-2019_09-01_12am.log
file100-2019_09-01_12am.lzo
file101-2019_09-01_12am.log
file101-2019_09-01_12am.lzo
file102-2019_09-01_12am.log
file102-2019_09-01_12am.lzo

# Destination-Folder's old files (from yesterday's off-site backup):
file100-2019_08-31_12am.log
file100-2019_08-31_12am.lzo
file101-2019_08-31_12am.log
file101-2019_08-31_12am.lzo
file102-2019_08-31_12am.log
file102-2019_08-31_12am.lzo

In my case, the fuzzy-basis-selection algorithm needs to select the existing destination file that:

1) Has the same file extension as the source file
2) Shares the longest run of identical leading characters with the source file's name

The default algorithm does not meet these requirements. Therefore, I propose the ability to pass an argument that allows the user to specify non-default fuzzy basis selection algorithms.
There should probably be a few common, baked-in ones (as time goes on) that you can choose from by name, and it would be even more flexible if rsync also permitted the user to pass a file into the command that specifies a custom "fuzzy-basis-destination-file-selection algorithm". Naturally, if these features are granted, the documentation would also need to be updated to give guidance on specifying these things.

If these things are already implemented, and I have somehow overlooked them, would you kindly post an answer to my question here: https://unix.stackexchange.com/questions/538548/
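To make the two requested criteria concrete (same extension, then the longest run of identical leading characters), here is a minimal Python sketch. It only illustrates the requested behavior and is not rsync's actual fuzzy-matching code; the function name `pick_fuzzy_basis` is made up for this example, and breaking ties by taking the first candidate is an assumption.

```python
import os


def pick_fuzzy_basis(source_name, dest_names):
    """Return the destination name that best matches source_name, or None.

    Hypothetical helper: candidates must share the source file's extension,
    and the winner is the one with the longest common leading substring.
    """
    src_ext = os.path.splitext(source_name)[1]

    # Criterion 1: keep only destination files with the same extension.
    candidates = [d for d in dest_names if os.path.splitext(d)[1] == src_ext]
    if not candidates:
        return None

    # Criterion 2: os.path.commonprefix compares character by character,
    # so its length is the number of identical leading characters.
    # max() keeps the first candidate on a tie (an assumption of this sketch).
    return max(
        candidates,
        key=lambda d: len(os.path.commonprefix([source_name, d])),
    )


dest_files = [
    "file100-2019_08-31_12am.log",
    "file100-2019_08-31_12am.lzo",
    "file101-2019_08-31_12am.log",
    "file101-2019_08-31_12am.lzo",
    "file102-2019_08-31_12am.log",
    "file102-2019_08-31_12am.lzo",
]
basis = pick_fuzzy_basis("file101-2019_09-01_12am.lzo", dest_files)
```

With the file names listed above, `file101-2019_09-01_12am.lzo` is paired with `file101-2019_08-31_12am.lzo`, since the two names agree on their first 14 characters and share the `.lzo` extension.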
Just a quick thought on a workaround... It would be trivial to figure out the new name and best old file in a script. So, you could hard link the best old file to the new file name. Then rsync wouldn't even need --fuzzy to find it.
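A rough sketch of what such a script could look like, in Python. It assumes the script runs on the destination side before rsync is started, that "best old file" means same extension plus longest shared leading characters, and that the file system supports hard links; `link_best_basis` is a made-up name. Note that rsync normally writes a temporary file and renames it into place, so the old file keeps its contents, but with --inplace the old file would be overwritten through the shared link.

```python
import os


def link_best_basis(new_name, dest_dir):
    """Hard link the best-matching old file in dest_dir to new_name.

    Hypothetical helper. Returns the old file's name, or None if nothing
    was linked (new_name already exists or no candidate was found).
    """
    target = os.path.join(dest_dir, new_name)
    if os.path.exists(target):
        return None  # rsync already has a basis file under this name

    ext = os.path.splitext(new_name)[1]
    # Sort so that ties are resolved the same way on every run.
    candidates = sorted(
        entry for entry in os.listdir(dest_dir)
        if os.path.splitext(entry)[1] == ext
        and os.path.isfile(os.path.join(dest_dir, entry))
    )
    if not candidates:
        return None

    # Longest run of identical leading characters wins.
    best = max(
        candidates,
        key=lambda entry: len(os.path.commonprefix([new_name, entry])),
    )
    # The new name now points at the old data, so a plain rsync run
    # (without --fuzzy) can use it as the delta-transfer basis.
    os.link(os.path.join(dest_dir, best), target)
    return best
```

Calling this once per new source file name before the transfer gives every new file a same-named basis in the destination folder.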
Thanks. Yeah, that's probably what I'll do. I may even write the script so that it does some tasks in parallel (running multiple rsync commands at the same time). The current default "fuzzy-basis-destination-file-selection algorithm" selects the correct file most of the time. Maybe the reason it didn't today is that it is the first day of a new month, which made the file names too different. I'm not sure. The --fuzzy argument is really awesome, and it is just a hair away from being exactly what I need for handling things with one command at the folder level. If I could only modify the file-selection algorithm, it would be perfect. Until then, I just have to write a script instead of being able to handle this within the command.