Bug 2294 - Detect renamed files and handle by renaming instead of delete/re-send
Status: ASSIGNED
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.3
Hardware: All All
Importance: P4 enhancement
Assigned To: Wayne Davison
QA Contact: Rsync QA Contact
Duplicates: 6996
 
Reported: 2005-02-01 11:56 UTC by Michael Wilson (dead mail address)
Modified: 2016-03-07 02:31 UTC
CC List: 14 users

Description Michael Wilson (dead mail address) 2005-02-01 11:56:08 UTC
It would be nice if rsync could detect identical files with differing names and
just copy/rename the files instead of sending the data all over again.

As I understand it, rsync builds a single long array of the filenames and
their associated hashes.  If that array can be indexed on the hashes,
cross-checked against file size, this should be fairly straightforward and
require no major redesign to implement.
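
A minimal sketch of the indexing idea, in Python rather than rsync's C. It assumes whole-file hashes are available for both sides, which is the reporter's premise and not a confirmed detail of rsync's file list. The helper names (`file_digest`, `index_by_size_and_hash`, `find_renames`) and the choice of SHA-256 are illustrative only, and both trees are assumed to be locally readable.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_by_size_and_hash(root):
    """Map (size, digest) -> list of file paths relative to root."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            key = (os.path.getsize(full), file_digest(full))
            index[key].append(os.path.relpath(full, root))
    return index


def find_renames(src_root, dst_root):
    """Return {old path on dst: new path on src} for content that exists
    on the destination under a name the source no longer uses."""
    src_index = index_by_size_and_hash(src_root)
    dst_index = index_by_size_and_hash(dst_root)
    renames = {}
    for key, src_paths in src_index.items():
        dst_paths = dst_index.get(key, [])
        missing_on_dst = sorted(set(src_paths) - set(dst_paths))
        stale_on_dst = sorted(set(dst_paths) - set(src_paths))
        # Pair each stale destination name with one missing source name.
        for old, new in zip(stale_on_dst, missing_on_dst):
            renames[old] = new
    return renames
```

Calling `find_renames("src", "dst")` on two trees where a directory was renamed on the source returns one entry per moved file; the receiver could then rename locally instead of requesting the data again.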

The enhancement is easily motivated if you think about what happens when rsync
is used to keep two large servers in sync, and a maintainer renames a top-level
directory on the source machine.
Comment 1 BlackB1rd 2005-02-11 01:58:50 UTC
I totally agree with this one. With this enhancement there would no longer be
unnecessary traffic when a user has moved or copied a large directory (which
is really annoying).
Comment 2 Wayne Davison 2005-02-12 14:49:37 UTC
This is the basic idea behind fuzzy.diff in the patches dir.  It does not
currently try to find a basis-file match based on size and mtime (just
similarity of names), but I plan to extend it with that functionality when I fix
some of the patch's other minor problems (see the patch for a list of them).
Comment 3 Wayne Davison 2005-02-13 22:24:02 UTC
Note that the --fuzzy patch has made it into the CVS version.  It only looks for
renamed files in the same directory as the file being created, though, so it is
not a full solution to files being moved around in the hierarchy, or directory
names changing (that will require a pre-scan on the receiving side, which is not
currently done unless --delete was specified).

I'll leave this open for now as a suggestion for a more extensive rename detector.
Comment 4 Wayne Davison 2006-02-07 07:25:49 UTC
There is now a patch named detect_renamed.diff in the patches dir that implements the basics of finding renamed files.  This will probably go onto the trunk for the release after 2.6.7.
Comment 5 Bill McGonigle 2006-03-21 14:06:00 UTC
Thanks.  This will be especially useful for log directories where logrotate is incrementing the filename number at each rotation period (httpd.10.gz -> httpd.11.gz).
Comment 6 Boris Folgmann 2007-07-11 09:50:03 UTC
I'm using rsync 2.6.9 to archive rotated log files to another machine, like Bill wrote. I tried 

rsync -avzh --partial --fuzzy src dest

and

rsync -avzh --partial --delete --fuzzy --delete-after src dest

but both calls always copy all renamed/rotated log files. And of course the files are still in the same directory after being rotated! The logs are very large (several gigs) so it takes too long to be a valuable solution.
Is the patch not included in 2.6.9 or did I miss something?
Add-on question: does rsync switch off -z for .gz files in the affected directory? I think that would be a good idea.
Comment 7 Matt McCutchen 2007-10-10 16:09:08 UTC
(In reply to comment #6)
> Is the patch not included in 2.6.9 or did I miss something?

Correct, --detect-renamed still exists as a patch; it is not in the main version of rsync.

> Add-on question: does rsync switch off -z for .gz files in the affected
> directory?

Yes, by default, rsync exempts files with a number of suffixes (including .gz) from -z.  Since rsync 3.0.0, you can customize the list of suffixes with --skip-compress=LIST.
Comment 8 Bill McGonigle 2008-11-30 17:24:40 UTC
(In reply to comment #5)
> Thanks.  This will be especially useful for log directories where logrotate is
> incrementing the filename number at each rotation period (httpd.10.gz ->
> httpd.11.gz).

Since I mentioned this specific use case, I should comment that I recently discovered the 'dateext' option to logrotate which provides a complete workaround in this scenario (which rsync handles perfectly) and might be the better solution for this case in general.

Back on topic, there's still great utility in detecting other rename cases, of course (I often see big .iso's get renamed).  I have to admit to having tried the patch, had trouble with short backups, and backed it out without making a good note of specifics.  What would be generally useful here for reporting problems against the patch?
Comment 9 Shahar Or 2009-03-22 03:00:42 UTC
Dear developers,

I'm interested in this feature so this is a reminder to whoever is involved in this and particularly to Wayne.

Also, I've found the name of the program "Unison" in the context of this issue twice on the mailing list.

Many blessings.
Comment 10 Wayne Davison 2009-12-21 12:35:38 UTC
*** Bug 6996 has been marked as a duplicate of this bug. ***
Comment 11 Philip Ganchev 2010-10-28 00:31:33 UTC
Here are some related discussions about this:

http://www.mail-archive.com/rsync@lists.samba.org/msg20283.html

http://markmail.org/message/kmazkprjvred2r5a

Comment 12 Paul 2011-01-28 20:38:47 UTC
Hi, I was about to enter a similar suggestion to this.  My very frequent use case is moving files from one directory to another.  In that situation the file name does not change--just the directory path leading to it.  These are often quite large files (0.2 to several GB) so avoiding re-copying them would speed things up a lot. 

Thanks

--Paul
Comment 13 Paul 2011-01-28 20:40:23 UTC
x
Comment 15 Michael Monnerie 2011-02-04 02:50:43 UTC
How do I apply those two detect-renamed* patches? I did
git clone git://git.samba.org/rsync.git
and tried to
patch -p1 <patches/detect-renamed.diff
but that doesn't succeed. Which version would I need to check out for the patches to apply? Sorry, I don't know git.
Comment 16 Benjamin ANDRE 2011-02-04 10:22:34 UTC
You don't need git to get the sources: go to http://samba.anu.edu.au/ftp/rsync/
and choose "rsync-3.0.7.tar.gz" and "rsync-patches-3.0.7.tar.gz".

Ben
Comment 17 Michael Monnerie 2011-02-04 13:43:23 UTC
Damn, that was too easy ;-) Thanks a lot. I'll test the new detect-renamed* patches now.
Comment 18 Bug Reporter 2012-12-08 10:05:46 UTC
Has this issue been abandoned? It's been a "while"...
Comment 19 Norman Freudenberg 2014-01-04 22:56:00 UTC
Hey, as far as I can tell there are two patches which still have not made it into the latest official release?
Are they still buggy?
Why didn't they make it into an official release?
9 years is quite a long time for a possible solution...
Comment 20 dkl 2014-03-02 03:08:37 UTC
I've been playing with the --detect-renamed patch
https://git.samba.org/?p=rsync-patches.git;a=blob;f=detect-renamed.diff;h=c3e6e846eab437e56e25e2c334e292996ee84345;hb=master

I can't seem to get it to work.  Does it rely on other patches?

Anyway, in a simple test using -vv -a --detect-renamed I can see messages about "found renamed", etc., but in a real test, after renaming large directories, there is no speed-up.  I can only surmise it's not actually renaming.

I have several applications where this would be a very handy feature to have.  I don't mind using the patch, if I could just get it to work...

Btw, I'm on Mac OS 10.9.2.
Comment 21 Petr Pisar 2014-06-02 16:47:06 UTC
The patchset has a bug (#8847) that occurs when partial-dir cannot be created. The fix is described there.
Comment 22 elatllat 2015-01-03 21:23:38 UTC
Wow, 10 years.
Maybe one reason this has not been implemented is that there are other options.
For example, I have been using a shell script as a wrapper to reduce the irritation of this bug. Here is how it works:
1) Create two lists of files, destination and source, with the file sizes and paths.
2) For each file that is in the destination but not in the source:
3) Create a subset of the source list containing the files of the same size.
4) If the subset is non-empty, hash the destination file and each file in the source subset until a match is found.
5) Ensure the directory exists on the destination and move/rename the file.
6) On some systems hashing can be as expensive as re-transferring the file, so I added an option to move the file without hashing if there is exactly one match (only sometimes hashing), and another to skip the file if there are more (never hashing).
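
The steps above can be sketched roughly as follows. This is not the commenter's shell script, which is not shown in this report; it is a Python approximation that assumes both trees are reachable as local paths. The function names and the two flags (`trust_single_match`, `skip_ambiguous`) are hypothetical stand-ins for the options described in step 6.

```python
import hashlib
import os


def list_sizes(root):
    """Step 1: map each relative file path under root to its size."""
    sizes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_moves(src_root, dst_root, trust_single_match=False, skip_ambiguous=False):
    """Return a list of (old_path, new_path) renames for the destination.
    Nothing is moved here; see apply_moves."""
    src = list_sizes(src_root)
    dst = list_sizes(dst_root)
    claimed = set()
    moves = []
    for old, size in sorted(dst.items()):
        if old in src:  # step 2: only files that are gone from the source
            continue
        # Step 3: same-size source files that the destination lacks
        # and that no earlier move has already claimed.
        subset = [p for p, s in sorted(src.items())
                  if s == size and p not in dst and p not in claimed]
        if not subset:
            continue
        if len(subset) == 1 and trust_single_match:
            match = subset[0]  # step 6: one candidate, accept without hashing
        elif len(subset) > 1 and skip_ambiguous:
            continue  # step 6: several candidates, never hash
        else:
            # Step 4: hash until a candidate matches.
            digest = sha256_of(os.path.join(dst_root, old))
            match = next((p for p in subset
                          if sha256_of(os.path.join(src_root, p)) == digest), None)
        if match is not None:
            claimed.add(match)
            moves.append((old, match))
    return moves


def apply_moves(dst_root, moves):
    """Step 5: create the target directory if needed, then rename."""
    for old, new in moves:
        target = os.path.join(dst_root, new)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.rename(os.path.join(dst_root, old), target)
```

Running `apply_moves(dst, plan_moves(src, dst))` before the real rsync run leaves only new data to transfer. Note that a file whose old name still exists on the source (as with .1, .2, .3 log rotation) is skipped by step 2, so this approach mainly helps with moved directories and one-off renames. With `trust_single_match=True` a same-size file with different content can be paired wrongly, which rsync would then have to correct during the transfer.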

Though as I am re-evaluating my backup strategy I am looking into git-annex and other solutions.
https://en.wikipedia.org/wiki/List_of_backup_software#Free_software
Comment 23 dajoker 2016-03-06 22:20:16 UTC
Looking for this capability prior to entering it as an enhancement request myself, I found everything here and basically have the same use case.  My version is that I am creating a regular backup of logs from many servers' services onto a single box, and doing so with rsync.  Some of those services still do the .1, .2, .3 file rotation, which makes for a lot of needless work, especially when these are 100+ MiB files.  It would be great if rsync could detect this to just transfer the new file and rename the old ninety-nine (or however-many).
Comment 24 Karl O. Pinc 2016-03-07 01:37:44 UTC
On Sun, 06 Mar 2016 22:20:16 +0000
samba-bugs@samba.org wrote:

> https://bugzilla.samba.org/show_bug.cgi?id=2294
> 
> --- Comment #23 from dajoker@gmail.com ---
> Looking for this capability prior to entering it as an enhancement
> request myself, I found everything here and basically have the same
> use case.  My version is that I am creating a regular backup of logs
> from many servers' services onto a single box, and doing so with
> rsync.  Some of those services still do the .1, .2, .3 file rotation,
> which makes for a lot of needless work, especially when these are
> 100+ MiB files.  It would be great if rsync could detect this to just
> transfer the new file and rename the old ninety-nine (or
> however-many).

It is not so hard to add the following to your logrotate.conf.
Just saying.

# Add a date extension instead of just a number for rsync hardlinked
# backups. 
dateext
dateformat -%Y-%m-%d-%s

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
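
To illustrate why the dateext setting above helps a sync that matches files by name, here is a small Python model of the two rotation schemes. It does not call logrotate or rsync; the functions and the `{name: content_id}` representation are purely illustrative assumptions.

```python
def rotate_numeric(base, files, new_content):
    """Numeric scheme: every rotated file shifts up by one and the
    newest log becomes <base>.1. `files` maps name -> content id."""
    rotated = {}
    for name, content in files.items():
        number = int(name.rsplit(".", 1)[1])
        rotated[f"{base}.{number + 1}"] = content
    rotated[f"{base}.1"] = new_content
    return rotated


def rotate_dateext(base, files, new_content, stamp):
    """dateext scheme: existing files keep their names and the newest
    log gets a name derived from the rotation date."""
    rotated = dict(files)
    rotated[f"{base}-{stamp}"] = new_content
    return rotated


def files_to_resend(before, after):
    """Names whose content differs from what a receiver that matches
    files by name already holds under that name."""
    return sorted(name for name, content in after.items()
                  if before.get(name) != content)


numeric_before = {"httpd.1": "A", "httpd.2": "B"}
numeric_after = rotate_numeric("httpd", numeric_before, "C")

dated_before = {"httpd-2016-03-05": "A", "httpd-2016-03-06": "B"}
dated_after = rotate_dateext("httpd", dated_before, "C", "2016-03-07")

print(files_to_resend(numeric_before, numeric_after))
print(files_to_resend(dated_before, dated_after))
```

In this model the numeric scheme changes the content behind every name at each rotation, while the date-based scheme only introduces one new name, so only the newest file needs to be transferred.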
Comment 25 Andrey Gursky 2016-03-07 02:31:03 UTC
On Sun, 06 Mar 2016 22:20:16 +0000
samba-bugs@samba.org wrote:

> https://bugzilla.samba.org/show_bug.cgi?id=2294
> 
> --- Comment #23 from dajoker@gmail.com ---
> Looking for this capability prior to entering it as an enhancement request
> myself, I found everything here and basically have the same use case.  My
> version is that I am creating a regular backup of logs from many servers'
> services onto a single box, and doing so with rsync.  Some of those services
> still do the .1, .2, .3 file rotation, which makes for a lot of needless work,
> especially when these are 100+ MiB files.  It would be great if rsync could
> detect this to just transfer the new file and rename the old ninety-nine (or
> however-many).

Maybe unison could handle such renames better?

Regards,
Andrey