Bug 6746 - With -H --i-r, --link-dest should replace previously transferred hard links
Status: NEW
Product: rsync
Classification: Unclassified
Component: core
Version: 3.0.6
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: ---
Assigned To: Wayne Davison
QA Contact: Rsync QA Contact
Depends on:
Blocks:
Reported: 2009-09-20 04:30 UTC by Dieter Ferdinand
Modified: 2011-04-04 17:24 UTC
CC List: 2 users

See Also:


Attachments
Test script (385 bytes, application/x-shellscript)
2011-04-01 21:01 UTC, Matt McCutchen
adjusted test script for bug 6746 (401 bytes, text/plain)
2011-04-04 14:38 UTC, ED Fochler

Description Dieter Ferdinand 2009-09-20 04:30:15 UTC
Hello,
I have a problem with the --link-dest option.

I back up several systems with this option, keeping one backup for every day of the month.

Some of the files are copied instead of being hard-linked to the identical file in one of the given link-destination directories.

After the last rsync update it seems to have gotten better, but when I check for files that were copied instead of linked, I still find many of them, and I don't know why rsync does this.

The source systems are Windows with Cygwin, and Linux with different distributions (SUSE SLES 10, a self-compiled system).

I use a script for the backups, with these options for all of them:
rsync -l -o -g -H -r -t -D -p -v -P --force --timeout=$RSYNC_TIMEOUT --link-dest=dir source destination

For some backups, the --link-dest option is given more than once.

Here is a sample for one file:
----rwx---    1 root     root       377608 Thu Jun 25 10:25:51 2009 17/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll
----rwx---    1 root     root       377608 Thu Sep 17 04:44:10 2009 17/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll
----rwx---    1 root     root       377608 Thu Sep 17 04:44:10 2009 17/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll
----rwx---    1 root     root       377608 Thu Jun 25 10:25:51 2009 18/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll
----rwx---    1 root     root       377608 Fri Sep 18 04:27:05 2009 18/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll
----rwx---    1 root     root       377608 Fri Sep 18 04:27:05 2009 18/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll
----rwx---    1 root     root       377608 Thu Jun 25 10:25:51 2009 19/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll
----rwx---    1 root     root       377608 Sat Sep 19 04:11:58 2009 19/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll
----rwx---    1 root     root       377608 Sat Sep 19 04:11:58 2009 19/d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll

I used these ls options (modification, change and access time) to get the list:
    ls -dlt --full-time  "$F"
    ls -dltc --full-time  "$F"
    ls -dltu --full-time  "$F"

This is the same unchanged file in the backups from three days of this month.
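Because hard links share a single inode, they always report the same ctime, so the three different ctimes in the listing above (Sep 17, 18 and 19) show that 17/, 18/ and 19/ hold three separate copies. The bash sketch below makes that check explicit for two copies of one file; `classify_pair` is a hypothetical helper, not part of rsync.

```bash
#!/usr/bin/env bash
# classify_pair OLD NEW: report how two backup copies of a file relate.
# Hypothetical helper, not part of rsync.
classify_pair() {
    local old=$1 new=$2
    if [ "$old" -ef "$new" ]; then
        # Same device and inode: --link-dest produced a hard link.
        echo "hard link"
    elif cmp -s "$old" "$new"; then
        # Identical bytes but separate inodes: the data is stored twice.
        echo "copy"
    else
        echo "different"
    fi
}

# Usage, run from the backup root with the path from the listing above:
#   f="d/daten/Programme/StarMoney Business 4.0 S-Edition/app/sfktools.dll"
#   classify_pair "17/$f" "18/$f"
```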

I have tried several things to find out why this file is not linked, but I can't find the reason!

For some files I did find a reason: they have a creation or change time in the future, but this is not the case for all files. I correct the time, or delete the file if it is no longer needed, and then it works again.
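A minimal bash sketch for finding such files before a backup run; `find_future_files` is a hypothetical name. It stamps a temporary reference file with the current time and lists regular files whose modification time is newer. `find -newer` compares modification times only, so future change or creation times are not detected this way.

```bash
#!/usr/bin/env bash
# find_future_files DIR: list regular files under DIR whose mtime is later
# than "now". Hypothetical helper for spotting bad timestamps.
find_future_files() {
    local dir=$1 ref
    ref=$(mktemp) || return 1     # a freshly created file has mtime = now
    find "$dir" -type f -newer "$ref" -print
    rm -f "$ref"
}

# Usage:
#   find_future_files /some/source/dir
```

Files reported this way can then be corrected with `touch` before the next backup.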

I am looking for an option to hard-link every file from the link-dest dir that is identical, to save space on the hard disk, even if the file's times or permissions have changed. rsync always makes a new copy if the permissions, owner or time have changed. For backups I only need the latest settings, so the file could be linked and its attributes changed to the new settings.

Goodbye
Comment 1 Theo Band 2010-03-11 03:14:27 UTC
I have a similar problem. The content of some directories will not get hard linked but copied instead. The command I use is:

rsync -a \
    --link-dest first_dir \
    --link-dest second_dir \
    /export/home/ \
    --exclude '.mozilla/firefox/*/Cache' \
    --exclude '.mozilla/firefox/*/urlclassifier3.sqlite' \
    --exclude '.Trash/' \
    target_machine:/some/backup/dir/

The directory that I found to be copied instead of linked has this full path name:

/home/pave/rood/atp/Conversion Patterns - atp/

This directory also contains two spaces and a dash (-). Could this be a coincidence?

rsync --version
rsync  version 3.0.7  protocol version 30


Comment 2 Dieter Ferdinand 2010-03-11 09:55:54 UTC
Hello,
this is a different problem from mine.

On my system, rsync should delete an existing file and replace it with a link to an identical file.
But this doesn't work, so I delete the complete target directory and then rsync makes the links.

I think you may have a directory or files with dates in the future. In that case rsync always copies the files, because for rsync the files always appear to have the current date/time.

Goodbye
Comment 3 Theo Band 2010-03-11 13:18:19 UTC
(In reply to comment #2)
> hello,
> this is an other problem like my problem.
> 
> on my system, rsync should delete an existing file and link it to an identical
> file.
> but this don't work, so i delete the complete target-directory und rsync make
> the links.
That's exactly what I do. Three times a day, a new directory is created from the source and hard-linked against the previous backup directories (I use three backup directories).
> 
> i think, you can have a directory or files with dates in future. in this case
> rsync always copy the files because the files have always the actual date/time
> for rsync.

No, that is not the case.

I debugged a bit further. The same folder and files are linked correctly if I run a short rsync test on just this user's home directory (/export/home/pave), so it's not the file or directory name. I have found other directories with old files; some of them are hard-linked and others are not, including files without spaces or dashes in the path.
I noticed the problem because my free backup space decreased much faster than I expected.
Comment 4 Todd Lewis 2011-02-14 09:54:46 UTC
Perhaps related: I'm using rsync 3.0.7 between two Fedora 13 boxes. Every day I back up a tree of just under 10000 files and directories. I keep 30 days' worth of backups in directories with names like image.2011-02-11_0520, image.2011-02-12_0520, image.2011-02-13_0520... Some time ago I found by trial and error that --link-dest=<some_prior_backup> would only work if specified 20 or fewer times. Recently I noticed increased disk usage on the backup machine and discovered that my backups no longer had hard links; it was as if --link-dest had not been specified at all. Further trial and error showed that where 20 --link-dest options used to work, now only 12 would. I have cut that down to 5 and it seems to be working, but that it worked with 20 before and 12 now, and fails silently in either case, seems... strange.
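Names like image.2011-02-11_0520 sort chronologically as plain strings, which makes it easy to pass only the N most recent backups. The bash sketch below does that; `newest_link_dests` and its default of 5 (the number that works in this comment) are assumptions, not rsync features. The sketch also caps the list at 20, on the assumption that rsync 3.x accepts at most 20 --compare-dest/--copy-dest/--link-dest directories; that would fit the old limit of 20 reported here, though not the later drop to 12.

```bash
#!/usr/bin/env bash
# newest_link_dests ROOT [N]: print one --link-dest=DIR argument per line for
# the N newest image.* directories under ROOT, newest first.
# Hypothetical helper; assumes directory names that sort chronologically.
# Pass an absolute ROOT: rsync resolves relative --link-dest paths against
# the destination directory, not the current directory.
newest_link_dests() {
    local root=$1 n=${2:-5} d
    # Assumed rsync limit of 20 basis directories.
    if [ "$n" -gt 20 ]; then
        n=20
    fi
    for d in "$root"/image.*/; do
        [ -d "$d" ] && printf '%s\n' "${d%/}"
    done | LC_ALL=C sort -r | head -n "$n" | sed 's/^/--link-dest=/'
}

# Usage (bash 4+ for mapfile):
#   mapfile -t opts < <(newest_link_dests /backup 5)
#   rsync -a "${opts[@]}" /export/home/ "/backup/image.$(date +%F_%H%M)/"
```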
Comment 5 ED Fochler 2011-04-01 17:28:19 UTC
--protocol=29 avoids this behavior for me. It seems that the short-circuit behavior of rsync 3 doesn't check or remember the inodes of files that are in the transfer request but are not currently being transferred.

My setup:
cd /home ; rsync -Hax --link-dest=.. ed edcopy ;

My usage:
cd /backup ;
rsync -PHOhavix --protocol=29 /home wabhome.append/ 

With --protocol=29, rsync hard-links everything perfectly, after I watch the file list count up. Without it, it copies.
Comment 6 ED Fochler 2011-04-01 17:46:07 UTC
Curiously, --dry-run indicates correct hard-linking behavior, but the real run creates new files.

rsync -PHOhavixn /home wabhome.append/ 

Using rsync 3.0.7 on Mac OS X and OpenBSD.
Comment 7 ED Fochler 2011-04-01 18:54:05 UTC
(In reply to comment #6)
I apologize for talking to myself on-list, but I've become enlightened about my problem.

Alphabetical order matters.  If A has already been copied and B is a new hardlink, then all will work as expected.

If B has already been copied and A is the new hard link, rsync 3 doesn't know it's an inode match until later in the transfer.

In my case, a new file A was copied over and then B was hard-linked to it. Within the current transfer the linking worked, but copying A was unnecessary, because the same file already existed in the destination, just under a name that comes later in the transfer.

For now, I would say that using -P for snapshots implies also using --protocol=29.


I don't know if my problem is related to the link-dest bug that Dieter originally described, but it is the same as Theo's problem.
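The ordering described in this comment can be checked with a small bash sketch. It is a hypothetical reproduction, not the attached test script: the source holds one file under two hard-linked names, A and B, while the previous backup used as --link-dest contains only B. `repro_ab` prints `linked` if dest/A ends up sharing the inode of the previous backup's B, and `copied` if the data was stored again. The outcome may differ between rsync versions and options, so no result is claimed here.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction of the A/B ordering (needs rsync in PATH).
# Extra rsync options, e.g. --protocol=29, are passed through by the caller.
repro_ab() {
    local tmp result
    tmp=$(mktemp -d) || return 1
    mkdir "$tmp/src" "$tmp/prev"
    echo "same data" > "$tmp/src/B"
    ln "$tmp/src/B" "$tmp/src/A"        # A and B are one file in the source
    cp -p "$tmp/src/B" "$tmp/prev/B"    # previous backup only has B
    rsync -aH "$@" --link-dest="$tmp/prev" "$tmp/src/" "$tmp/dest/"
    if [ "$tmp/dest/A" -ef "$tmp/prev/B" ]; then
        result=linked
    else
        result=copied
    fi
    rm -rf "$tmp"
    echo "$result"
}

# Usage: compare the default run with the protocol-29 workaround.
#   repro_ab
#   repro_ab --protocol=29
```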
Comment 8 Matt McCutchen 2011-04-01 21:01:56 UTC
Created attachment 6364
Test script

(In reply to comment #7)
> Alphabetical order matters.  If A has already been copied and B is a new
> hardlink, then all will work as expected.
> 
> if B has already been copied, and A is the new hardlink, rsync v3 doesn't know
> it's an inode match until later in the transfer.

Ah... That behavior is by design (see the last paragraph of the description of --hard-links in the man page), but it's pretty annoying that it can waste space with --link-dest.  See the attached test script.

The fix would be to check every file entry against the --link-dest dir even if a previous hard link has already been transferred, and if there is a match, go back and replace all the previous hard links with links to the --link-dest file.  This would require rsync to store all the previous hard link paths instead of just the last one.
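The proposed fix belongs inside rsync, but the same idea can be approximated as a post-pass over a finished backup. In the bash sketch below, `relink_groups` is a hypothetical helper, not an rsync option: for every file DEST/p that has a byte-identical counterpart at LINKDEST/p which it is not yet linked to, it re-points every name in DEST sharing that file's inode at LINKDEST/p. Unlike rsync, it compares content only (not permissions, ownership or mtime), and it relies on `find -inum` and `-print0`, which GNU and BSD find provide but POSIX does not require.

```bash
#!/usr/bin/env bash
# relink_groups DEST LINKDEST  (give both paths without a trailing slash)
# Hypothetical post-pass: if DEST/p is byte-identical to LINKDEST/p but not
# linked to it, every hard link of DEST/p inside DEST is replaced by a link
# to LINKDEST/p. Compares content only; --link-dest also compares attributes.
relink_groups() {
    local dest=$1 ld=$2 f rel ino member
    local files=() group
    # Snapshot the file list first so relinking cannot disturb the walk.
    while IFS= read -r -d '' f; do
        files+=("$f")
    done < <(find "$dest" -type f -print0)

    for f in "${files[@]}"; do
        rel=${f#"$dest"/}
        [ -f "$ld/$rel" ] || continue          # no counterpart in LINKDEST
        [ "$f" -ef "$ld/$rel" ] && continue    # already hard-linked
        cmp -s "$f" "$ld/$rel" || continue     # contents differ
        ino=$(ls -di "$f" | awk '{print $1}')
        # Collect every name in DEST that shares this inode, then relink.
        group=()
        while IFS= read -r -d '' member; do
            group+=("$member")
        done < <(find "$dest" -type f -inum "$ino" -print0)
        for member in "${group[@]}"; do
            ln -f "$ld/$rel" "$member"
        done
    done
}

# Usage: relink_groups /backup/18 /backup/17
```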
Comment 9 Matt McCutchen 2011-04-01 21:11:39 UTC
I'm morphing this bug report to cover the issue described in comment #8.  Anyone who is seeing a different issue, please file a new bug report with steps to reproduce.

(In reply to comment #0)
> i search for an option, to link all files from link-dest-dir which are
> identical to save space on harddisk, even the times or rights of the file is
> changed. rsync always make new copies, if rights, owner or time is changed. for
> backups, i only need the last setting, so the file can be linked and the
> attributes can changed to the new settings.

See bug 4793.

(In reply to comment #2)
> on my system, rsync should delete an existing file and link it to an identical
> file.
> but this don't work, so i delete the complete target-directory und rsync make
> the links.

See bug 5644.
Comment 10 ED Fochler 2011-04-04 14:38:55 UTC
Created attachment 6372
adjusted test script for bug 6746

Matt, nice test script. It accurately recreates the problem I see. I adjusted the ls command at the end to show the inode numbers of the 3 relevant files, which should all be hard-linked; checking only 2 of them does not adequately test for the bug.
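The point about checking all three names can be captured in a small bash helper (hypothetical, not taken from either test script). dest/A and dest/B can be linked to each other while neither is linked to the --link-dest copy, so a two-name check can pass even when the bug occurs. `all_hard_linked` succeeds only if every name refers to the same inode as the first; checking that the link count is 3, as in the reply below, gets at the same thing from a different angle.

```bash
#!/usr/bin/env bash
# all_hard_linked FILE1 FILE2 [FILE...]: succeed only if every FILE refers to
# the same inode as FILE1. Hypothetical helper, not part of rsync.
all_hard_linked() {
    [ "$#" -ge 2 ] || return 2        # need at least two names to compare
    local first=$1 f
    shift
    for f in "$@"; do
        [ "$first" -ef "$f" ] || return 1
    done
    return 0
}

# Usage, with hypothetical names:
#   all_hard_linked prev/B dest/A dest/B && echo "all three linked"
```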
Comment 11 ED Fochler 2011-04-04 14:45:06 UTC
I actually meant "adjusted test script for bug 6746"

I hate Mondays.
Comment 12 Matt McCutchen 2011-04-04 17:24:55 UTC
Comment on attachment 6372
adjusted test script for bug 6746

Sure... I was looking at the link counts (2 or 3).

The wrong bug number was my fault from the first attachment.  Corrected.