Run the following in an empty directory:

    mkdir src dest linkdest
    touch src/f1
    rsync -a src/f1 src/f2
    touch linkdest/f1
    ln linkdest/f1 linkdest/f2
    rsync -Ha src/ dest/ --link-dest=../linkdest/

The source files src/f1 and src/f2 are both identical to the single link-dest file that has the names linkdest/f1 and linkdest/f2. Rsync links linkdest/f1 instead of copying src/f1 and links linkdest/f2 instead of copying src/f2. Now dest/f1 and dest/f2 refer to the same file while src/f1 and src/f2 refer to different files.

I believe that, when -H is specified, two destination dentries should refer to the same file if and only if the corresponding source dentries do, even though there may be hard links outside the transfer to both source files and (because of --link-dest) destination files. Thus, rsync should guard against linking to the same --link-dest file several times. When rsync links to a --link-dest file, it should check whether a file with the same device and inode numbers has already been used; if so, rsync should copy the file into the destination instead of linking it.

When -H is not specified, I reason that the user doesn't care about hard links and this check is unnecessary.

The remark at the end of comment 1 of bug 3692 led me to discover this bug. (Let's see if Bugzilla correctly hyperlinks that reference.)
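A rough sketch of the proposed check, in Python rather than rsync's C, just to make the idea concrete. The names `link_or_copy` and `used_inodes` are made up for illustration and are not rsync internals. It assumes the source files are not hard-linked to each other; a real implementation would still have to link twice when the source files themselves share an inode.

```python
import os
import shutil
import tempfile


def link_or_copy(ref_path, dest_path, used_inodes):
    """Link ref_path into the destination unless its inode was already used.

    used_inodes is a set of (st_dev, st_ino) pairs for link-dest files that
    have already been linked during this (hypothetical) transfer.
    Returns "link" or "copy" to report which action was taken.
    """
    st = os.stat(ref_path)
    key = (st.st_dev, st.st_ino)
    if key in used_inodes:
        # A second link would merge two distinct source files in dest,
        # so make an independent copy instead.
        shutil.copy2(ref_path, dest_path)
        return "copy"
    os.link(ref_path, dest_path)
    used_inodes.add(key)
    return "link"


def demo():
    """Recreate the linkdest/{f1,f2} layout and apply the guard to both names."""
    root = tempfile.mkdtemp()
    try:
        linkdest = os.path.join(root, "linkdest")
        dest = os.path.join(root, "dest")
        os.mkdir(linkdest)
        os.mkdir(dest)
        open(os.path.join(linkdest, "f1"), "w").close()
        os.link(os.path.join(linkdest, "f1"), os.path.join(linkdest, "f2"))
        used = set()
        actions = [
            link_or_copy(os.path.join(linkdest, n), os.path.join(dest, n), used)
            for n in ("f1", "f2")
        ]
        merged = os.path.samefile(
            os.path.join(dest, "f1"), os.path.join(dest, "f2")
        )
        return actions, merged
    finally:
        shutil.rmtree(root)
```

With the guard in place, `demo()` links the first name and copies the second, so dest/f1 and dest/f2 end up as separate files.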
> ...
> The source files src/f1 and src/f2 are both identical to the single
> link-dest file that has the names linkdest/f1 and linkdest/f2.

the contents are the same but some of the meta information is not.

> Rsync links linkdest/f1 instead of copying src/f1 and links linkdest/f2
> instead of copying src/f2. Now dest/f1 and dest/f2 refer to the same
> file while src/f1 and src/f2 refer to different files.

can you post the output inline with commands as i'm not seeing this here...

    [] ls -aliT .
    [] <nothing>
    [] sleep 2 ; mkdir src
    [] sleep 2 ; mkdir dest
    [] sleep 2 ; mkdir linkdest
    [] sleep 2 ; touch src/f1
    [] sleep 2 ; rsync -a src/f1 src/f2
    [] sleep 2 ; touch linkdest/f1
    [] sleep 2 ; ln linkdest/f1 linkdest/f2
    [] ls -aliT *
    dest:
    <nothing>

    linkdest:
    238252 -rw-r--r-- 2 moo moo 0 Apr 17 14:12:30 2006 f1
    238252 -rw-r--r-- 2 moo moo 0 Apr 17 14:12:30 2006 f2

    src:
    238250 -rw-r--r-- 1 moo moo 0 Apr 17 14:12:26 2006 f1
    238251 -rw-r--r-- 1 moo moo 0 Apr 17 14:12:26 2006 f2

    # i modified the erroneous[?] link-dest usage of both a relative and a
    # nonexistent path. in the past i had problems with relative. should
    # rsync complain about nonexistence with -v?
    [] sleep 2 ; rsync -Hav src/ dest/ --link-dest=`pwd`/linkdest/
    ./
    f1
    f2
    [] ls -aliT *
    dest:
    238287 -rw-r--r-- 1 moo moo 0 Apr 17 14:12:26 2006 f1
    238288 -rw-r--r-- 1 moo moo 0 Apr 17 14:12:26 2006 f2

    linkdest:
    238252 -rw-r--r-- 2 moo moo 0 Apr 17 14:12:30 2006 f1
    238252 -rw-r--r-- 2 moo moo 0 Apr 17 14:12:30 2006 f2

    src:
    238250 -rw-r--r-- 1 moo moo 0 Apr 17 14:12:26 2006 f1
    238251 -rw-r--r-- 1 moo moo 0 Apr 17 14:12:26 2006 f2

> I believe that, when -H is specified, two destination dentries should refer
> to the same file if and only if the corresponding source dentries do

true, but not the same inode number across src and dest. only within src. and only within dest. 'same file' means 'same inode number'. there are cross host and cross filesystem aspects as well.
besides, src and dest must be different inode sets, otherwise munging a source file would also munge your backup ;-]

in your example, at least as reproduced above, src/{f1,f2} are not hardlinks. so rsync has no obligation to, and indeed should not, make dest/{f1,f2} hardlinks. with or without -H. if src/{f1,f2} were hardlinks, with -H would preserve that relationship in dest. without -H would just make copies.

remember, a --link-dest directory is only used as a reference to save space/time in the destination. the dest must always mirror the src regardless of what's in any chosen --link-dest dir. in your example, linkdest/{f1,f2} could be thought of as a previous mirror of the src that _did_ have src/{f1,f2} hardlinked, before something came along and broke them up in the src. thus now the new dest has them correctly separated.

> <remainder of report>

not sure how to read that. though i think the current behaviour is correct.

as an aside, any change to: name, perm, uid, gid, size, contents, mtime, existence, symlink '-> source' or hardlink relationship will/should cause the -Ha --link-dest --delete version to be ignored and a new copy to be made in dest. there is both mtime and hardlink change in the example.
(In reply to comment #1)
> the contents are the same but some of the meta information is not.

Oops: the mtimes were different. Now I'm not sure how I tickled the bug the first time. Corrected script:

    mkdir src dest linkdest
    touch src/f1
    rsync -a src/f1 src/f2
    rsync -a src/f1 linkdest/f1
    ln linkdest/f1 linkdest/f2
    rsync -Ha src/ dest/ --link-dest=../linkdest/

> > Rsync links linkdest/f1 instead of copying src/f1 and links linkdest/f2
> > instead of copying src/f2. Now dest/f1 and dest/f2 refer to the same
> > file while src/f1 and src/f2 refer to different files.
>
> can you post the output inline with commands as i'm not seeing this here...

Please try again with the corrected script. On my computer, "find . -ls" after the corrected script has finished produces the following output (some spaces removed to make it narrower):

    1908 0 drwx------ 5 matt matt 120 Apr 17 16:11 .
    482092 0 drwx------ 2 matt matt 96 Apr 17 16:11 ./src
    482105 0 -rw------- 1 matt matt 0 Apr 17 16:11 ./src/f1
    482106 0 -rw------- 1 matt matt 0 Apr 17 16:11 ./src/f2
    482098 0 drwx------ 2 matt matt 96 Apr 17 16:11 ./dest
    482107 0 -rw------- 4 matt matt 0 Apr 17 16:11 ./dest/f1
    482107 0 -rw------- 4 matt matt 0 Apr 17 16:11 ./dest/f2
    482101 0 drwx------ 2 matt matt 96 Apr 17 16:11 ./linkdest
    482107 0 -rw------- 4 matt matt 0 Apr 17 16:11 ./linkdest/f1
    482107 0 -rw------- 4 matt matt 0 Apr 17 16:11 ./linkdest/f2

> # i modified the erroneous[?] link-dest usage of both a relative and a
> # nonexistent path. in the past i had problems with relative. should
> # rsync complain about nonexistence with -v?

No, my --link-dest usage is correct. The description of --link-dest=DIR says, "If DIR is a relative path, it is relative to the destination directory." Please try my corrected script and see if the rest of your remarks still apply.
> "If DIR is a relative path, it is relative to the destination directory."

apologies indeed, missed that in the man page.

> Please try my corrected script and see if the rest of your remarks still
> apply.

yep, that looks like a bug now. seems that dest/{f1,f2} should each get their own unique inums as they are unique in the src and linkdest is just a stale image of src. i used the cvs HEAD to test.

maybe rsync is seeing that src/{f1,f2} still have everything _but_ the inode relationship in linkdest the same, assumes that's enough and links the dest versions back to linkdest. -H would imply to check that too.

> ...

still not sure of the description of the proposed solution. seems that some rather crazy hardlink counts would be out there in the wild but that as long as the structures that hold the pictures for src, linkdest and dest are the same, except for the inums between them and old non-matching gunk in linkdest, it'd be cool. but hey, you're probably right, some of us people just have brain drain from filing taxes ;-]

part 2... and running a plain one does not fix them once broken either, yikes. figured until fixed i could just run this over top of it and be done.

    # /tmp/rsync -Haxv --delete ./src/ ./dest/
    # find src linkdest dest -ls
    550518 4 drwxr-xr-x 2 root wheel 512 Apr 17 21:02 src
    550488 0 -rw-r--r-- 1 root wheel 0 Apr 17 21:02 src/f1
    550512 0 -rw-r--r-- 1 root wheel 0 Apr 17 21:02 src/f2
    550482 4 drwxr-xr-x 2 root wheel 512 Apr 17 21:02 linkdest
    550264 0 -rw-r--r-- 4 root wheel 0 Apr 17 21:02 linkdest/f1
    550264 0 -rw-r--r-- 4 root wheel 0 Apr 17 21:02 linkdest/f2
    569089 4 drwxr-xr-x 2 root wheel 512 Apr 17 21:02 dest
    550264 0 -rw-r--r-- 4 root wheel 0 Apr 17 21:02 dest/f1
    550264 0 -rw-r--r-- 4 root wheel 0 Apr 17 21:02 dest/f2

if you blow away dest and do a plain -Ha copy it works as expected. i'd rather have an accurate copy over free cpu/ram if that matters.
doesn't look to be new as it's present in:

    rsync267
    rsync20050802
    rsync264pre2
    rsync263

cheers all.
Rsync has other problems with outdated hard-links not being broken. For instance:

    echo data >foo
    ln foo bar
    rsync -aH foo bar dest/
    rm bar
    cp -p foo bar
    rsync -aH foo bar dest/

That sequence will not break the hard-link that exists in the destination files. However, if either of the files had been touched, the second rsync would have broken the link when updating the file (assuming that --inplace wasn't used).

The bug you cited with --link-dest springs from the same roots as this. It would require the in-memory hashing of the inode of every hard-linked file on the receiving side for rsync to be able to break links that were no longer present, and that would be quite a lot of extra memory when using --link-dest to a large hierarchy of mostly unchanged files. I don't see this being fixed soon, but I should take a look at it after I work on reducing rsync's memory requirements.
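To make the receiving-side bookkeeping a bit more concrete, here is a small Python sketch (not rsync code) of hashing every multiply-linked file in the destination by device and inode, then flagging groups whose members are distinct files in the source. The function names `hardlink_groups` and `stale_links` are invented for this illustration, and the demo builds the end state of the foo/bar sequence by hand instead of running rsync.

```python
import os
import shutil
import tempfile
from collections import defaultdict


def hardlink_groups(root):
    """Map (st_dev, st_ino) -> sorted relative paths of files with nlink > 1."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(os.path.relpath(path, root))
    return {key: sorted(paths) for key, paths in groups.items()}


def stale_links(src_root, dest_root):
    """Return dest hard-link groups whose members differ as files in src."""
    stale = []
    for paths in hardlink_groups(dest_root).values():
        src_ids = set()
        for rel in paths:
            src_path = os.path.join(src_root, rel)
            if os.path.lexists(src_path):
                st = os.lstat(src_path)
                src_ids.add((st.st_dev, st.st_ino))
        if len(src_ids) > 1:
            # The destination names share one inode, the source names do not.
            stale.append(paths)
    return stale


def demo():
    """src has separate foo and bar; dest still has them hard-linked."""
    root = tempfile.mkdtemp()
    try:
        src = os.path.join(root, "src")
        dest = os.path.join(root, "dest")
        os.mkdir(src)
        os.mkdir(dest)
        for name in ("foo", "bar"):
            with open(os.path.join(src, name), "w") as f:
                f.write("data\n")
        with open(os.path.join(dest, "foo"), "w") as f:
            f.write("data\n")
        os.link(os.path.join(dest, "foo"), os.path.join(dest, "bar"))
        return stale_links(src, dest)
    finally:
        shutil.rmtree(root)
```

The dictionary built by `hardlink_groups` has one entry per multiply-linked inode, which is the extra memory being described: with --link-dest against a large, mostly unchanged hierarchy, nearly every file would have an entry.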
Wayne, if you consider the breaking of outdated hard links not to be part of the expected behavior of -H, please add a clarification to this effect to the man page.
Created attachment 3129 [details]
Suggested clarification
There's another aspect to this problem: a file's attributes can be tweaked unexpectedly through an outdated hard link. Example:

    $ echo data >foo
    $ chmod 600 foo
    $ ln foo bar
    $ rsync -aHi foo bar dest/
    >f+++++++++ foo
    hf+++++++++ bar => foo
    $ rm bar
    $ cp -a foo bar
    $ chmod 644 bar
    $ rsync -aHi foo bar dest/
    .f...p..... bar
    .f...p..... foo
    $ rsync -aHi foo bar dest/
    .f...p..... bar
    .f...p..... foo
    $ rsync -aHi foo bar dest/
    .f...p..... bar
    .f...p..... foo

--no-tweak-hlinked would fix this.
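The repeated permission tweaks in the output above are what one would expect if dest/foo and dest/bar are still one inode: a mode set through either name is visible through the other. A minimal Python sketch of just that filesystem behaviour (no rsync involved; `demo_shared_mode` is a name made up for the illustration):

```python
import os
import shutil
import stat
import tempfile


def demo_shared_mode():
    """chmod through one hard link and read the mode back through the other."""
    root = tempfile.mkdtemp()
    try:
        foo = os.path.join(root, "foo")
        bar = os.path.join(root, "bar")
        with open(foo, "w") as f:
            f.write("data\n")
        os.chmod(foo, 0o600)
        os.link(foo, bar)      # bar and foo now name the same inode
        os.chmod(bar, 0o644)   # "fix" bar's permissions
        # foo's mode changed too, because the mode lives in the shared inode
        return stat.S_IMODE(os.stat(foo).st_mode)
    finally:
        shutil.rmtree(root)
```

Here `demo_shared_mode()` returns 0o644 even though foo was explicitly set to 0o600, so two names with different source permissions can never both be right while they stay linked.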
Wayne, I'd like to see the wording in the man page amplified to what I originally proposed in comment #6 to ensure that there is no confusion about the role of -H in a backup process. This just came up on the rsnapshot list: http://sourceforge.net/mailarchive/forum.php?thread_name=1258603850.25245.6.camel%40mattlaptop2.local&forum_name=rsnapshot-discuss
Created attachment 4964 [details]
Updated man page patch

This patch revises the --hard-links description to describe both cases (nonempty destination and --link-dest), intentionally leaving open the possibility of more (I had --detect-renamed-lax without --delete in mind but didn't want to mention it on the trunk). I also took the opportunity to revise the --inplace description.

For posterity, here is the tug-of-war case I mentioned:

    $ mkdir src dest
    $ touch dest/1
    $ ln dest/1 dest/2
    $ echo foo >src/1
    $ echo blort >src/2
    $ rsync -rt --inplace -i src/ dest/
    .d..t...... ./
    >f.st...... 1
    >f.st...... 2
    $ rsync -rt --inplace -i src/ dest/
    >f.st...... 1
    $ rsync -rt --inplace -i src/ dest/
    >f.st...... 2
    $ rsync -rt --inplace -i src/ dest/
    >f.st...... 1
    $ rsync -rt --inplace -i src/ dest/
    >f.st...... 2
This bug is still living in 3.0.9. Check https://lists.samba.org/archive/rsync/2012-August/027799.html
Note: this shouldn't be classified as an enhancement. If the -H option is used, it is supposed to duplicate the hard link structure of the source. Not doing so is a bug.

cp in coreutils had (may still have) a bug related to this exact same thing: copying from a case-preserving but case-ignoring OS (Windows) to an OS where case handling is different, when the files were hard linked to each other. cp wanted to copy Afile => a dir that had 'afile' & 'Afile'. It thought it needed to remove [aA]file and copy over only Afile, as it was an updated version (while 'afile' still existed on the source as a separate older file).... hmmm.... this sounds like a similar case.

Unfortunately, they passed it off as a cygwin-only bug, but they couldn't tell the difference between lower and upper case versions of the same file when they were linked, even though they can show up in a dir listing as separate. Don't know if it got fixed or not. I stopped using cp -u where there was danger of hard links... AFAIK, they never fixed it because it was tossed off to the cygwin group, who promptly forgot about it.
hi folks, I've run into this problem in a couple of cases that I think haven't been mentioned so far:

- Every month I rsync my boot disk to an external disk and then take a ZFS copy-on-write snapshot of the external disk. This means I want to use --inplace --no-whole-file to avoid writing (therefore copying) more than necessary, but in combination with --hard-links this can result in incorrect file content (as others have described) if I initially hard link some files but then later unlink and then modify them.

- Credit where it's due, the man page does warn me of the above problem, so currently I just don't use --inplace --no-whole-file and take the disk usage hit. However, although this eliminates problems with file content being incorrect, AIUI it still only breaks hard links when one of the formerly-hardlinked files changes, which means I can still end up with destination hard links that aren't on the source if the source files weren't changed when the link was broken. In practice I can't think why I would break a hard link but keep the contents the same, but it's frustrating to have this barrier in the way of me being able to say without reservation that my backup replicates my filesystem exactly, and that I will be able to restore from my backup without causing any problems.

Overall, then, I'm a big fan of the idea that --hard-links (or if not, then some additional flag) should make the target directory look exactly like the source directory in link structure.
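Until something like that exists, a backup can at least be checked after the fact. Below is a small Python sketch of such a check, under the assumption that "same link structure" means the two trees group their relative paths into identical sets of names per inode. `link_partition` and `same_link_structure` are illustrative names, not part of rsync or any other tool.

```python
import os
import shutil
import tempfile


def link_partition(root):
    """Group the regular-file paths under root by inode.

    Returns a set of frozensets; each frozenset holds the relative paths
    that name one and the same file.
    """
    by_inode = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            rel = os.path.relpath(path, root)
            by_inode.setdefault((st.st_dev, st.st_ino), set()).add(rel)
    return {frozenset(paths) for paths in by_inode.values()}


def same_link_structure(src_root, dest_root):
    """True if both trees have the same names grouped into the same links."""
    return link_partition(src_root) == link_partition(dest_root)


def demo():
    """Compare a source with two separate files against two candidate backups."""
    root = tempfile.mkdtemp()
    try:
        src, linked, copied = (os.path.join(root, d) for d in ("src", "linked", "copied"))
        for d in (src, linked, copied):
            os.mkdir(d)
        for d in (src, copied):
            for name in ("f1", "f2"):
                open(os.path.join(d, name), "w").close()
        # "linked" mimics the buggy result: f1 and f2 share one inode
        open(os.path.join(linked, "f1"), "w").close()
        os.link(os.path.join(linked, "f1"), os.path.join(linked, "f2"))
        return same_link_structure(src, linked), same_link_structure(src, copied)
    finally:
        shutil.rmtree(root)
```

`demo()` reports a mismatch for the backup with the stale link and a match for the one with independent copies. The check only looks at names inside each tree, so links that reach outside the tree (for example into a --link-dest directory) do not affect the result.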