Bug 5403 - -H (--hard-links) is broken when sending to remote
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.0.4
Hardware: x86 NetBSD
Importance: P3 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-18 01:28 UTC by Geoff Wing
Modified: 2008-07-29 20:21 UTC

Attachments
output from 2.6.9 to 3.0.3 showing hardlink success (36.48 KB, text/plain)
2008-07-28 20:22 UTC, Geoff Wing
output from 3.0.3 to 3.0.3 showing hardlink failure (e.g. no => lines) (93.56 KB, text/plain)
2008-07-28 20:23 UTC, Geoff Wing
Adding 1 to the dev number to avoid a 0 (337 bytes, patch)
2008-07-29 09:33 UTC, Wayne Davison

Description Geoff Wing 2008-04-18 01:28:12 UTC
Running local to local, hard-link support has exhibited no obvious problems.

Running local to remote, a file with, say, 153 hard links is sent 153 times and stored on the remote as 153 separate copies.  Both ends are running 3.0.2.
This seems to be independent of recursion options: it occurs without -r, with -r, and with -r --no-i-r.

Not tested:
- interaction between 3.* and 2.*
- running remote to remote, or remote to local
Comment 1 Geoff Wing 2008-04-21 23:17:16 UTC
Running local 2.6.9 to remote 3.0.2 works.

Running local 3.0.2 to remote 3.0.2 works for _some_ hardlinks.  I haven't isolated which conditions cause failure.
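One way to narrow down which names actually ended up linked on the destination is to compare device and inode numbers directly. The following is only a minimal sketch, not part of rsync: the helper name same_inode is hypothetical, and it uses stat(), so it follows symlinks.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not from rsync): reports whether two paths
 * refer to the same file, i.e. whether they are hard links to each other.
 * Returns 1 if st_dev and st_ino both match, 0 if they differ,
 * and -1 if either stat() call fails. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;

    /* Two names are hard links to one file only when both the
     * device number and the inode number are identical. */
    return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino) ? 1 : 0;
}
```

Running such a check on pairs of names that are linked on the source would show which of them were stored as separate copies on the remote.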
Comment 2 Wayne Davison 2008-07-28 20:11:50 UTC
Any more info on this?
Comment 3 Geoff Wing 2008-07-28 20:22:55 UTC
Created attachment 3431
output from 2.6.9 to 3.0.3 showing hardlink success
Comment 4 Geoff Wing 2008-07-28 20:23:44 UTC
Created attachment 3432
output from 3.0.3 to 3.0.3 showing hardlink failure (e.g. no => lines)
Comment 5 Wayne Davison 2008-07-28 20:57:59 UTC
Things to try:

Use --protocol=29 on the 3.0.3 -> 3.0.3 transfer and see if that makes the hard links work.

Use 3.1.0dev on both systems (either from the git repository or the latest nightly tar file) with the --debug=hlink4 option (there is no need for so much general verbosity, nor for the --protocol=29 option).  That might help to illuminate what is happening.
Comment 6 Geoff Wing 2008-07-28 21:20:52 UTC
--protocol=29 with 3.0.3 -> 3.0.3 didn't work

Using --debug=hlink4 with rsync-HEAD-20080727-2332GMT on both ends didn't work either.
# rsync -e rsh -avvRHW --debug=hlink4 --delete-after /rescue remotemachine:/
opening connection using: ...
...
/rescue/zgrep
total: matches=0  hash_hits=0  false_alarms=0 data=652232880
deleting in rescue
sent 652320746 bytes  received 2922 bytes  17871881.32 bytes/sec
total size is 652232880  speedup is 1.00
#
Comment 7 Wayne Davison 2008-07-28 21:53:47 UTC
Were there any changes for rsync to transfer?

I'd like to see the debug output that was generated by the --debug=hlink4 run, and to know which files should have been hard-linked but weren't.
Comment 8 Geoff Wing 2008-07-28 22:26:39 UTC
I'm emptying out the destination directory before each run (though there are the same results leaving one or two files in there).

There's no difference in the output shown with --debug=hlink4 (except the first line redisplaying the running options and the stats line at the end)

Am I missing some compile time thing?

Running with --debug=all4 gives no '=>' outputs.  Is that still supposed to be occurring near the end of phase 1?
Comment 9 Wayne Davison 2008-07-29 09:33:30 UTC
Created attachment 3438
Adding 1 to the dev number to avoid a 0

Interesting.  I think that can mean only one thing:  you have a device that has a number of 0.  Try running this perl command after you cd into the source dir:

perl -e 'print "dev: ", (stat("."))[0], "\n"'

If it prints "dev: 0", that is the problem and this patch should solve it.
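For anyone without perl at hand, the same check can be sketched in C. This is only an illustrative equivalent of the one-liner above; the function name dev_is_zero is hypothetical and does not come from rsync.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not from rsync): prints the device number of
 * `path` in the same "dev: N" form as the perl one-liner.
 * Returns 1 if the device number is 0, 0 if it is non-zero,
 * and -1 if stat() fails. */
int dev_is_zero(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    /* dev_t has no portable printf format, so widen it explicitly. */
    printf("dev: %llu\n", (unsigned long long)st.st_dev);
    return st.st_dev == 0 ? 1 : 0;
}
```

Calling dev_is_zero(".") from the source directory corresponds to the perl command; a return value of 1 would indicate the situation described here.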
Comment 10 Geoff Wing 2008-07-29 17:20:48 UTC
Fantastic.  That is indeed the situation.  "/" mount on NetBSD seems to always have device as 0 (at least on all my machines).  The patch did work for me.

I do have a device 0x80000002.  I guess it's conceivable that someone could have a device 0xffffffff with 32-bit device numbers, though that seems pretty unlikely.  If it's only used internally, I'm wondering why bother overloading it.

Still, I'm quite happy with it :-)

Thank you,
Geoff
Comment 11 Wayne Davison 2008-07-29 20:21:23 UTC
I have also added a cast to int64 (which is the internal type used for all device and inode numbers) so that a 32-bit device number with all bits on will still be non-zero in the hard-link processing after the 1 is added (as long as a dev_t is unsigned, which it should be).  Of course, a 64-bit device number with all bits on will still wrap around to zero, but the current hash code must reserve one number (out of the 18,446,744,073,709,551,616 values available) to indicate that a hash position is empty, and so choosing an st_dev of 0xffff_ffff_ffff_ffff as the odd-man-out seems like the best choice.
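The arithmetic behind the cast can be illustrated with a small sketch. This is not the actual rsync patch: the three function names are hypothetical, and fixed-width types stand in for dev_t and rsync's internal int64.

```c
#include <stdint.h>

/* Hypothetical illustration, not rsync code.
 * 32-bit device number, widened to 64 bits BEFORE adding 1:
 * the result can never be 0, even for 0xffffffff. */
int64_t dev_key_with_cast(uint32_t dev)
{
    return (int64_t)dev + 1;
}

/* Same addition done in 32-bit arithmetic (no cast):
 * 0xffffffff + 1 wraps around to 0, the reserved "empty" value. */
uint32_t dev_key_without_cast(uint32_t dev)
{
    return dev + 1u;
}

/* 64-bit device number: only the all-bits-on value wraps to 0,
 * which is the single value given up as the odd-man-out. */
uint64_t dev_key_64(uint64_t dev)
{
    return dev + 1u;
}
```

Under these assumptions a device number of 0 (the NetBSD "/" case) maps to 1, the 0x80000002 device mentioned in comment 10 maps to 0x80000003, and 0xffffffff maps to 0x100000000 with the cast but to 0 without it.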

This fix will go out in the next 3.0.4 pre-release (which will probably be the final pre-release for 3.0.4).