Bug 10352 - link-by-hash hardlink-collection maintenance mode
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.1
Hardware: All
OS: All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-30 17:15 UTC by Jim Klimov
Modified: 2014-01-19 22:40 UTC

Description Jim Klimov 2013-12-30 17:15:25 UTC
Files for which link-by-hash has created entries in the structured hardlink directories may change over time. When that happens, their content hashes change as well, so the existing hash-filenames become stale.

I propose adding a maintenance mode to the link-by-hash patch that discovers hash-filenames which no longer match the hash of their contents. One approach would be to incorporate some metadata into the name of the hash-file (the size is already there today; the last-modification timestamp could be added as well) and then find the hash-filenames whose actual filesystem metadata does not match the metadata stored in the name. Such files would be candidates for recalculating the content hash and then renaming, so that the name matches both the current hash value and the current filesystem metadata.
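
To illustrate the kind of maintenance pass proposed above, here is a rough Python sketch. It is not based on the actual link-by-hash patch code: the `<hexdigest>-<size>-<mtime>` name layout, the use of MD5, and the helper names `parse_name`, `find_stale` and `refresh` are all assumptions made for the example. The idea is a cheap metadata scan first, with the expensive rehash done only for the candidates it turns up.

```python
import hashlib
import os


def parse_name(name):
    """Split a hypothetical '<hexdigest>-<size>-<mtime>' name (assumed layout)."""
    digest, size, mtime = name.rsplit("-", 2)
    return digest, int(size), int(mtime)


def content_digest(path, algo="md5", chunk=1 << 20):
    """Hash the file contents in chunks; the algorithm choice is an assumption."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def find_stale(root):
    """Yield hash-files whose on-disk size or mtime differs from their name."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                _digest, size, mtime = parse_name(name)
            except ValueError:
                continue  # name is not in the assumed layout; leave it alone
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_size != size or int(st.st_mtime) != mtime:
                yield path


def refresh(path, algo="md5"):
    """Rehash one candidate and rename it to match its current state."""
    st = os.stat(path)
    new_name = f"{content_digest(path, algo)}-{st.st_size}-{int(st.st_mtime)}"
    new_path = os.path.join(os.path.dirname(path), new_name)
    if new_path != path:
        # Note: on POSIX, os.rename silently replaces an existing target.
        os.rename(path, new_path)
    return new_path
```

A maintenance run would then be something like `for p in list(find_stale(root)): refresh(p)`. The sketch renames within the same directory only; if the directory path itself encodes part of the hash, the rename target would need to be computed accordingly, and a collision with an existing entry would need handling rather than overwriting.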
Comment 1 Wayne Davison 2014-01-19 22:40:48 UTC
The link-by-hash patch was intended for use on backup servers, where files are never modified in place. However, perhaps having a way to specify what info goes into the hash file names would allow someone to do this on their own. Ultimately, though, switching over to a filesystem that supports de-duplicating writes and copy-on-write would be the best bet in the long run.