The Samba-Bugzilla – Bug 10352
link-by-hash hardlink-collection maintenance mode
Last modified: 2014-01-19 22:40:48 UTC
Files for which link-by-hash has created the structured hardlink directories may change over time. When a file's contents change, so does its hash, and the existing hash-filename no longer matches.
I propose adding a maintenance mode to the link-by-hash patch that discovers hash-filenames which no longer match the hash of their contents. One way to do this would be to incorporate some metadata into the filename of the hash-file (the size is already there today; the last-modification timestamp could be added as well) and then find the hash-files whose actual filesystem metadata does not match the metadata stored in the name. Such files would be candidates for recalculating the content hash and then renaming, so that the name matches both the current hash value and the current FS metadata.
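To make the proposal concrete, here is a rough sketch of such a maintenance pass as a standalone Python script. It is not part of the link-by-hash patch and does not reflect its real on-disk layout: the `<hexdigest>.<size>.<mtime>` naming scheme, the SHA-256 default, and the function names `find_stale` and `refresh` are all assumptions made for illustration.

```python
import hashlib
import os
from pathlib import Path


def parse_name(name):
    """Split an assumed '<hexdigest>.<size>.<mtime>' name into its parts."""
    digest, size, mtime = name.rsplit(".", 2)
    return digest, int(size), int(mtime)


def content_digest(path, algo="sha256"):
    """Hash the file contents in 1 MiB chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_stale(hash_dir):
    """Yield hash-files whose real size or mtime differs from their name."""
    for dirpath, _dirs, files in os.walk(hash_dir):
        for name in files:
            path = Path(dirpath) / name
            try:
                _digest, size, mtime = parse_name(name)
            except ValueError:
                continue  # name does not follow the assumed scheme
            st = path.stat()
            if st.st_size != size or int(st.st_mtime) != mtime:
                yield path


def refresh(path, algo="sha256"):
    """Rehash a candidate and rename it to match its contents and metadata."""
    st = path.stat()
    new_name = f"{content_digest(path, algo)}.{st.st_size}.{int(st.st_mtime)}"
    new_path = path.with_name(new_name)
    if new_path == path or new_path.exists():
        return path  # already correct, or a collision left for the caller
    os.rename(path, new_path)
    return new_path
```

A maintenance run would then be something like `for p in list(find_stale(hash_root)): refresh(p)`, where `hash_root` is wherever the hash directories live. Only the metadata comparison touches every file; the expensive rehash is limited to the candidates. A change that preserves both size and mtime would go unnoticed by this check.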
The link-by-hash patch was intended for backup servers, where files are never modified in place. However, a way to specify what information goes into the hash-file names might allow someone to do this on their own. In the long run, though, switching to a filesystem that supports de-duplicating writes and copy-on-write would be the best bet.