Bug 10051 - Improved long file-name handling
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: All All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-27 18:14 UTC by Haravikk
Modified: 2019-08-03 12:00 UTC

Description Haravikk 2013-07-27 18:14:31 UTC
One issue when running rsync between two different systems is that the target file-system may have stricter limits on the length of a file name, or even of a whole file path. I'm not sure the latter can be resolved easily, but long names cause two main errors:

rsync: recv_generator: failed to stat "/foo/really_long_name": File name too long (36)
rsync: mkstemp "/foo/" failed: No such file or directory (2)

Basically, any attempt to stat an existing file on the receiving end will fail (it probably isn't there anyway). mkstemp then fails later, presumably because the temporary name is also too long, so no file is actually created. That produces the strange second error, which reports the target root as not existing even though it does.


What I would like to propose is a new feature for handling long file-names, by adding something like the following:

--long-hash (md5|sha1|sha2|none)
--long-hash-ext .rsync.hashed

Quite simply, if a file-name is encountered that is too long for the target file-system, then it is run through the specified hashing algorithm, with the resulting hash being used as the name instead when transferring the file (or looking for an existing file).

The default setting of none would throw an rsync error instead, with the assumption being that renaming the file could introduce errors. For example if you were rsyncing an application bundle but something was renamed then the cloned application may not be functional, so an error would be preferable. However, if you're using rsync for a backup then you may be okay with renaming the file to ensure that it is at least copied.
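To make the proposed behaviour concrete, here is a minimal Python sketch of the renaming rule, not rsync code. The function name `long_hash_name`, the 255-byte limit and the mapping of "sha2" to SHA-256 are all assumptions made for illustration; a real implementation would have to ask the target file-system for its actual limit.

```python
import errno
import hashlib

# Assumed per-name byte limit; real limits depend on the target file-system.
NAME_MAX = 255

# Assumed mapping of the proposed option values to hashlib constructors
# ("sha2" is taken to mean SHA-256 here).
ALGORITHMS = {"md5": hashlib.md5, "sha1": hashlib.sha1, "sha2": hashlib.sha256}


def long_hash_name(name, algorithm="none", ext=".rsync.hashed", limit=NAME_MAX):
    """Return the name to use on the target for a single file name."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name  # short enough, keep the original name
    if algorithm == "none":
        # Default behaviour in the proposal: fail rather than rename.
        raise OSError(errno.ENAMETOOLONG, "File name too long", name)
    # Replace the whole name with the hex digest plus the marker extension.
    return ALGORITHMS[algorithm](raw).hexdigest() + ext
```

With `algorithm="md5"` an over-long name comes back as 32 hex characters plus the extension, while a name within the limit is returned unchanged.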

A possible addition to this feature would be:

--long-hash-namefile *.rsync.name

Basically this lets you choose a format for a name-file; any file that has to have its name hashed would have a name-file created alongside it using the specified format. If rsync is sending a hashed file with a matching name-file then it can open this in order to restore the original file-name.

For example:

I want to rsync the file "hugefilename.txt". With the md5 algorithm set, rsync will send this as "520b0999cd97ae3af36744e0f9cb1839.rsync.hashed" and create alongside it a file named "520b0999cd97ae3af36744e0f9cb1839.rsync.name" containing the original filename, "hugefilename.txt".
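As a rough sketch of the name-file idea (again Python for illustration only, with hypothetical helper names `write_hashed` and `original_name_for`), the receiving side could store the data under the hashed name and the original name in a sibling file, and a later sender could read that sibling back. MD5 and the two extensions are simply the example values from this proposal.

```python
import hashlib
import os


def write_hashed(dest_dir, original_name, data,
                 hashed_ext=".rsync.hashed", name_ext=".rsync.name"):
    """Store data under a hashed name plus a name-file holding the original name."""
    stem = hashlib.md5(original_name.encode("utf-8")).hexdigest()
    with open(os.path.join(dest_dir, stem + hashed_ext), "wb") as f:
        f.write(data)
    # newline="" keeps the stored name byte-for-byte as given.
    with open(os.path.join(dest_dir, stem + name_ext), "w",
              encoding="utf-8", newline="") as f:
        f.write(original_name)
    return stem + hashed_ext


def original_name_for(dest_dir, file_name,
                      hashed_ext=".rsync.hashed", name_ext=".rsync.name"):
    """Return the original name for a hashed file, or file_name if none is recorded."""
    if not file_name.endswith(hashed_ext):
        return file_name
    stem = file_name[:-len(hashed_ext)]
    try:
        with open(os.path.join(dest_dir, stem + name_ext),
                  encoding="utf-8", newline="") as f:
            return f.read()
    except FileNotFoundError:
        return file_name  # no name-file alongside, nothing to restore
```

Falling back to the hashed name when the name-file is missing is one possible choice; reporting an error instead would be just as defensible.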

Of course, the naming of the parameters is entirely for example purposes, but hopefully you get the idea. Basically, a file with a long filename has its name hashed and a suitable extension added; if rsync encounters a file with that extension, it can look for a name-file to expand. When syncing to a folder, if a file has a long file-name then rsync can hash that name and look for the matching .rsync.hashed file to run its usual checks against.
Comment 1 Haravikk 2019-08-03 12:00:05 UTC
Wow, was about to post basically this same feature, forgetting I'd already requested it six years ago!

There's definitely still an argument to be made for rsync to handle file names better when they are invalid on the target device; however, my original proposal is far too basic.

I'd like to propose the following altered options:

--rename-dest [error|md5|sha1|sha2]
    Determines the behaviour when a filename from the source is invalid on the target,
    either due to length or invalid characters. By default, an error is produced, 
    otherwise a hashing algorithm can be specified to create a compact new name for 
    the file.
--rename-dest-ext .rsync
    Sets a file extension for renamed files.
--dest-meta .meta
    Specifies the file extension to use for meta files, into which additional data 
    about a file's transfer can be written. For example, if a file is renamed, then a
    file with the same hashed name but this extension will be created, containing the
    original name of the file. For example, a file called "birthday/anniversary.jpg" 
    is invalid on the target and so is renamed a1df35adf4b3df93458d84c014b56465.rsync
    and alongside it is stored a1df35adf4b3df93458d84c014b56465.meta with the line:
        name:24:birthday/anniversary.jpg
    Note the length is specified so characters in the original name cannot interfere 
    with the meta file itself.

--source-meta .meta
    Specifies the file extension used for meta files on the source side of the 
    transfer, allowing rsync to check for such files and use them when transferring 
    files. For example, in the case of a renamed file the meta file will contain the 
    original file name, allowing rsync to attempt to transfer the file under its 
    original name, if the new target supports it (e.g. a transfer back to the original source).
--meta .meta
    Shorthand for specifying both --dest-meta and --source-meta at the same time.
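
The length-prefixed meta line described under --dest-meta can be sketched as follows. This is an illustrative Python sketch, not a defined format: the function names are hypothetical, the length is assumed to count UTF-8 bytes, and one entry per line with a trailing newline is also an assumption.

```python
def encode_entry(key, value):
    """Encode one meta entry as key:length:value followed by a newline."""
    raw = value.encode("utf-8")
    # Assumption: the length counts UTF-8 bytes of the value.
    return key.encode("ascii") + b":" + str(len(raw)).encode("ascii") + b":" + raw + b"\n"


def parse_entries(blob):
    """Parse the bytes of a meta file into a dict of key -> value."""
    entries = {}
    pos = 0
    while pos < len(blob):
        key_end = blob.index(b":", pos)
        len_end = blob.index(b":", key_end + 1)
        length = int(blob[key_end + 1:len_end])
        start = len_end + 1
        value = blob[start:start + length]
        if len(value) != length:
            raise ValueError("truncated meta entry")
        entries[blob[pos:key_end].decode("ascii")] = value.decode("utf-8")
        pos = start + length
        # Skip the newline that terminates the entry, if present.
        if blob[pos:pos + 1] == b"\n":
            pos += 1
    return entries
```

Because the parser reads exactly the stated number of bytes, a colon or even a newline inside the original name does not confuse it, which is the point of storing the length.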


Maybe there's still a more elegant way to do this? What's certain at least is that rsync could really use a way to more reliably handle files that cannot be transferred properly.

I opted to go for a generic meta file concept as it's possible rsync could use this for other features in future. For example, files too large for a target filesystem could be split, with a meta file entry detailing how to reassemble them from the smaller pieces.