The Samba-Bugzilla – Bug 3444
Deal with case-insensitive file-systems better (perhaps by adding an option)
Last modified: 2016-12-02 22:22:50 UTC
When cygwin is involved on either end of the transmission, when the only difference between a file on the local machine and a file on the remote machine is the case of their file name and that the --delete option is provided, the destination file is deleted and the source file is not copied.
It's pretty clear that the trouble is the case insensitivity of the filesystem. In fact, I reproduced this behavior on a vfat partition mounted on Linux:
mkdir test1 test2
touch test1/Foo
touch --reference=test1/Foo test2/foo
rsync -avvvii --delete-after test1/ test2/
Note that statting foo, Foo, or FOO will find the destination file, but readdir gives its name as foo. When rsync decides to copy test1/Foo, it stats test2/Foo and finds a matching file: it has nothing to do. Then it scans test2/ for things to delete and finds an extraneous file test2/foo, which it deletes. So --delete-after can cause data loss.
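The sequence described above can be replayed with a toy model. This is only a sketch of the order of operations, not rsync's actual code: the CaseInsensitiveDir class and the sync_with_delete_after function are hypothetical names made up for illustration, and the "filesystem" is a dictionary keyed by the folded name.

```python
class CaseInsensitiveDir:
    """Toy model of a directory on a case-insensitive, case-preserving filesystem."""

    def __init__(self, names):
        # Key each entry by its folded name, but remember the stored spelling.
        self._entries = {name.casefold(): name for name in names}

    def exists(self, name):
        # Like stat(): any spelling of the name finds the file.
        return name.casefold() in self._entries

    def listdir(self):
        # Like readdir(): reports the spelling that is stored on disk.
        return sorted(self._entries.values())

    def add(self, name):
        self._entries[name.casefold()] = name

    def remove(self, name):
        del self._entries[name.casefold()]


def sync_with_delete_after(source_names, dest):
    """Mimic the --delete-after ordering: transfer first, delete afterwards."""
    transferred = []
    for name in source_names:
        if dest.exists(name):
            continue  # a matching file was found, so there is nothing to do
        dest.add(name)
        transferred.append(name)
    # Deletion pass: exact (case-sensitive) comparison against the file list.
    wanted = set(source_names)
    deleted = [name for name in dest.listdir() if name not in wanted]
    for name in deleted:
        dest.remove(name)
    return transferred, deleted


dest = CaseInsensitiveDir(["foo"])
transferred, deleted = sync_with_delete_after(["Foo"], dest)
print(transferred, deleted, dest.listdir())  # [] ['foo'] []
```

In this model nothing is transferred, foo is deleted, and the destination ends up empty, which matches the data loss described in the comment.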
When --delete-before or --delete-during is used, rsync deletes the destination file as unmatched and then copies the source file all over again; this is inefficient, but it does not cause data loss and it brings case on the destination into line with case on the source.
One way to fix this data loss problem is to detect the case insensitivity of the destination filesystem and compare files case insensitively with the entries of the file list when deciding whether to delete them. But that's really ugly. Having rsync remember the i-node numbers of the files it creates and skip deleting those files might work, but it's still a hack. For the moment, avoid --delete-after on filesystems that accept inexact filenames, including case insensitive ones and ones that perform translation/canonicalization of character encodings (Mac OS X?).
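As a rough illustration of the first idea (detect case insensitivity, then compare case-insensitively when deciding what to delete), here is a minimal sketch. The function names is_case_insensitive and extraneous are hypothetical, and str.casefold() is only an approximation of whatever folding rules a particular filesystem really applies.

```python
import os
import tempfile


def is_case_insensitive(directory):
    """Probe a directory: create a mixed-case file, then look it up under another spelling."""
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        folder, name = os.path.split(probe.name)
        # On a case-sensitive filesystem the swapped-case name should not exist.
        return os.path.exists(os.path.join(folder, name.swapcase()))


def extraneous(dest_names, file_list, ignore_case):
    """Return the destination names that have no counterpart in the file list."""
    key = str.casefold if ignore_case else (lambda s: s)
    wanted = {key(name) for name in file_list}
    return [name for name in dest_names if key(name) not in wanted]


print(extraneous(["foo"], ["Foo"], ignore_case=False))  # ['foo']
print(extraneous(["foo"], ["Foo"], ignore_case=True))   # []
```

With the case-insensitive comparison, foo is no longer treated as extraneous, so the deletion pass would leave it alone.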
The behavior you reported ("the source file is not copied") only occurs when --delete-after is used, not --delete (as Matt mentioned). There are other potential problems with copying files to a case-insensitive file system, such as two files getting copied that differ only in the case of their name.
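The other problem mentioned here, two source files whose names differ only in case, can at least be detected before a transfer. A small sketch, assuming the list of source names is already available; case_collisions is a hypothetical helper, not part of rsync.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that would map to the same entry on a case-insensitive filesystem."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["Makefile", "makefile", "README", "a.txt"]))
# [['Makefile', 'makefile']]
```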
The best solution is probably to add an option to tell rsync to consider all such same-except-for-case names as identical. In the patches dir there is an implementation of this named ignore-case.diff. Note that the patch that came with 2.6.6 builds, but is ineffective. The version in CVS should be OK.
> Note that the patch that came
> with 2.6.6 builds, but is ineffective. The version in CVS should be OK.
Do you mean that the ignore-case.diff patch that is found in the patches directory of the 2.6.6 source distribution does not work ? And that I should use the patch that is found in the CVS tree instead ?
Thanks for the hint.
(In reply to comment #3)
> Do you mean that the ignore-case.diff patch that is found in the patches
> directory of the 2.6.6 source distribution does not work ?
That's right. The patch does not modify the main name-comparing function (which changed from when the patch was written).
> And that I should use the patch that is found in the CVS tree instead ?
That patch will only apply to the CVS tree, not to 2.6.6. However, you could first apply the patches/ignore-case.diff from the 2.6.6 source, and then only apply the change to flist.c from the CVS patch (it's the first file in the CVS patch) and I assume that would get it working in 2.6.6.
--ignore-case solves the critical problem of unwanted file deletions.
There is still a (less critical) problem when one wants to change the case of a file name. Even on case-insensitive file systems, file names have case differences; they are just not meaningful. When the same tree is used on a file system where case is meaningful, the maintainer of the tree is very likely to change the case of a file. The typical example is when A.txt is created on a case-insensitive file system and referred to by an application as a.txt (lowercase). When the application runs on a case-sensitive file system, it fails. If, for some reason, it is impractical to change the application to use A.txt instead of a.txt, one wants to change the case of the file.
I'm sure you can imagine loads of situations where changing the case of a file is relevant, and I'm also sure the rsync user expects rsync to support such a change in some way.
That problem would not exist if case insensitive file systems changed all file names to lowercase. But case insensitive file systems are not case ignorant and that complicates the situation a bit.
I have been experiencing strange behaviour in a situation like this. I'm not sure if this is supposed to occur, but it doesn't seem like it should. In my case I am using rsync to copy images from my Windows machine to a remote Linux server. Filenames show up as CRW_1951.CRW in Windows but are created on the Linux system as crw_1951.crw; in other words, something is converting case during the rsync process. Rsync on Windows is 2.6.4 using cwRsync, and on Linux (kernel 188.8.131.52) rsync is version 2.6.3. This obviously creates problems, as the change in case means that files don't compare correctly; in particular, using --delete causes all the files to be removed on the next sync after they have been copied once. I would expect the files to be copied with case intact. Something is altering that, and I don't know if this is due to Windows' weird behaviour, rsync, or even some ssh interaction. I'd be most interested in an option that would force case to be consistently the same, as I have noticed that not all files follow this pattern. For example, I have files like CRW_2063-Edit-2.psd and CRW_1965.xmp that do keep their case intact when created on the receiving end. Is this problem being caused by differing versions that don't talk properly to each other?
(In reply to comment #0)
> When cygwin is involved on either end of the transmission, when the only
> difference between a file on the local machine and a file on the remote machine
> is the case of their file name and that the --delete option is provided, the
> destination file is deleted and the source file is not copied.
(In reply to comment #6)
Note: after looking at this further, I found that rsync is creating temp transfer files (at the Linux end) with the correct case, but after the file is received and renamed to its final name, the case has been changed. E.g. the temp file .CRW_1951.CRW.oJMn81 gets renamed upon completion to crw_1951.crw, which just seems wrong to me. But what is causing this?
That's some compatibility code outside of rsync, since rsync doesn't change the case of names. Back in the old days, a filesystem could only hold upper-case letters. When mixed case came into effect, some folks wanted to display an all-upper-case name as lower-case to make it look better.
Check with the cygwin folks, as there is probably a mount option available to avoid this translation. Or make sure that your file suffixes are always in lower case (e.g. CRW_1951.crw).
Looks like a patch is already there.
PLEASE consider this in the next build! this will help both windows <-> windows and linux <-> windows!
I hate to just post a "me too", but, me too. This bug has been idle for four years.
I use rsync between MacOS HFS+ systems all the time, and having files be deleted and re-transferred just because the capitalization on an intervening directory name has changed is really annoying and wasteful.
What the FUCK guys?! This thing was reported 6 years ago, and somebody provided a patch (which was written back in 2003), but then you sit on it for 9 years??! That's certainly a nice "patches welcome, but really, we don't give a shit" attitude you have there.
Do you realize how much this option is needed? Do I really need to LMGTFY?
The problem is so big an issue that the ignore-case.diff patch is well known. GNU tar is running circles around you on this issue. It can --ignore-case, specify an exclude file, --no-ignore-case, and specify another (case-sensitive) exclude file.
Samba, on the other hand, doesn't have the common decency to provide a --ignore-case option within core rsync. This shouldn't be a patch. I use Debian packages, and generally don't compile sources. GNU tar has it in their core, and they don't care if it's Windows or not. It's not a Windows problem. Some devices or some systems will output media in different cases of extensions, like JPG/jpg or GIF/Gif/gif.
Just put the patch in, write some docs on it, close this bug, and move on!
I believe that bug #7951 and bug #10448 are duplicates of this one. It would be nice to be able to define more readable masks than "*.[Pp][Nn][Gg]".
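Until something like that exists, such masks can be generated from a readable pattern instead of being written by hand. A minimal sketch, assuming the input is a plain glob with no bracket expressions of its own and only ASCII letters need expanding; case_insensitive_glob is a hypothetical helper, not an rsync feature.

```python
from fnmatch import fnmatchcase


def case_insensitive_glob(pattern):
    """Expand each ASCII letter of a simple glob into a two-letter character class."""
    out = []
    for ch in pattern:
        if ch.isascii() and ch.isalpha():
            out.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            out.append(ch)  # wildcards, dots, digits etc. are copied unchanged
    return "".join(out)


mask = case_insensitive_glob("*.png")
print(mask)                             # *.[Pp][Nn][Gg]
print(fnmatchcase("Photo.PnG", mask))   # True
```

The generated mask could then be passed to an exclude or include rule in place of the hand-written one.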