When rsyncing a large amount of data (about 200GB), the destination takes up much more space than the source (about 40GB more). With -S specified it takes less, but still more than the source. The file sizes themselves appear to be the same; the difference is in the amount of space reported as used by df and ls. With cp -a, the space used is identical.
Does the source tree contain hard-linked files?
If so, was rsync's -H option used?
cp -a would preserve hard links.
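One way to check whether hard links are in play at all is to list every regular file that has more than one link, grouped by inode. The following is only a sketch: it assumes GNU find, and the function name list_hardlinks is made up for illustration.

```shell
# List regular files under a directory that have more than one hard link.
# Assumes GNU find (-printf); list_hardlinks is a hypothetical helper name.
list_hardlinks() {
    # %i = inode number, %n = link count, %p = path. Sorting numerically by
    # inode puts the names that share an inode next to each other.
    find "$1" -type f -links +1 -printf '%i\t%n\t%p\n' | sort -n
}
```

Running it on both the source and the destination tree should show whether files that share an inode on the source ended up as separate copies on the destination. Note that a file also shows up here if its other links live outside the tree being scanned.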
(In reply to comment #1)
> Does the source tree contain hard-linked files?
> If so, was rsync's -H option used?
> cp -a would preserve hard links.
Yes, the command was:
rsync -vaH /source/ /dest
There are a few reasons why the space used can differ:
1. (Already covered) hard-linked files in the transfer becoming unlinked. (needs -H)
2. The blocksize of the destination filesystem is different from the source filesystem, so "du" (which counts the wasted space in its block total) can report a different value if the amount of wasted space is different.
3. There could be some wasted space in directory files (depending on the filesystem) because rsync uses a temporary-file name and renames it at the end. (This would probably be a small amount of space, however.)
4. (Also already covered) Sparse files need to be copied sparsely. (requires -S)
I can't think of any other reasons for the sizes to differ. I ran some simple tests and wasn't able to reproduce the problem (in fact, for one sparse file I created, "cp -a" changed the block count from 9 to 133, but rsync kept the file at 9 blocks).
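Points 2 and 4 both come down to the gap between a file's byte size and the number of blocks allocated for it. A rough way to see that gap for a single file is sketched below. It assumes GNU coreutils (stat -c, truncate, head -c); the helper name alloc_report and the scratch files are invented for the example.

```shell
# Print byte size, allocated blocks, and the unit those blocks are counted in.
# Assumes GNU coreutils stat; alloc_report is a hypothetical helper name.
alloc_report() {
    # %s = size in bytes, %b = blocks allocated, %B = bytes per %b block, %n = name
    stat -c '%s bytes, %b blocks of %B bytes: %n' -- "$1"
}

# Hypothetical demo in a scratch directory: a file that is one big hole
# versus a file of the same length filled with real zero bytes.
demo_dir=$(mktemp -d)
truncate -s 1M "$demo_dir/hole"
head -c 1048576 /dev/zero > "$demo_dir/zeros"
alloc_report "$demo_dir/hole"
alloc_report "$demo_dir/zeros"
rm -rf "$demo_dir"
```

On a filesystem that supports holes, the first file would normally show far fewer blocks than the second even though both report the same byte size. The exact counts depend on the filesystem and its block size.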
Here are some things to check:
1. Was the test between cp and rsync done on the same hard-disk partition (so that changes in block size are ruled out)?
2. Was -S used during the first copy, not just the updates? (Rsync won't know that a file needs to be updated if its mtime and byte-size match, even if it was copied the first time without -S.)
3. What are the actual files that differ? I'd suggest running this on each of the resulting dirs (changing the output file name for each run so that one listing doesn't overwrite the other):
find . -printf '%p\t%b\t%s\t%n\n' | sort >/tmp/foo.txt
You can then compare the two output files and see which items differ in block count and/or byte size. If you find a difference, figure out what it's due to: a change in sparseness? directory-size variance? a wrong hard-link count? Hopefully that will help you narrow down the cause of what you're seeing.
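Comparing the two listings by eye gets tedious on a large tree. Below is a small awk-based sketch for doing it mechanically. It assumes the listings were produced by the find command above (tab-separated path, blocks, bytes, link count), that the first listing is not empty, and that no path contains a tab or newline. The name compare_listings and the labels it prints are made up for illustration.

```shell
# Report paths whose block count, byte size, or link count differ between
# two listings. compare_listings is a hypothetical helper name.
compare_listings() {
    awk -F'\t' '
        # First listing: remember "blocks<TAB>bytes<TAB>links" for each path.
        NR == FNR { first[$1] = $2 FS $3 FS $4; next }
        # Second listing: report new paths and paths whose numbers changed.
        {
            seen[$1] = 1
            second = $2 FS $3 FS $4
            if (!($1 in first))           print "only-in-second" FS $1
            else if (first[$1] != second) print "differs" FS $1 FS first[$1] FS second
        }
        # Paths that never showed up in the second listing.
        END { for (p in first) if (!(p in seen)) print "only-in-first" FS p }
    ' "$1" "$2"
}
```

Each "differs" line carries the path followed by the blocks, bytes, and link count from the first listing and then from the second, so a changed block count with an unchanged byte size points at sparseness or block-size effects, while a changed link count points at hard links. Directory entries may show up as differing for the reason given in point 3 of the earlier list.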
Closing due to lack of response. If there is more to say, please feel free to add a comment and re-open.