Bug 3485 - rsync uses more space in destination even with -S specified
Status: RESOLVED WORKSFORME
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.6
Hardware: x86 Linux
Importance: P3 major
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2006-02-05 04:58 UTC by Richard Lennox
Modified: 2006-10-15 13:59 UTC
Description Richard Lennox 2006-02-05 04:58:17 UTC
When rsyncing a large amount of data (about 200GB), the destination takes up much more space (about an additional 40GB).  When -S is specified it takes less, but still more than the source.  The file sizes look the same; the difference is in the amount of space reported as used by df and ls.  When using cp -a, the space used is identical.
Comment 1 Wojtek Pilorz 2006-02-05 13:05:27 UTC
Does the source tree contain hard-linked files?
If so, has the rsync -H option been used?
cp -a would preserve hard links.
Comment 2 Richard Lennox 2006-02-06 04:21:19 UTC
(In reply to comment #1)
> Does the source tree contain hard-linked files?
> If so, has the rsync -H option been used?
> cp -a would preserve hard links.
> 

Yes, the command was:

rsync -vaH /source/ /dest
Comment 3 Wayne Davison 2006-02-06 14:16:44 UTC
There are a few places where space can be different:

1. (Already covered) hard-linked files in the transfer becoming unlinked. (needs -H)

2. The blocksize of the destination filesystem is different from the source filesystem, so "du" (which counts the wasted space in its block total) can report a different value if the amount of wasted space is different.

3. There could be some wasted space in directory files (depending on the filesystem) because rsync uses a temporary-file name and renames it at the end. (This would probably be a small amount of space, however.)

4. (Also already covered) Sparse files need to be copied sparsely. (requires -S)

I can't think of any other reasons for the sizes to differ.  I ran some simple tests and wasn't able to reproduce the problem (in fact, for one sparse file I created, "cp -a" changed the block count from 9 to 133, but rsync kept the file at 9 blocks).
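
To see how byte size, allocated blocks, and hard-link count can diverge for a single file, here is a minimal sketch.  It assumes GNU coreutils (truncate, stat) and a filesystem that supports sparse files; the scratch directory and the file names are made up for illustration and do not come from the report.

```shell
#!/bin/sh
# Hypothetical scratch directory; nothing here refers to the reporter's data.
work=$(mktemp -d)

# Create a 10 MiB file that consists of a single hole (GNU truncate).
truncate -s 10M "$work/sparse.img"

# Give the same inode a second name, i.e. a hard link.
ln "$work/sparse.img" "$work/sparse-link.img"

# GNU stat: %s = size in bytes, %b = allocated blocks, %h = hard-link count.
apparent=$(stat -c '%s' "$work/sparse.img")
blocks=$(stat -c '%b' "$work/sparse.img")
links=$(stat -c '%h' "$work/sparse.img")

echo "bytes=$apparent blocks=$blocks links=$links"

# Clean up the scratch directory.
rm -rf "$work"
```

The byte size is the same whether or not the file is stored sparsely; only the block count shows how much disk space is really allocated.  A copy that fills in the hole, or that turns the two names into two separate files, would keep the byte size and raise the space used.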

Here are some things to check:

1. Was the test between cp and rsync done on the same hard-disk partition (so that changes in block size are ruled out)?

2. Was -S used during the first copy, not just the updates?  (Rsync won't know that a file needs to be updated if its mtime and byte-size match, even if it was copied the first time without -S.)

3. What are the actual files that differ?  I'd suggest running this on each of the resulting dirs:

find . -printf '%p\t%b\t%s\t%n\n' | sort >/tmp/foo.txt

You can then compare the two output files and see which items differ in block- and/or byte-size.  If you find a difference, figure out what it's due to: changes in sparse size? directory-size variance? hard-link count wrong?  Hopefully that will help you to narrow down the cause of what you're seeing.
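
The comparison described above could be scripted roughly as follows.  This is only a sketch: list_tree and compare_trees are hypothetical helper names, GNU find is assumed (for -printf), and the listing is taken from inside each directory so that the relative paths line up between the two trees.

```shell
#!/bin/sh
# list_tree DIR: print "path<TAB>blocks<TAB>bytes<TAB>links" for every
# entry under DIR, sorted by path (same find format as suggested above).
list_tree() {
    (cd "$1" && find . -printf '%p\t%b\t%s\t%n\n' | LC_ALL=C sort)
}

# compare_trees SRC DEST: print the entries whose block count, byte size,
# or hard-link count differ.  Returns 0 when the listings are identical.
compare_trees() {
    src_list=$(mktemp)
    dest_list=$(mktemp)
    list_tree "$1" >"$src_list"
    list_tree "$2" >"$dest_list"
    diff "$src_list" "$dest_list"
    status=$?
    rm -f "$src_list" "$dest_list"
    return $status
}
```

Running compare_trees /source /dest prints diff-style lines for each item that differs.  Looking at which column changed tells you the category: a different block count with the same byte size points at sparseness or block-size effects, while a different link count points at hard links that were not preserved.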
Comment 4 Wayne Davison 2006-10-15 13:59:45 UTC
Closing due to lack of response.  If there is more to say, please feel free to add a comment and re-open.