Bug 6816 - Delta-transfer algorithm does not reuse already transmitted identical blocks
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.0.5
Hardware: Other All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-15 10:24 UTC by Martin Scharrer
Modified: 2009-10-15 10:29 UTC

Description Martin Scharrer 2009-10-15 10:24:47 UTC
Hi,

I observed the following behavior of rsync: if a file contains identical blocks (e.g. all-zero blocks), these blocks are not re-transferred but reused by the delta-transfer algorithm, BUT only if one of these blocks is already present in the destination file. If not, or if the destination file does not exist yet, all identical blocks are transferred over and over again. In some special cases (e.g. large sparse files that are rsync'ed with --inplace, so that -S can't be used) it is much better to interrupt the rsync operation after a while and restart it, so that the identical blocks are reused rather than re-transferred.

A good (but somewhat trivial) example would be a big file (say 1GB) containing only zeros (dd if=/dev/zero of=file bs=1M count=1k) which is transferred without the -S option. If the file does not exist at the destination, it is copied as a whole, as e.g. 'scp' would do. In my case it is copied at about 2MB/s. But if the file already exists, even with only a very small size, the identical blocks are reused and the "transfer speed" is around the I/O speed of the destination drive (in my case 60-120MB/s; the target is a tmpfs ramdisk).
I also tested this with a file with pseudo-random but repeating content (dd if=/dev/urandom of=temp bs=1M count=10; cat temp temp ... temp > file). If the first rsync process is aborted and restarted after the first repeating block has been transferred, the second rsync process sends only metadata, because the existing content is simply replicated.
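
The observed behavior can be modeled with a minimal Python sketch. It is a simplification and not rsync's actual implementation: it assumes exact block comparison instead of rolling and strong checksums, and the names delta_tokens and literal_bytes are hypothetical. The point it illustrates is that source blocks are matched only against blocks of the basis (destination) file, never against data sent earlier in the same run.

```python
def delta_tokens(source: bytes, basis: bytes, block_size: int):
    """Encode `source` as match/literal tokens relative to `basis`.

    Simplified model: blocks are compared by content, whereas rsync
    uses a rolling checksum plus a strong checksum.
    """
    # Index every full block of the basis (destination) file by content.
    index = {}
    for i in range(0, len(basis) - block_size + 1, block_size):
        index.setdefault(basis[i:i + block_size], i // block_size)

    tokens = []
    literal = bytearray()
    pos = 0
    while pos < len(source):
        chunk = source[pos:pos + block_size]
        if len(chunk) == block_size and chunk in index:
            # Flush pending literal data, then reference the basis block.
            if literal:
                tokens.append(("literal", bytes(literal)))
                literal = bytearray()
            tokens.append(("match", index[chunk]))
            pos += block_size
        else:
            # No basis block matches here: this byte must be sent as data.
            literal.append(source[pos])
            pos += 1
    if literal:
        tokens.append(("literal", bytes(literal)))
    return tokens


def literal_bytes(tokens):
    """Count the bytes that would have to travel over the wire as data."""
    return sum(len(tok[1]) for tok in tokens if tok[0] == "literal")
```

Under these assumptions, a source of 16 all-zero blocks of 64 bytes costs 1024 literal bytes against an empty basis, but 0 literal bytes once the basis contains a single zero block, which mirrors the effect of aborting and restarting the transfer.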

It would be great if the delta-transfer algorithm were extended to account for identical blocks in the data still to be sent, i.e. send the first occurrence of such a block and then simply reuse it within the same rsync process. IMHO this should not be too difficult to implement, because most of the needed functionality is already there.
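
A minimal Python sketch of the requested behavior, under stated assumptions: the destination file is new, blocks are compared by content and only at block-aligned offsets, and the receiver can read back what it has already written. The token format and the names encode_with_self_reuse and apply_tokens are hypothetical and do not correspond to anything in the rsync protocol.

```python
def encode_with_self_reuse(source: bytes, block_size: int):
    """Send each distinct block once; later copies become references."""
    seen = {}  # block content -> offset of its first occurrence
    tokens = []
    for pos in range(0, len(source), block_size):
        chunk = source[pos:pos + block_size]
        first = seen.get(chunk)
        if first is not None:
            # Already transmitted in this run: send only (offset, length).
            tokens.append(("reuse", first, len(chunk)))
        else:
            seen[chunk] = pos
            tokens.append(("literal", chunk))
    return tokens


def apply_tokens(tokens):
    """Receiver side: rebuild the file, copying reused blocks locally."""
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out += token[1]
        else:
            _, offset, length = token
            # The referenced block was written earlier in this same run.
            out += out[offset:offset + length]
    return bytes(out)
```

For a source made of five copies of a 256-byte unit with four distinct 64-byte blocks, this sketch emits 4 literal tokens (256 bytes of data) and 16 reuse tokens, and apply_tokens reproduces the source exactly.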
Comment 1 Martin Scharrer 2009-10-15 10:29:29 UTC
This enhancement would also effectively solve bug #5801, also reported by me.