Bug 3117 - rsyncing a file to a partially downloaded copy is extremely slow
Status: CLOSED WONTFIX
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.6
Hardware: All Linux
Importance: P3 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
 
Reported: 2005-09-25 06:32 UTC by Philipp Rumpf
Modified: 2006-03-12 02:56 UTC

Description Philipp Rumpf 2005-09-25 06:32:38 UTC
I would imagine that it is common to use rsync to bring a partial download of a
file up to date (or a prefix of the file, which can arise when data is appended
to the source file).  However, for large files, this seems to be extremely slow
since many small chunks of constant size are compared.

While the --block-size option can help with this, the block size to use has to
be calculated for each rsync invocation to avoid retransmitting an average of
block-size/2 bytes.

I would suggest that modifying the rsync algorithm to initially compare chunks
of exponentially increasing size until a mismatch is found would probably be
worth it in terms of the total bandwidth saved.  Even if you disagree with that,
a quick-and-dirty fix would be an option that would cause rsync to check for the
case that the larger file results from the smaller file by appending data before
going into the full rsync algorithm.  I believe this wouldn't take more than a
couple of minutes for someone familiar with rsync internals.
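
A rough sketch of the two ideas above, for illustration only. It is not rsync
code: it compares two byte strings held in local memory, whereas rsync would
have to exchange checksums between two machines. The function names, the
starting block size and the use of SHA-256 are all assumptions made for this
example.

```python
import hashlib


def matching_prefix_length(short, long, first_block=4):
    """Return the number of leading bytes of `short` verified to match `long`.

    Blocks are compared by digest and double in size after each match, so a
    long matching prefix is covered in a logarithmic number of comparisons.
    On a mismatch, the offset where the mismatching block starts is returned.
    """
    limit = min(len(short), len(long))
    offset = 0
    size = first_block
    while offset < limit:
        end = min(offset + size, limit)
        # Stand-in for a checksum exchange between sender and receiver.
        a = hashlib.sha256(short[offset:end]).digest()
        b = hashlib.sha256(long[offset:end]).digest()
        if a != b:
            break
        offset = end
        size *= 2
    return offset


def is_pure_append(short, long):
    """True if `long` is `short` with data appended (the quick-and-dirty check)."""
    return len(short) <= len(long) and matching_prefix_length(short, long) == len(short)
```

When is_pure_append() holds, only the bytes of the larger file past
len(short) would need to be sent.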

It certainly seems odd that rsync is essentially unusable for something that
wget --continue deals with.

To help searching for this bug: log files append appending live streams partial
download aborted download interrupted download restarting rsync restart
Comment 1 Wayne Davison 2005-10-17 10:56:01 UTC
The --append option was added (it's in CVS now) to make it easy to continue the
sending of large files.
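
As an illustration, a transfer could be resumed from a script roughly like
this. Only the --append option comes from this comment. The helper names and
the paths are placeholders, and the exact behaviour of --append should be
checked against the man page of the rsync version in use.

```python
import subprocess


def build_append_command(source, dest):
    """Build an rsync command line that continues a partially sent file."""
    # --append is the option mentioned above; source and dest are placeholders.
    return ["rsync", "--append", source, dest]


def resume_transfer(source, dest):
    """Run rsync and return its exit status (requires rsync to be installed)."""
    return subprocess.run(build_append_command(source, dest), check=False).returncode
```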

Because of the pipelined nature of the current rsync algorithm, it is not
possible for the two sides to interact in some kind of block-probing algorithm
(which would slow the algorithm due to round-trip delays).