13645 – Improve efficiency when resuming transfer of large files

Summary: Improve efficiency when resuming transfer of large files

Status:	RESOLVED WONTFIX

Alias:	None

Product:	rsync
Classification:	Unclassified
Component:	core
Version:	3.0.9
Hardware:	All All

Importance:	P5 enhancement
Target Milestone:	---
Assignee:	Wayne Davison
QA Contact:	Rsync QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-10-05 17:34 UTC by Rob Janssen
Modified:	2019-08-05 10:37 UTC
CC List:	0 users

See Also:

Description Rob Janssen 2018-10-05 17:34:31 UTC

When transferring large files over a slow network, we interrupt rsync at the beginning of business hours, leaving the transfer unfinished.

The command used is: rsync -av --inplace --bwlimit=400 hostname::module /dest

When restarting the transfer, a lot of time is "wasted": first the local system reads the partially transferred file and sends the checksums to the remote, which only then starts to read the source file until it finds something to transfer.  So nothing is transferred for twice the time required to read the partial file from disk!  When the partial file is many GB, this can take hours.
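
As a rough back-of-envelope illustration of the delay described above, the sketch below models it in Python. The file size and read rates are made-up numbers, the function name is hypothetical, and the model is only the simplified sequence described in this report, not a measurement of rsync itself.

```python
def idle_seconds_before_transfer(partial_bytes, dst_read_bps, src_read_bps, parallel=False):
    """Estimate the time spent re-reading the partial file before new data flows.

    Simplified model taken from the report: the destination reads the partial
    file to build checksums, then the source reads the same byte range.
    With parallel=True the two reads are assumed to overlap completely.
    """
    dst_time = partial_bytes / dst_read_bps
    src_time = partial_bytes / src_read_bps
    return max(dst_time, src_time) if parallel else dst_time + src_time


GB = 1000 ** 3
MB = 1000 ** 2

# Illustrative numbers only: 200 GB already transferred, 50 MB/s on both disks.
sequential = idle_seconds_before_transfer(200 * GB, 50 * MB, 50 * MB)
overlapped = idle_seconds_before_transfer(200 * GB, 50 * MB, 50 * MB, parallel=True)
print(f"sequential: {sequential / 3600:.1f} h, overlapped: {overlapped / 3600:.1f} h")
```

With equal read rates on both sides, the sequential estimate is exactly twice the overlapped one, which is the factor of two mentioned above.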

Suggestions:
1. When the source is larger than the destination, immediately begin to transfer from the offset in the source equal to the size of the destination.  It is already known that this part will have to be transferred.
2. Try to read the partial file at the destination and the same part of the source in parallel (so the time is halved), preferably also in parallel with 1.

Of course these optimizations (at least #2) may actually decrease performance when the transfer is local (not over a slow network) and the disk read rate is negatively affected by reading at two different places in parallel.  So #2 should only be attempted when the transfer is over a network.
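
Suggestion 1 can be sketched as a small calculation. This is only an illustration of the proposal, assuming that bytes past the destination's current end always need to be sent; `guaranteed_tail` is a hypothetical helper and not part of rsync.

```python
def guaranteed_tail(src_size, dst_size):
    """Return (offset, length) of the source range lying past the destination's end.

    Under the proposal's assumption, this range can be sent right away while
    the already-present part is still being checksummed. Returns None when
    the source is not larger than the destination.
    """
    if src_size > dst_size:
        return dst_size, src_size - dst_size
    return None


# Example: a 10,000-byte source and a 4,000-byte partial destination.
print(guaranteed_tail(10_000, 4_000))
```

The example prints `(4000, 6000)`: start at offset 4000 in the source and send 6000 bytes.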

Comment 1 Kevin Korb 2018-10-05 17:41:00 UTC

If you are sure the file has not been changed since it was partially copied, see --append.

Comment 2 Rob Janssen 2018-10-05 17:50:50 UTC

Thanks, that helps a lot for this particular use case.
(the files are backups)

Comment 3 Wayne Davison 2018-11-20 22:02:02 UTC

Rsync is never going to assume that a file can be continued, as it doesn't know how the old data compares to the source. You can tell rsync to assume that the early data is all fine by using --append, but that can cause you problems if any non-new files need an update that is not an append.
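
The trade-off described here can be illustrated with a small local sketch. It is not rsync's implementation: `append_resume` is a hypothetical helper that trusts the existing destination bytes without comparing them to the source, in the way this comment describes for --append, and the file names are made up.

```python
import os
import tempfile


def append_resume(src_path, dst_path, chunk_size=1 << 20):
    """Copy only the bytes of src that lie past the current end of dst.

    The bytes already in dst are trusted and never compared with src.
    Returns the number of bytes appended.
    """
    src_size = os.path.getsize(src_path)
    dst_size = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    if dst_size >= src_size:
        return 0  # nothing past the destination's end to append
    appended = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(dst_size)
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            appended += len(chunk)
    return appended


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "backup.img")
    good = os.path.join(tmp, "good.partial")
    stale = os.path.join(tmp, "stale.partial")
    with open(src, "wb") as f:
        f.write(b"ABCDEFGHIJ")
    with open(good, "wb") as f:
        f.write(b"ABCD")  # genuine prefix of the source
    with open(stale, "wb") as f:
        f.write(b"abcd")  # same length, but not a prefix of the source

    append_resume(src, good)
    append_resume(src, stale)
    with open(good, "rb") as f:
        good_result = f.read()
    with open(stale, "rb") as f:
        stale_result = f.read()

print(good_result, stale_result)
```

The genuine partial file ends up identical to the source, while the stale one keeps its old first four bytes. That second case is the kind of problem to expect when an existing file needs an update that is not an append.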

Comment 4 Rob Janssen 2018-11-21 08:59:26 UTC

OK, you apparently did not understand what I proposed.
However, it is not that important, as in our use case we can use --append.