Bug 11474 - Retry delay for lost connection
Summary: Retry delay for lost connection
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.2
Hardware: All All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Depends on:
Reported: 2015-08-30 21:35 UTC by Haravikk
Modified: 2015-08-30 21:35 UTC

Description Haravikk 2015-08-30 21:35:26 UTC
Currently, when a connection is lost, rsync aborts the rest of the transfer, forcing it to be restarted from the beginning. Since rsync is designed around transferring only changed files, this isn't usually a big deal. However, with very large transfers (in size or number of files), or transfers over slower connections, resuming can be a very slow process; a backup that would normally take four hours and gets interrupted could take two or three hours just to get back to where it left off, as it must first run through all of the previously transferred files.

What I would like to propose is a new option that causes rsync to wait, rather than fail, when a connection is lost, and to try to re-establish the connection at periodic intervals until a specified time limit is reached. If the connection is re-established, the transfer resumes where it left off, skipping any files that were already transferred.
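
To make the proposed behaviour concrete, here is a minimal Python sketch of the "retry at periodic intervals until a time limit is reached" logic. It is an illustration only and not rsync code: the function name `wait_for_reconnect`, the `try_connect` callback, and the `interval` and `limit` parameters are all hypothetical names chosen for this example.

```python
import time


def wait_for_reconnect(try_connect, interval, limit,
                       sleep=time.sleep, clock=time.monotonic):
    """Call try_connect() until it succeeds or `limit` seconds have passed.

    try_connect is a hypothetical callback returning True once the
    connection is back. Attempts are spaced `interval` seconds apart,
    and the last wait is shortened so the limit is never overshot.
    Returns True if the connection was re-established, else False.
    """
    deadline = clock() + limit
    while True:
        if try_connect():
            return True  # caller can resume the transfer where it left off
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # time limit reached: fail as rsync does today
        sleep(min(interval, remaining))
```

The `sleep` and `clock` parameters exist only so the timing can be replaced in tests; a real implementation inside rsync would of course look quite different.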

I'm uncertain how difficult this would be to implement, though it shouldn't really matter to the receiving side at what point the sender begins (it can just treat it like a brand new transfer that happens to begin at that point). The main question mark that I can think of is how delayed actions (like deletions) are handled; if some files are tracked only on the receiving side, then the sender may need to track what the receiver should know, so that it can be sent as part of the "new" transfer. If this would be too complex, the simplest option might be to have this flag require the use of --delete-during or --delete-before instead of --delete-after or --delete-delay, with similar treatment of any other affected options.

I think that connection issues are probably one of the more common reasons for very large transfers to fail, and having rsync handle these itself would make things a lot easier and faster. It would also make scripting rsync error handling simpler, as most other errors represent a fault that needs to be addressed separately, rather than something that can be solved by retrying. I've seen several scripts that incorrectly put rsync transfers in a loop so that they immediately retry on encountering any error; while this may look neat and be fine for connection issues, it's no good for errors such as incompatible file-names, since these require user intervention and don't actually stop the transfer (only the individual files that fail are skipped).
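
As an illustration of the scripting problem, a wrapper that retries only on connection-related failures might look like the Python sketch below. The exit codes are the ones listed in the rsync man page; treating exactly these four as retryable is my own assumption (code 12 in particular can have other causes), and `run_rsync`, `max_attempts` and `delay` are hypothetical names for this example.

```python
import subprocess
import time

# Exit codes from the rsync man page that usually point at a connection
# problem. Treating exactly this set as retryable is an assumption.
RETRYABLE = {
    10: "error in socket I/O",
    12: "error in rsync protocol data stream",
    30: "timeout in data send/receive",
    35: "timeout waiting for daemon connection",
}


def run_rsync(args, max_attempts=5, delay=30,
              run=subprocess.call, sleep=time.sleep):
    """Run rsync, retrying only when the exit code looks connection-related.

    Any other exit code, including 0, is returned immediately so that
    faults needing user intervention are not retried blindly.
    """
    code = 0
    for attempt in range(1, max_attempts + 1):
        code = run(["rsync", *args])
        if code not in RETRYABLE:
            return code
        if attempt < max_attempts:
            sleep(delay)  # wait before the next attempt
    return code  # still failing after max_attempts
```

With this approach an error such as code 23 (partial transfer due to error), which is what per-file failures typically produce, is passed straight back to the caller. Each retry still starts the transfer over, though, so it does nothing about the slow resume described above.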