I'm using rsync to mirror a 2TB archive from a LOCAL machine to a REMOTE one. The initial sync was performed over a 1gbit lan. Now, it is being performed over the internet with 2 servers at 50mbits. I've tried everything mentioned in the FAQ and other places to remedy this problem to no avail. I've tweaked my ~/.ssh/config to keep it alive (client/server), and changed some of the rsync settings to keep it active and not timeout. However, the problem persists. I've attempted to log the dbug info as mentioned in the FAQ. Below are my rsync calls and 100 lines of debug output: $ rsync -aivz --partial --delete-during --stats --progress --rsync-path=/tmp/rsync-debug /volume/ -e ssh user@host.org:/home/user/volume_backup ; [pid 4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996}) [pid 4195] read(0, "3/\225\177\10d\233>\"C\27\260~\332\235`U ^\10H\312\3654+\223\277J\372m\213n"..., 4092) = 4092 [pid 4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999995}) [pid 4195] read(0, "\374\17\0\7", 4) = 4 [pid 4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996}) [pid 4195] read(0, "\373\266\317\333!]\367\227U\345\27w\206l\307\246\3727\t\205\337;\207\366U\363\33\356{\375\244\273"..., 4092) = 4092 [pid 4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996}) [pid 4195] read(0, "\374\17\0\7", 4) = 4 [pid 4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996}) [pid 4195] read(0, "\276;\232\325\353C\300\315v\7Q8\25\v4\303\35\234\217\273\230\327U\373\245;\342\222\4\17\367\305"..., 4092) = 4092 [pid 4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996}) [pid 4195] read(0, "\374\17\0\7", 4) = 4 [pid 4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996}) [pid 4195] read(0, "TK\244\3\255\223\246d4\214\235\325r\216\214\241\203\347\331Q\232t@\333#)\f\25\7\30\230\r"..., 4092) = 4092 [pid 4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996}) [pid 4195] read(0, "", 4) = 0 [pid 4195] select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999994}) [pid 4195] write(4, "S\0\0\nrsync: connection unexpected"..., 87) = 87 [pid 4194] <... select resumed> ) = 1 (in [3], left {31, 468490}) [pid 4195] rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, <unfinished ...> [pid 4194] read(3, <unfinished ...> [pid 4195] <... rt_sigaction resumed> NULL, 8) = 0 [pid 4194] <... read resumed> "S\0\0\nrsync: connection unexpected"..., 8184) = 87 [pid 4195] rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, <unfinished ...> [pid 4194] select(4, [3], [1], [1], {60, 0} <unfinished ...> [pid 4195] <... rt_sigaction resumed> NULL, 8) = 0 [pid 4194] <... select resumed> ) = 1 (out [1], left {59, 999993}) [pid 4195] close(1 <unfinished ...> [pid 4194] write(1, "S\0\0\nrsync: connection unexpected"..., 87 <unfinished ...> [pid 4195] <... close resumed> ) = 0 [pid 4194] <... write resumed> ) = -1 EPIPE (Broken pipe) [pid 4195] write(3, "\0Ar\264\f\3151\225\350G\v\365\207\203\253\224\37\t}y\255^q}\241\250\341\2Lq2\361"..., 57465 <unfinished ...> [pid 4194] --- SIGPIPE (Broken pipe) @ 0 (0) --- [pid 4194] write(2, "rsync: writefd_unbuffered failed"..., 90rsync: writefd_unbuffered failed to write 87 bytes to socket [generator]: Broken pipe (32)) = 90 [pid 4194] write(2, "\n", 1 ) = 1 [pid 4194] rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, NULL, 8) = 0 [pid 4194] rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, NULL, 8) = 0 [pid 4194] getpid() = 4194 [pid 4194] kill(4195, SIGUSR1) = 0 [pid 4194] write(2, "rsync error: error in rsync prot"..., 90rsync error: error in rsync protocol data stream (code 12) at io.c(1530) [generator=3.0.7]) = 90 [pid 4194] write(2, "\n", 1 ) = 1 [pid 4194] gettimeofday({1294605317, 147297}, NULL) = 0 [pid 4194] select(0, NULL, NULL, NULL, {0, 100000} <unfinished ...> [pid 4195] <... write resumed> ) = 57465 [pid 4195] --- SIGUSR1 (User defined signal 1) @ 0 (0) --- [pid 4195] close(3) = 0 [pid 4195] lstat("G/_/videos/.The.Good.The.Bad.And.The.Ugly.1966.EXTENDED.720p.mkv.rPfKnC", {st_mode=S_IFREG|0700, st_size=1799413881, ...}) = 0 [pid 4195] chmod("G/_/videos/.The.Good.The.Bad.And.The.Ugly.1966.EXTENDED.720p.mkv.rPfKnC", 0755) = 0 [pid 4195] rename("G/_/videos/.The.Good.The.Bad.And.The.Ugly.1966.EXTENDED.720p.mkv.rPfKnC", "G/_/videos/The.Good.The.Bad.And.The.Ugly.1966.EXTENDED.720p.mkv" <unfinished ...> [pid 4194] <... select resumed> ) = 0 (Timeout) [pid 4194] gettimeofday({1294605317, 247806}, NULL) = 0 [pid 4194] exit_group(12) = ? <... rename resumed> ) = 0 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999992}) write(4, "Y\0\0\nrsync error: error in rsync "..., 93) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, NULL, 8) = 0 rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, NULL, 8) = 0 gettimeofday({1294605317, 421295}, NULL) = 0 select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout) gettimeofday({1294605317, 521602}, NULL) = 0 exit_group(12) = ? Process 4195 detached
could rsync perhaps implement a new flag to ask it to retry a certain number of times, or perhaps infinitely... never give up? caching the ssh login would probably be requred for ssh connections.
Rsync doesn't support a reconnect because the protocol is stateful, and currently only supports starting the transfer from the beginning again (which can be done by the caller). What you need to strace is the other side of the connection. If both sides are getting a broken connection error, then the connection went down and there is nothing rsync can do about it. If there was some kind of error on the other side, then you can attempt to resolve that error (so that the remote rsync doesn't go away). See http://rsync.samba.org/issues.html and the rsync-debug script for a possible way to strace the remote side. Finally, if your connection is just flaking out, you can try dividing up the mirroring task into smaller chunks that can be sent by a series of rsync tasks. One extreme example (that is probably overkill) would be to rsync just a single directory at a time by using -d instead of -r and still using --delete: #!/usr/bin/perl use strict; my $SRC = '/tmp/src'; # set to '/volume' my $DEST = '/tmp/dest'; # set to 'user@host.org:/path/...' chdir $SRC or die "Unable to chdir to $SRC: $!"; $/ = "\0"; open PIPE, '-|', qw( find . -type d -print0 ) or die $!; while (<PIPE>) { chomp; if (system qw( rsync -aidR --no-r --del ), "$_/", $DEST) { warn "Got error from rsync for directory $_\n"; } } close PIPE;
My problem was solved when I swapped out the network cable between my box and my router. Apparently, one of the 8 pieces of copper in the CAT-5 were broken.