Bug 7903 - Rsync connection closed errors
Summary: Rsync connection closed errors
Status: RESOLVED WORKSFORME
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 3.0.7
Hardware: x64 Linux
: P3 major (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-09 14:54 UTC by .:triune.
Modified: 2011-06-04 19:17 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description .:triune. 2011-01-09 14:54:31 UTC
I'm using rsync to mirror a 2TB archive from a LOCAL machine to a REMOTE one. The initial sync was performed over a 1gbit lan. Now, it is being performed over the internet with 2 servers at 50mbits. I've tried everything mentioned in the FAQ and other places to remedy this problem to no avail. I've tweaked my ~/.ssh/config to keep it alive (client/server), and changed some of the rsync settings to keep it active and not timeout. However, the problem persists. I've attempted to log the dbug info as mentioned in the FAQ. Below are my rsync calls and 100 lines of debug output:


$ rsync -aivz --partial --delete-during --stats --progress --rsync-path=/tmp/rsync-debug /volume/ -e ssh user@host.org:/home/user/volume_backup ;


[pid  4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996})
[pid  4195] read(0, "3/\225\177\10d\233>\"C\27\260~\332\235`U ^\10H\312\3654+\223\277J\372m\213n"..., 4092) = 4092
[pid  4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999995})
[pid  4195] read(0, "\374\17\0\7", 4)   = 4
[pid  4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996})
[pid  4195] read(0, "\373\266\317\333!]\367\227U\345\27w\206l\307\246\3727\t\205\337;\207\366U\363\33\356{\375\244\273"..., 4092) = 4092
[pid  4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996})
[pid  4195] read(0, "\374\17\0\7", 4)   = 4
[pid  4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996})
[pid  4195] read(0, "\276;\232\325\353C\300\315v\7Q8\25\v4\303\35\234\217\273\230\327U\373\245;\342\222\4\17\367\305"..., 4092) = 4092
[pid  4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996})
[pid  4195] read(0, "\374\17\0\7", 4)   = 4
[pid  4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996})
[pid  4195] read(0, "TK\244\3\255\223\246d4\214\235\325r\216\214\241\203\347\331Q\232t@\333#)\f\25\7\30\230\r"..., 4092) = 4092
[pid  4195] select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 999996})
[pid  4195] read(0, "", 4)              = 0
[pid  4195] select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999994})
[pid  4195] write(4, "S\0\0\nrsync: connection unexpected"..., 87) = 87
[pid  4194] <... select resumed> )      = 1 (in [3], left {31, 468490})
[pid  4195] rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0},  <unfinished ...>
[pid  4194] read(3,  <unfinished ...>
[pid  4195] <... rt_sigaction resumed> NULL, 8) = 0
[pid  4194] <... read resumed> "S\0\0\nrsync: connection unexpected"..., 8184) = 87
[pid  4195] rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0},  <unfinished ...>
[pid  4194] select(4, [3], [1], [1], {60, 0} <unfinished ...>
[pid  4195] <... rt_sigaction resumed> NULL, 8) = 0
[pid  4194] <... select resumed> )      = 1 (out [1], left {59, 999993})
[pid  4195] close(1 <unfinished ...>
[pid  4194] write(1, "S\0\0\nrsync: connection unexpected"..., 87 <unfinished ...>
[pid  4195] <... close resumed> )       = 0
[pid  4194] <... write resumed> )       = -1 EPIPE (Broken pipe)
[pid  4195] write(3, "\0Ar\264\f\3151\225\350G\v\365\207\203\253\224\37\t}y\255^q}\241\250\341\2Lq2\361"..., 57465 <unfinished ...>
[pid  4194] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[pid  4194] write(2, "rsync: writefd_unbuffered failed"..., 90rsync: writefd_unbuffered failed to write 87 bytes to socket [generator]: Broken pipe (32)) = 90
[pid  4194] write(2, "\n", 1
)           = 1
[pid  4194] rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, NULL, 8) = 0
[pid  4194] rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, NULL, 8) = 0
[pid  4194] getpid()                    = 4194
[pid  4194] kill(4195, SIGUSR1)         = 0
[pid  4194] write(2, "rsync error: error in rsync prot"..., 90rsync error: error in rsync protocol data stream (code 12) at io.c(1530) [generator=3.0.7]) = 90
[pid  4194] write(2, "\n", 1
)           = 1
[pid  4194] gettimeofday({1294605317, 147297}, NULL) = 0
[pid  4194] select(0, NULL, NULL, NULL, {0, 100000} <unfinished ...>
[pid  4195] <... write resumed> )       = 57465
[pid  4195] --- SIGUSR1 (User defined signal 1) @ 0 (0) ---
[pid  4195] close(3)                    = 0
[pid  4195] lstat("G/_/videos/.The.Good.The.Bad.And.The.Ugly.1966.EXTENDED.720p.mkv.rPfKnC", {st_mode=S_IFREG|0700, st_size=1799413881, ...}) = 0
[pid  4195] chmod("G/_/videos/.The.Good.The.Bad.And.The.Ugly.1966.EXTENDED.720p.mkv.rPfKnC", 0755) = 0
[pid  4195] rename("G/_/videos/.The.Good.The.Bad.And.The.Ugly.1966.EXTENDED.720p.mkv.rPfKnC", "G/_/videos/The.Good.The.Bad.And.The.Ugly.1966.EXTENDED.720p.mkv" <unfinished ...>
[pid  4194] <... select resumed> )      = 0 (Timeout)
[pid  4194] gettimeofday({1294605317, 247806}, NULL) = 0
[pid  4194] exit_group(12)              = ?
<... rename resumed> )                  = 0
select(5, NULL, [4], [4], {60, 0})      = 1 (out [4], left {59, 999992})
write(4, "Y\0\0\nrsync error: error in rsync "..., 93) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7fac05ca1af0}, NULL, 8) = 0
gettimeofday({1294605317, 421295}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
gettimeofday({1294605317, 521602}, NULL) = 0
exit_group(12)                          = ?
Process 4195 detached
Comment 1 .:triune. 2011-01-10 20:29:02 UTC
could rsync perhaps implement a new flag to ask it to retry a certain number of times, or perhaps infinitely... never give up? caching the ssh login would probably be requred for ssh connections.
Comment 2 Wayne Davison 2011-06-04 16:21:24 UTC
Rsync doesn't support a reconnect because the protocol is stateful, and currently only supports starting the transfer from the beginning again (which can be done by the caller).

What you need to strace is the other side of the connection.  If both sides are getting a broken connection error, then the connection went down and there is nothing rsync can do about it.  If there was some kind of error on the other side, then you can attempt to resolve that error (so that the remote rsync doesn't go away).  See http://rsync.samba.org/issues.html and the rsync-debug script for a possible way to strace the remote side.

Finally, if your connection is just flaking out, you can try dividing up the mirroring task into smaller chunks that can be sent by a series of rsync tasks.  One extreme example (that is probably overkill) would be to rsync just a single directory at a time by using -d instead of -r and still using --delete:

#!/usr/bin/perl
use strict;
my $SRC = '/tmp/src'; # set to '/volume'
my $DEST = '/tmp/dest'; # set to 'user@host.org:/path/...'
chdir $SRC or die "Unable to chdir to $SRC: $!";
$/ = "\0";
open PIPE, '-|', qw( find . -type d -print0 ) or die $!;
while (<PIPE>) {
    chomp;
    if (system qw( rsync -aidR --no-r --del ), "$_/", $DEST) {
        warn "Got error from rsync for directory $_\n";
    }
}
close PIPE;
Comment 3 .:triune. 2011-06-04 16:52:45 UTC
My problem was solved when I swapped out the network cable between my box and my router. Apparently, one of the 8 pieces of copper in the CAT-5 were broken.