We are running rsync from server A to B and from A to C. A and C are running Linux 2.4.9-e-34, and B is running Linux 2.4.9-e-38.

The following rsync command works just fine from A->B:

    rsync --exclude-from /local/dba/scripts/shoprod1_exclude_applfiles.txt -rlvz -e ssh /prod/applmgr/1159/ applprod@twnprod1:/prod/applmgr/1159/

The "same" command doesn't work from A->C:

    rsync --exclude-from /local/dba/scripts/shoprod1_exclude_applfiles.txt -rlvz -e ssh /prod/applmgr/1159/ applprod@twnprod2:/prod/applmgr/1159/

We get the following error at different stages, e.g. during the build stage or after processing a few files:

    rsync: writefd_unbuffered failed to write 4092 bytes: phase "send_file_name": broken Pipe
    rsync error: error in rsync protocol data stream (code 12) at io.c(836)

We are using rsync 2.6.2, protocol version 28. I'm at a loss to understand why this is happening.

I already tried the -vv option: the same error comes after processing some files from the exclusion file. If I use the -vvv option, it hangs at a particular point like this on server A:

    [sender] make_file(per/11.5.0/help/US/puploadw.htm,*,2)
    [sender] make_file(per/11.5.0/help/US/puplorgd.htm,*,2)   --> this is where it hangs

An strace of the rsync process on server A shows:

    select(5,NULL, [4], NULL, {16,970000}) = 0 (Timeout)
    select(5,NULL, [4], NULL, {60, 0}) = 0 (Timeout)
    select(5,NULL, [4], NULL, {60, 0}) = 0 (Timeout)
    ..

An strace of the rsync process on server C shows:

    select(2,NULL, [1], NULL, {29,290000}) = 0 (Timeout)
    select(2,NULL, [1], NULL, {60, 0}) = 0 (Timeout)
    select(2,NULL, [1], NULL, {60, 0}) = 0 (Timeout)
    ..

And so it hangs, without doing anything.
Please see the issues/debugging page for instructions on how to figure out what is going on: http://rsync.samba.org/issues.html Note the recommendation to upgrade to 2.6.3 for its improved error reporting.
(In reply to comment #1)
> Please see the issues/debugging page for instructions on how to figure out what
> is going on:
>
> http://rsync.samba.org/issues.html
>
> Note the recommendation to upgrade to 2.6.3 for its improved error reporting.

Hi. We did extensive research on this and tried various alternatives. One interesting thing we noticed is that if the size of any one particular directory is more than 152M, rsync fails and hangs. Again, this is only the case with the A->C transfer. The A->B transfer works fine with the same set of directories.

    A has rsync 2.6.2
    B has rsync 2.5.7 : A->B works
    C has rsync 2.6.2 : A->C doesn't work for directories > 152M

We have checked that this is also not due to some processes running and rsync trying to overwrite that file. This error is encountered for admin/log.

Are there any specific kernel parameters that need to be set to enable transfer of huge directories (read: directories with size > some threshold) using rsync? It pretty much looks as if some OS limit is being crossed here, e.g. the ls command needs some maxsize kernel parameter to be able to show a file listing if the number of files exceeds some threshold.

Any pointers on this will be appreciated. We would really like to use rsync for syncing up code trees for our ERP product, which we are implementing for our customer. Thanks a lot.
Created attachment 797 [details] Command line session of rsync hanging
Created attachment 798 [details] rsync hanging strace output
I am experiencing similar problems on Debian woody (Linux 2.4.26, and also after upgrading to Linux 2.4.28). This problem started suddenly after over 100 days of clean daily operation. Rsyncing any two large folders causes rsync to hang, and the process waits until it is manually killed. I have reproduced this both when rsyncing a remote directory to a local directory and when rsyncing two local directories. I have also tried rsyncing two local directories on different hard drives, with the same result. To test whether it was some disk I/O error, I ran md5deep on the directory to be rsynced, and it finished properly. This error occurs with the Debian woody version of rsync (2.5.5) and also with the latest rsync built from source (rsync-2.6.3). Rsync always hangs during the first stage, when calculating which files to sync. I have attached a sample shell session and strace output.
The cited strace shows that rsync is hanging because the verbose messages coming from the receiver aren't getting read by the sender. So, just reduce the verbosity and it should run fine (2.6.3, that is -- I assume that 2.5.5 was hanging for a different reason). I'll check into this to see about resolving the problem, but it may take a while. Finally, a hang bug is quite different from a write-failed bug, so your bug, Jeremy, is not related to this bug report's original purpose.
I think I am experiencing the same errors. I am using the backup example on http://rsync.samba.org/examples.html and it gives me the following error after some minutes of work:

    Read from remote host <remote_host>: Connection reset by peer
    rsync: writefd_unbuffered failed to write 4 bytes: phase "unknown" [sender]: Broken pipe (32)
    rsync: connection unexpectedly closed (42088 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(359)

On the other side I get an error about the broken pipe too. I included the debug information in the attachment. Could you please give me an indication when this problem will be solved? I use rsync for backups, and now I have to sync all of my data manually :(. Thanks!
Created attachment 817 [details] Debug information on 'other' side
To diagnose this bug further, I need a system-call trace of the program that is going away first (not the program that notices the closed pipe because the other program went away).
I fixed the bug that was occurring because of -vvv. If there's still another hang, please re-open this or file a new bug report.
    root@xxxx [~]# rsync -avz -e ssh /home/ xxxx@xxxxx:xxxx
    building file list ...
    rsync: writefd_unbuffered failed to write 4092 bytes: phase "send_file_entry": Broken pipe
    rsync error: error in rsync protocol data stream (code 12) at io.c(515)

This method works on all of our servers except this one. Is there any way to resolve it?
Strange, got this error with 2.5.7 when the destination "module" was read-only. 2.6.6 prints user-friendly info that the module is not writeable.
(In reply to comment #12)
> Strange, got this error with 2.5.7 when the destination "module" was read-only.
> 2.6.6 prints user-friendly info that the module is not writeable.

Just one of the many bug fixes in the newer versions. This fix is even mentioned on the Issues and Debugging webpage (item #4).
I also had the same problem occurring randomly on large file transfers between an IDE disk and a disk attached via USB 2.0, using RedHat FC4 and rsync version 2.6.4, protocol version 29. The error messages are:

    rsync: writefd_unbuffered failed to write 4 bytes: phase "unknown" [sender]: Broken pipe (32)
    rsync error: timeout in data send/receive (code 30) at io.c(181)
    rsync: connection unexpectedly closed (241914 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(420)

I found that by limiting the bandwidth and setting a large timeout, the problem/symptoms went away, i.e. I added the following switches:

    --bwlimit=8192 --timeout=600

Note this did not happen on smaller files at all, and would not happen on the same file when I was transferring large ones. Anyway, hope that helps some.

Regards, Andrew Morris
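The workaround above can be scripted. The following is a minimal Python sketch, not part of the original report, that builds an rsync argument list with the two switches from the comment. The function name `build_rsync_command` and the example paths are hypothetical; the defaults 8192 and 600 are simply the values quoted above, and whether they suit another setup is an assumption to verify.

```python
import subprocess


def build_rsync_command(source, dest, bwlimit_kbps=8192, timeout_s=600, extra_args=()):
    """Return an argv list for rsync with a bandwidth cap and an I/O timeout.

    bwlimit_kbps maps to --bwlimit (rsync documents it in kilobytes per second),
    timeout_s maps to --timeout (seconds). Both defaults come from the comment
    above; they are not recommendations.
    """
    if bwlimit_kbps <= 0 or timeout_s <= 0:
        raise ValueError("bwlimit_kbps and timeout_s must be positive")
    return [
        "rsync",
        f"--bwlimit={bwlimit_kbps}",
        f"--timeout={timeout_s}",
        *extra_args,
        source,
        dest,
    ]


def run_rsync(source, dest, **kwargs):
    """Run rsync and return its exit status (e.g. 12 or 30 as seen in the thread)."""
    return subprocess.run(build_rsync_command(source, dest, **kwargs)).returncode
```

For example, `run_rsync("/data/", "/mnt/usb/backup/", extra_args=["-a"])` would run the throttled transfer between two hypothetical directories and return rsync's exit code for the caller to inspect.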
Created attachment 2309 [details] log error file rsync command and error message.
George, the first error message "Received disconnect from 20.20.10.250: 2: Corrupted MAC on input" (which was not printed by rsync) indicates pretty clearly that corruption in the network connection, not a bug in rsync, caused the failure.
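To illustrate the distinction drawn above between messages printed by rsync and messages printed by something else (such as ssh), here is a rough Python sketch. It relies on a heuristic assumption: that rsync's own diagnostics in these logs start with `rsync:` or `rsync error:`, as every rsync message quoted in this thread does. The function names and the regular expression are hypothetical helpers, not part of rsync.

```python
import re

# Matches lines such as:
#   rsync error: error in rsync protocol data stream (code 12) at io.c(359)
RSYNC_ERROR_RE = re.compile(
    r"rsync error: (?P<text>.+?) \(code (?P<code>\d+)\) at (?P<file>[\w.]+)\((?P<line>\d+)\)"
)


def split_log(lines):
    """Split log lines into (rsync_lines, other_lines) by prefix (heuristic)."""
    rsync_lines, other_lines = [], []
    for line in lines:
        if line.lstrip().startswith(("rsync:", "rsync error:")):
            rsync_lines.append(line)
        else:
            other_lines.append(line)
    return rsync_lines, other_lines


def parse_rsync_errors(lines):
    """Return (exit_code, description) for each 'rsync error:' line found."""
    errors = []
    for line in lines:
        match = RSYNC_ERROR_RE.search(line)
        if match:
            errors.append((int(match.group("code")), match.group("text")))
    return errors
```

Lines that land in `other_lines`, like the "Corrupted MAC on input" message, come from the transport or the shell and are worth reading first, since the rsync lines after them may only report the resulting broken pipe.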
I experienced the same error when running rsync to an external USB IDE drive. When it happened, the system hung. Following Andrew Morris's suggestion of --bwlimit=8192 --timeout=600 stopped the system from hanging. However, it causes rsync to abort after 600 seconds. The problem is that the disk automatically spins down after 15 minutes with no data transferred to or from it. This can happen while rsync is removing a large file on the disk. Running "df /disk_mount_point" once every 5 minutes works around the problem. Hock Seng
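The periodic `df` workaround can be sketched in Python as a background thread that runs alongside the transfer. This is only an illustration under assumptions: the names `start_keepalive` and `default_probe` are hypothetical, the 300-second interval is the "every 5 minutes" from the comment, and whether `df` actually keeps a given drive from spinning down is the commenter's observation, not something this sketch verifies.

```python
import subprocess
import threading


def default_probe(mount_point):
    """Run `df <mount_point>`, the command suggested in the comment above."""
    subprocess.run(
        ["df", mount_point],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )


def start_keepalive(mount_point, interval_s=300.0, probe=default_probe):
    """Call probe(mount_point) every interval_s seconds until stopped.

    Returns (stop_event, thread); call stop_event.set() to end the loop.
    """
    stop_event = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once the event is set,
        # so the loop probes after each full interval and exits promptly on stop.
        while not stop_event.wait(interval_s):
            probe(mount_point)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop_event, thread
```

A caller would start the keepalive for the drive's mount point, run rsync in the foreground, then call `stop_event.set()` and `thread.join()` once the transfer finishes.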