Bug 1959 - writefd_unbuffered failed to write 4092 bytes phase send_file_entry broken pipe
Summary: writefd_unbuffered failed to write 4092 bytes phase send_file_entry broken pipe
Status: CLOSED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.3
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-10-21 14:49 UTC by Gaurav Verma
Modified: 2008-05-13 05:25 UTC

See Also:


Attachments
Command line session of rsync hanging (100.57 KB, text/plain)
2004-11-19 05:47 UTC, Jeremy Lowery
rsync hanging strace output (397.65 KB, text/plain)
2004-11-19 05:48 UTC, Jeremy Lowery
Debug information on 'other' side (2.46 KB, text/plain)
2004-12-01 17:59 UTC, Maarten
log error file (2.95 KB, patch)
2007-02-25 04:41 UTC, George Buslovitch

Description Gaurav Verma 2004-10-21 14:49:28 UTC
we are doing rsync from server A->B and from A->C.

A and C are running on Linux 2.4.9-e-34, and B is running on Linux 2.4.9-e-38

the following rsync command works just fine from A->B:

rsync --exclude-from /local/dba/scripts/shoprod1_exclude_applfiles.txt -rlvz -e
ssh /prod/applmgr/1159/ applprod@twnprod1:/prod/applmgr/1159/

the "same" command doesn't work from A->C:

rsync --exclude-from /local/dba/scripts/shoprod1_exclude_applfiles.txt -rlvz -e
ssh /prod/applmgr/1159/ applprod@twnprod2:/prod/applmgr/1159/

we get the following error at different stages, e.g. during the build stage or
after processing a few files:

rsync: writefd_unbuffered failed to write 4092 bytes: phase "send_file_name":
broken Pipe
rsync error: error in rsync protocol data stream (code 12) at io.c(836)

we are using rsync 2.6.2, protocol version 28.

I'm at a loss to understand why this is happening.

I already tried the -vv option -> the same error comes after processing some
files from the exclusion file

If I use the -vvv option, it hangs at a particular point like this on server A:

[sender] make_file(per/11.5.0/help/US/puploadw.htm,*,2)
[sender] make_file(per/11.5.0/help/US/puplorgd.htm,*,2)
--> this is where it hangs..

doing an strace on the rsync process on server A shows:

select(5,NULL, [4], NULL, {16,970000}) = 0 (Timeout)
select(5,NULL, [4], NULL, {60, 0}) = 0 (Timeout)
select(5,NULL, [4], NULL, {60, 0}) = 0 (Timeout)
..

an strace on the rsync process on server C shows:

select(2,NULL, [1], NULL, {29,290000}) = 0 (Timeout)
select(2,NULL, [1], NULL, {60, 0}) = 0 (Timeout)
select(2,NULL, [1], NULL, {60, 0}) = 0 (Timeout)
..

and so it hangs, without doing anything.
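
A minimal sketch of how such a trace can be gathered, assuming the hung
rsync's PID has first been looked up (the PID shown is a placeholder):

  ps -ef | grep '[r]sync'                        # find the hung rsync process
  strace -tt -p 12345 -o /tmp/rsync-hang.trace   # attach and log with timestamps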
Comment 1 Wayne Davison 2004-10-21 15:19:44 UTC
Please see the issues/debugging page for instructions on how to figure out what
is going on:

http://rsync.samba.org/issues.html

Note the recommendation to upgrade to 2.6.3 for its improved error reporting.
Comment 2 Gaurav Verma 2004-10-22 07:20:59 UTC
(In reply to comment #1)
> Please see the issues/debugging page for instructions on how to figure out what
> is going on:
> 
> http://rsync.samba.org/issues.html
> 
> Note the recommendation to upgrade to 2.6.3 for its improved error reporting.


Hi. We did extensive research on this and tried various alternatives:

One interesting thing that we noticed is that if the size of any one
particular directory is more than 152M, rsync fails and hangs.

Again, this is only the case with the A->C transfer.
The A->B transfer works fine with the same set of directories.

A has rsync 2.6.2
B has rsync 2.5.7 : A->B works
C has rsync 2.6.2 : A->C doesn't work for directories > 152M

We have also checked that this is not due to some other process running and
rsync trying to overwrite that file.

This error is encountered for admin/log.

Are there any specific kernel parameters that need to be set to enable the
transfer of huge directories (i.e. directories with size above some threshold)
using rsync? It pretty much looks as if some OS limit is being crossed here.

e.g. the ls command needs some max-size kernel parameter raised to be able to
show a file listing if the number of files exceeds some threshold.
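
A minimal sketch of checking the usual limits on a Linux host (generic
commands, not specific to this setup):

  ulimit -a                     # per-process limits for the current shell
  cat /proc/sys/fs/file-max     # system-wide open-file limit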

Any pointers on this will be appreciated. We would really like to use rsync
for syncing up the code trees of the ERP product we are implementing for our
customer.

thanks a lot.
Comment 3 Jeremy Lowery 2004-11-19 05:47:43 UTC
Created attachment 797 [details]
Command line session of rsync hanging
Comment 4 Jeremy Lowery 2004-11-19 05:48:14 UTC
Created attachment 798 [details]
rsync hanging strace output
Comment 5 Jeremy Lowery 2004-11-19 05:50:15 UTC
I am experiencing similar problems on debian woody (linux 2.4.26 and after
upgrading to linux 2.4.28). This problem started suddenly after over 100 days of
clean daily operation.

Rsyncing any two larger folders causes rsync to hang, and the process waits
until it is manually killed. I have reproduced this when rsyncing a remote
directory to a local directory and when rsyncing two local directories. I have
tried rsyncing two local directories on different hard drives with the same
result. To test whether it was some disk I/O error, I ran md5deep on the
directory to be rsynced and it finished properly. This error occurs with the
debian woody version of rsync (2.5.5) and also with the latest rsync built
from source (rsync-2.6.3).

Rsync always hangs during the first stage when calculating what files to sync.
I have attached a sample shell session and strace output.
Comment 6 Wayne Davison 2004-11-19 10:13:56 UTC
The cited strace shows that rsync is hanging because all the verbose messages
coming from the receiver aren't getting read by the sender.  So, just reduce
the verbosity and it should run fine (with 2.6.3, that is -- I assume that
2.5.5 was hanging for a different reason).
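
A minimal sketch of what that looks like, with hypothetical paths and host
name (at most a single -v instead of -vv or -vvv):

  rsync -rlz -v -e ssh /src/dir/ user@remotehost:/dest/dir/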

I'll check into this to see about resolving the problem, but it may take a while.

Finally, a hang bug is quite different from a write-failed bug, so your bug,
Jeremy, is not related to this bug report's original purpose.
Comment 7 Maarten 2004-12-01 17:56:49 UTC
I think I am experiencing the same error. I am using the backup example on
http://rsync.samba.org/examples.html and it gives me the following error after
some minutes of work:

Read from remote host <remote_host>: Connection reset by peer
rsync: writefd_unbuffered failed to write 4 bytes: phase "unknown" [sender]:
Broken pipe (32)
rsync: connection unexpectedly closed (42088 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)

On the other side I get an error about the broken pipe too. I included the debug
information in the attachment.

Could you please give me an indication of when this problem will be solved? I
use rsync for backups, and now I have to sync all of my data manually :(.

Thanks!
Comment 8 Maarten 2004-12-01 17:59:50 UTC
Created attachment 817 [details]
Debug information on 'other' side
Comment 9 Wayne Davison 2005-02-27 13:54:23 UTC
To diagnose this bug further, I need a system-call trace of the program that is
going away first (not the program that notices the closed pipe because the other
program went away).
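
A minimal sketch of one way to capture that, with hypothetical paths and host
name (the remote side would need an equivalent strace wrapper):

  strace -f -tt -o /tmp/rsync-sender.trace \
      rsync -rlvz -e ssh /src/dir/ user@remotehost:/dest/dir/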
Comment 10 Wayne Davison 2005-03-18 19:44:36 UTC
I fixed the bug that was occurring because of -vvv.  If there's still another
hang, please re-open this or file a new bug report.
Comment 11 ben 2005-04-21 08:46:55 UTC
root@xxxx [~]# rsync -avz -e ssh /home/ xxxx@xxxxx:xxxx
building file list ... rsync: writefd_unbuffered failed to write 4092 bytes: 
phase "send_file_entry": Broken pipe
rsync error: error in rsync protocol data stream (code 12) at io.c(515)

This method works on all of our servers except this one.

Is there any way to resolve it?
Comment 12 Wejn 2005-08-24 05:19:15 UTC
Strange, got this error with 2.5.7 when the destination "module" was read-only.
2.6.6 prints user-friendly info that the module is not writeable.
Comment 13 Wayne Davison 2005-08-24 09:41:07 UTC
(In reply to comment #12)
> Strange, got this error with 2.5.7 when the destination "module" was read-only.
> 2.6.6 prints user-friendly info that the module is not writeable.

Just one of the many bug fixes in the newer versions.  This fix is even
mentioned on the Issues and Debugging webpage (item #4).
Comment 14 Andrew Morris 2006-02-24 08:39:59 UTC
I also had the same problem occurring randomly on large file transfers between an IDE disk and a disk attached via USB 2.0, using RedHat FC4 and rsync version 2.6.4, protocol version 29.
Error messages are:
"rsync: writefd_unbuffered failed to write 4 bytes: phase "unknown" [sender]: Broken pipe (32)
rsync error: timeout in data send/receive (code 30) at io.c(181)
rsync: connection unexpectedly closed (241914 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(420)"

I found that by limiting the bandwidth and setting a large timeout the problem/symptoms went away, i.e. I added the following switches:
  --bwlimit=8192 --timeout=600
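
As a sketch, with hypothetical source and destination paths, the full command
line then looks like:

  rsync -av --bwlimit=8192 --timeout=600 /src/dir/ /media/usbdisk/dest/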

Note this did not happen on smaller files at all, and it did not happen on the same file each time when I was transferring large ones. Anyway, hope that helps some.
Regards,
Andrew Morris

Comment 15 George Buslovitch 2007-02-25 04:41:48 UTC
Created attachment 2309 [details]
log error file

rsync command and error message.
Comment 16 Matt McCutchen 2007-02-25 10:41:30 UTC
George, the first error message "Received disconnect from 20.20.10.250: 2: Corrupted MAC on input" (which was not printed by rsync) indicates pretty clearly that corruption in the network connection, not a bug in rsync, caused the failure.
Comment 17 Hock Seng 2008-05-13 05:25:37 UTC
I experienced the same error when rsyncing to an external USB IDE drive. When it happened, the system hung. Following Andrew Morris's --bwlimit=8192 --timeout=600 setting got rid of the system hang. However, it causes rsync to abort after 600 seconds.

The problem is due to the disk automatically spinning down after no data has been transferred to or from it for 15 minutes. This can happen while rsync is removing a large file on the disk. Running "df /disk_mount_point" once every 5 minutes works around the problem.
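
A minimal sketch of such a keep-alive, assuming a Bourne-compatible shell and
the mount point named above:

  # touch the filesystem every 5 minutes so the disk never idles long enough to spin down
  while sleep 300; do df /disk_mount_point > /dev/null; done &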

Hock Seng