Bug 3488 - writefd_unbuffered failed to write 4096 bytes: phase "unknown" [generator]: Broken pipe (32)
Summary: writefd_unbuffered failed to write 4096 bytes: phase "unknown" [generator]: B...
Status: CLOSED INVALID
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 2.6.6
Hardware: Sparc Solaris
: P3 major (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-06 12:11 UTC by Vivien Wu
Modified: 2006-03-12 02:56 UTC (History)
0 users

See Also:


Attachments
Truss output. (621.80 KB, text/plain)
2006-02-06 12:23 UTC, Vivien Wu
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Vivien Wu 2006-02-06 12:11:17 UTC
Problem:
The fileset (file list) is large and it seems the rsync is hung somewhere with write error; however, if rsync the hung-file individually, it will go through fine.

Both local and remote machines are in Solaris 9, rsync 2.6.6. 
However, the problems are also reproceable in rsync 2.5.6 and 2.6.3.

Use the rsync-debug script as instructed. And the detail truss output is attached.

(last 100+ lines)
19763:	lstat64("armstrong/.armstrong4-3_fdi.txt.z__.c1ayMM", 0xFFBFE910) = 0
19763:	time()						= 1139251239
19763:	utime("armstrong/.armstrong4-3_fdi.txt.z__.c1ayMM", 0xFFBFE87C) = 0
19763:	lchown("armstrong/.armstrong4-3_fdi.txt.z__.c1ayMM", 1000, 200) = 0
19763:	chmod("armstrong/.armstrong4-3_fdi.txt.z__.c1ayMM", 0444) = 0
19760:	read(0, " S T R I N G _ T Y P E _".., 263160)	= 263160
19760:	read(0, " R I N G _ M C F _ N A M".., 263160)	= 263160
19760:	poll(0xFFBFD820, 2, 60000)			= 1
19760:	write(1, "FC0F\007\0\0D4 lD1FA19D7".., 4096)	= 4096
19760:	time()						= 1139251239
19760:	read(0, " O O L _ I S _ G N D _ P".., 263160)	= 263160
19763:	rename("armstrong/.armstrong4-3_fdi.txt.z__.c1ayMM", "armstrong/armstrong4-3_fdi.txt.z__") = 0
19763:	open64("armstrong/armstrong4-3_nlspccache.txt.z_+", O_RDONLY) Err#2 ENOENT
19763:	getpid()					= 19763 [19760]
19763:	lstat64("armstrong/.armstrong4-3_nlspccache.txt.z_+.d1ayMM", 0xFFBFE870) Err#2 ENOENT
19760:	read(0, " o l u m n   I / O "\r\n".., 18766)	= 18766
19760:	close(0)					= 0
19763:	open64("armstrong/.armstrong4-3_nlspccache.txt.z_+.d1ayMM", O_RDWR|O_CREAT|O_EXCL, 0600) = 1
19763:	fchmod(1, 0600)					= 0
19760:	lstat64("armstrong/armstrong4fbga1508.pk+", 0xFFBFF668) = 0
19760:	open64("armstrong/armstrong4fbga1508.pk+", O_RDONLY) = 0
19763:	write(1, " S J _ 2 0 0 6 : 0 2 : 0".., 208)	= 208
19763:	close(1)					= 0
19763:	lstat64("armstrong/.armstrong4-3_nlspccache.txt.z_+.d1ayMM", 0xFFBFE910) = 0
19763:	time()						= 1139251239
19763:	utime("armstrong/.armstrong4-3_nlspccache.txt.z_+.d1ayMM", 0xFFBFE87C) = 0
19763:	lchown("armstrong/.armstrong4-3_nlspccache.txt.z_+.d1ayMM", 1000, 200) = 0
19763:	chmod("armstrong/.armstrong4-3_nlspccache.txt.z_+.d1ayMM", 0640) = 0
19760:	read(0, " T O _ 2 0 0 5 : 1 0 : 2".., 3491)	= 3491
19760:	close(0)					= 0
19763:	rename("armstrong/.armstrong4-3_nlspccache.txt.z_+.d1ayMM", "armstrong/armstrong4-3_nlspccache.txt.z_+") = 0
19760:	lstat64("armstrong/armstrong4fbga1508.pk_", 0xFFBFF668) = 0
19763:	open64("armstrong/armstrong4-3_nlspccache.txt.z__", O_RDONLY) Err#2 ENOENT
19760:	open64("armstrong/armstrong4fbga1508.pk_", O_RDONLY) = 0
19763:	getpid()					= 19763 [19760]
19763:	lstat64("armstrong/.armstrong4-3_nlspccache.txt.z__.e1ayMM", 0xFFBFE870) Err#2 ENOENT
19763:	open64("armstrong/.armstrong4-3_nlspccache.txt.z__.e1ayMM", O_RDWR|O_CREAT|O_EXCL, 0600) = 1
19763:	fchmod(1, 0400)					= 0
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, " ) R1E95F396 VB5 G9ABD\r".., 8184)	= 8184
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "F2ACE3 EF5 tD9 M %ED05EF".., 8184)	= 8184
19760:	read(0, "E1\01615 P o l y t r o n".., 263552)	= 263552
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, " wFDE7 BB41F9F87AA86B882".., 8184)	= 8184
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "01A7C0 HD2   +03 Y HF2 A".., 8184)	= 8184
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, " pA2CCD9B1A01B - ,B1 ] J".., 8184)	= 8184
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "14 9D5F6BDAC f 7 v8A10 Y".., 8184)	= 6375
19760:	read(0, "\r\n\t B O O L _ I S _ V".., 263552)	= 263552
19760:	poll(0xFFBFD828, 2, 60000)			= 1
19760:	write(1, "FC0F\007BF\nCA99B714B3 %".., 4096)	= 4096
19760:	time()						= 1139251239
19760:	read(0, "1214 c011B\0\n s8001B1B3".., 263552)	= 263552
19760:	read(0, " T Y P E _ N A M E   =  ".., 263552)	= 263552
19760:	read(0, " G _ X 1 8 _ D Q S _ P A".., 263552)	= 263552
19760:	read(0, " _ I D   =   - 1\r\n\t I".., 263552)	= 263552
19760:	read(0, "   - 1\r\n\t I N T _ C O".., 263552)	= 263552
19760:	poll(0xFFBFD828, 2, 60000)			= 1
19760:	write(1, "FC0F\00796C3D611F5\f nC2".., 4096)	= 4096
19760:	time()						= 1139251239
19760:	read(0, " X 3 6 _ D Q S _ P A D _".., 263552)	= 263552
19760:	read(0, " R E S P O N D I N G _ X".., 263552)	= 263552
19760:	read(0, "   f a l s e\r\n\t B O O".., 263552)	= 263552
19760:	read(0, " R E F   =   f a l s e\r".., 263552)	= 263552
19760:	read(0, " 1 5 9\r\n\t B O O L _ I".., 263552)	= 263552
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "DF", 8184)				= 1
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "\f\f U q0599AA19FF W _07".., 8184)	= 3152
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "FF89BFF29FFB /E714F7 _FD".., 8184)	= 4608
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, " m9C 6990E B88891B '\nF7".., 8184)	= 2048
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "16 186C4 eD7 ,\n g1B BE6".., 8184)	= 1537
19763:	poll(0xFFBFD140, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 0
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19763:	poll(0xFFBFD140, 1, 60000)			= 1
19763:	read(0, "7F19D784 } ~03 d o }19\f".., 8184)	= 2048
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 0
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, " UE1 / r @19 .E3 ^ S & m".., 8184)	= 2048
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 0
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "10 / f99 K kF4 5 XA4 U\0".., 8184)	= 2048
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 0
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 0
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, " 2BB 6\n WD1F4 Y ^ U ( /".., 8184)	= 2048
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 0
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "B3 b K x n8FE9E095F2E6 u".., 8184)	= 2048
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 0
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, "D4 G 8D89B0FB9 +BD M10E1".., 8184)	= 2048
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 0
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19763:	read(0, " B\bDA f U k z ZB8 C 3 |".., 8184)	= 1536
19763:	poll(0xFFBFD148, 1, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 0
19760:	poll(0xFFBFD828, 2, 60000)	(sleeping...)
19760:	poll(0xFFBFD828, 2, 60000)			= 1
19763:	poll(0xFFBFD148, 1, 60000)			= 1
19760:	write(1, "FC0F\007B7C71207 iE9 mCA".., 4096)	Err#32 EPIPE
19760:	    Received signal #13, SIGPIPE [ignored]
19763:	read(0, 0x000F00A8, 8184)			Err#145 ETIMEDOUT
19760:	fstat64(2, 0xFFBFC680)				= 0
19763:	write(5, " .\0\0\b r s y n c :   r".., 50)	= 50
rsync: writefd_unbuffered failed to write 4096 bytes: phase "unknown" [generator]: Broken pipe (32)
19763:	sigaction(SIGUSR1, 0xFFBFD028, 0xFFBFD0A8)	= 0
19760:	write(2, " r s y n c :   w r i t e".., 100)	= 100
19763:	sigaction(SIGUSR2, 0xFFBFD028, 0xFFBFD0A8)	= 0
19760:	sigaction(SIGUSR1, 0xFFBFD738, 0xFFBFD7B8)	= 0
19760:	sigaction(SIGUSR2, 0xFFBFD738, 0xFFBFD7B8)	= 0
19760:	getpid()					= 19760 [19759]
19760:	kill(19763, SIGUSR1)				= 0
rsync error: error in rsync protocol data stream (code 12) at io.c(1099)
19760:	write(2, " r s y n c   e r r o r :".., 73)	= 73
19760:	_exit(12)
19763:	unlink("armstrong/.armstrong4-3_nlspccache.txt.z__.e1ayMM") = 0
19763:	    Received signal #16, SIGUSR1 [ignored]
19763:	      siginfo: SIGUSR1 pid=19760 uid=0
19763:	write(5, " H\0\0\b r s y n c   e r".., 76)	Err#32 EPIPE
19763:	    Received signal #13, SIGPIPE [ignored]
19763:	_exit(12)
Comment 1 Vivien Wu 2006-02-06 12:23:24 UTC
Created attachment 1722 [details]
Truss output.
Comment 2 Wayne Davison 2006-02-06 14:34:28 UTC
(In reply to comment #0)
> The fileset (file list) is large and it seems the rsync is hung
> somewhere with write error

I'm assuming by "hung" you mean more of a "hang up", as in problem-causing item where rsync died, rather than "hung" as in the process keeps running but stops doing anything useful.

The truss output shows 2 processes on the server side which discover that the remote connection has closed, so there's nothing going wrong with rsync. If the client side is seeing a similar error, this just means that the network connection failed and both sides noticed it.
Comment 3 Vivien Wu 2006-02-06 15:09:48 UTC
>I'm assuming by "hung" you mean more of a "hang up", as in problem-causing item
>where rsync died, rather than "hung" as in the process keeps running but stops
>doing anything useful.

It's more of the latter. The rsync from the sender did not die immediately while the receiver (the host which generated the truss output) had already closed the connection. The sender stayed alive but doing nothing until it reached the timeout (3600). Btw, the sender and receiver is linked by WAN.. so maybe it's the WAN link which drops the link? I'll check.. thanks.

Comment 4 Vivien Wu 2006-02-10 12:26:31 UTC
(In reply to comment #3)

We had the MIS turned off the hardware compression unit for the US-UK link and the problem went away. Turned it back on and it failed again. So it's probably the WAN quality and probably has nothing to do with rsync itself. 


> It's more of the latter. The rsync from the sender did not die immediately
> while the receiver (the host which generated the truss output) had already
> closed the connection. The sender stayed alive but doing nothing until it
> reached the timeout (3600). Btw, the sender and receiver is linked by WAN.. so
> maybe it's the WAN link which drops the link? I'll check.. thanks.