Bug 2328 - cygwin rsync hangs when initiated remotely after transferring some files
Status: CLOSED LATER
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.3
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-02-08 20:31 UTC by cfinley
Modified: 2006-12-19 21:14 UTC

See Also:


Attachments
rsync compile on cygwin with HAVE_SOCKETPAIR commented from config.h (3.04 KB, text/plain)
2005-03-10 16:05 UTC, cfinley

Description cfinley 2005-02-08 20:31:52 UTC
I am trying to back up a Windows XP SP2 workstation to a Debian GNU/Linux server
using ssh, cygwin and rsync.

The rsync process on XP appears to lock up after transferring some files. Each
attempt transfers a few more files, but it hangs every time. The XP rsync
process must be killed manually, even if I cancel the rsync on the Linux side.

The Linux account can ssh into XP using client keys.
I can scp the entire source data without any trouble.
No XP event log entries (except key authentication & service start-up).

The Linux initiated rsync command:
rsync -vrte ssh --stats --progress xpuser@xpsource:/cygdrive/c/Data/ /home/usershare/xpuser/

I can initiate the rsync from the XP machine and it runs smoothly:
rsync -vrte ssh --stats --progress /cygdrive/c/Data/ debuser@destin:/home/usershare/xpuser/

Debian Sarge Destination:
Linux 2.4.26-1-686 
rsync  version 2.6.3  protocol version 28
OpenSSH_3.8.1p1 Debian-8.sarge.4, OpenSSL 0.9.7e 25 Oct 2004

Windows XP Pro SP2 Source:
Athlon 64 3200+
cygwin 1.5.12-1
cygrunsrv 1.0-1
rsync 2.6.3-1
openssh 3.9p1-2

The rsync-debug script dies immediately without transferring any files:
protocol version mismatch - is your shell clean?
(see the rsync man page for an explanation)
rsync error: protocol incompatibility (code 2) at compat.c(60)

The log file on XP contains:
rsync: writefd_unbuffered failed to write 4 bytes: phase "unknown" [sender]: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at /home/lapo/packaging/tmp/rsync-2.6.3/io.c(909)

This created a zero-length out.dat file:
ssh xpuser@xpsource /bin/true > out.dat
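
A minimal sketch of that check as a script, assuming a POSIX shell on the
Linux side. The helper name check_clean_shell is made up for illustration. A
non-empty file would mean the remote login prints something before rsync
starts, which is the usual cause of the "is your shell clean?" error.

```shell
#!/bin/sh
# check_clean_shell FILE (hypothetical helper, not part of rsync):
# FILE is whatever "ssh host /bin/true > FILE" captured.
check_clean_shell() {
    if [ -s "$1" ]; then
        # -s is true when the file exists and is larger than zero bytes
        echo "not clean"
        return 1
    fi
    echo "clean"
}

# Usage with the host from this report:
#   ssh xpuser@xpsource /bin/true > out.dat
#   check_clean_shell out.dat
```

A zero-length file, as reported here, suggests the remote shell itself is
clean.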

Searching Google, this seemed to be the closest match:
http://www.linuxquestions.org/questions/history/265520

Any help?

Your time and expertise are greatly appreciated. Please forgive my newb-ness.
Let me know if I should try anything else.

Chris
Comment 1 Wayne Davison 2005-02-25 17:02:21 UTC
If you could try commenting out HAVE_SOCKETPAIR from config.h and re-compiling
rsync, it would be nice to know if that makes rsync stop hanging.  If you don't
have the cygwin source, you should be able to use their setup.exe tool to grab
it and build it using their patches (such as the one to open temp files in
binary mode).
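
One way to make that change is sketched below. It assumes ./configure has
already generated config.h in the rsync source tree and that the line has the
usual autoconf form "#define HAVE_SOCKETPAIR 1". The helper name
disable_socketpair is invented for this example.

```shell
#!/bin/sh
# disable_socketpair FILE (hypothetical helper): comment out the
# HAVE_SOCKETPAIR define in FILE, keeping the original as FILE.bak.
disable_socketpair() {
    # "&" in the replacement stands for the whole matched line
    sed -i.bak 's|^#define HAVE_SOCKETPAIR.*|/* & */|' "$1"
}

# Usage, from the top of the source tree after ./configure:
#   disable_socketpair config.h
#   make clean && make
```

Re-running ./configure regenerates config.h, so the edit would need to be
repeated after that.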
Comment 2 Sukotto 2005-03-05 20:19:56 UTC
(In reply to comment #0)
See also http://www.cygwin.com/ml/cygwin/2003-10/msg00129.html
Comment 3 cfinley 2005-03-10 16:01:56 UTC
After commenting out HAVE_SOCKETPAIR, the behavior remained similar: rsync
started remotely over ssh transferred a few files and then froze. It looks like
it still transferred the file differences.

runtests.sh failed on the daemon test while compiling. Attached is the compile
output. I just tried initiating rsync using ssh.

(In reply to comment #1)
> If you could try commenting out HAVE_SOCKETPAIR from config.h and re-compiling
[...snip...]
Comment 4 cfinley 2005-03-10 16:05:26 UTC
Created attachment 1024
rsync compile on cygwin with HAVE_SOCKETPAIR commented from config.h
Comment 5 cfinley 2005-03-10 16:19:07 UTC
What is working:
I have rsyncd running from a startup batch file on Windows/cygwin:
c:\cygwin\bin\rsync.exe --config=/cygdrive/c/cygwin/etc/rsyncd/rsyncd.conf --daemon --no-detach --address localhost

SSHD is running as a service using cygrunsrv.

Trying to run rsyncd as a service gives me "event: rsyncd : PID 1468 : starting
service `rsyncd' failed: signal 11 raised." - is there a way to get better
information on the error?

rsync on the Linux backup server can use the SSH tunnel to the Windows machine,
connect to the rsyncd daemon and successfully back up the "Module". It works
on about 3GB of files (fails on the Windows registry files, as expected).
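
A rough sketch of that tunnel setup, run from the Linux side. Everything not
stated above is an assumption: local port 8873 is arbitrary, 873 is assumed to
be the daemon's port (the rsync default), MODULE is a placeholder for the
module name in rsyncd.conf, and the helper names are invented for the example.

```shell
#!/bin/sh
LOCAL_PORT=8873   # assumed free port on the Linux backup server
MODULE=Module     # placeholder for the module name in rsyncd.conf

# Build the rsync URL that points at the local end of the tunnel.
daemon_url() {
    echo "rsync://localhost:${LOCAL_PORT}/${MODULE}/"
}

# Forward LOCAL_PORT to the daemon listening on localhost:873 on XP.
open_tunnel() {
    ssh -f -N -L "${LOCAL_PORT}:localhost:873" xpuser@xpsource
}

# Pull the module through the tunnel instead of spawning rsync over ssh.
pull_module() {
    rsync -vrt --stats --progress "$(daemon_url)" /home/usershare/xpuser/
}

# Usage:
#   open_tunnel && pull_module
```

With this layout, ssh only forwards a TCP port and does not start rsync on
the XP machine itself.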
Comment 6 Jim Kleckner 2005-03-17 00:22:15 UTC
My belief is that rsync over ssh is tickling a deadlock
race condition in cygwin.

See this message and trace it backwards for more context:
 http://cygwin.com/ml/cygwin-patches/2005-q1/msg00015.html

I have recently re-volunteered to the author to help out getting his
patches tested but have heard nothing yet.

If this is the cause, then it is deep in the cygwin
interaction with some ill-defined system calls for queues.

Note that one possible workaround is to "push" from the
Windows system rather than to "pull" it from another system,
although this is not always possible because of firewalls.
Comment 7 Wayne Davison 2005-03-17 02:11:05 UTC
I certainly suspected that this was a problem in the cygwin pipe/socketpair
handling.  Thanks for the extra testing, Chris, and for the work on getting this
fixed in cygwin, Jim!
Comment 8 Wayne Davison 2005-03-18 19:40:28 UTC
I'm marking the cygwin hang bugs as "LATER" because this is a bug in the
cygwin pipe code, so it is outside rsync's control.  We'll revisit this issue
later after we hear that the cygwin code has been fixed.

I wonder if specifying a --bwlimit might work around the problem by ensuring
that the pipes can't fill up enough to deadlock.  While we're waiting for a
cygwin fix, give that a try.
Comment 9 cfinley 2005-03-18 23:35:10 UTC
I tried the bandwidth limit option with the same lock-up behavior. I think I had
the setting down to 8 (8KB/s?).

Is there a thread or bug reference with CYGWIN?
Comment 10 Jim Kleckner 2005-03-22 11:26:05 UTC
(In reply to comment #9)

> Is there a thread or bug reference with CYGWIN?

The previously mentioned link is the best reference:
 http://cygwin.com/ml/cygwin-patches/2005-q1/msg00015.html 
Cygwin seems to only sort of use bugzilla.  The community
prefers to use the various mailing lists to track and work
things out.
Comment 11 Mike 2005-04-13 06:45:10 UTC
The legendary cygwin/rsync/ssh hang problem.  I have been tracking this for a
while now and can say that the latest cygwin install appears to have fixed the
problem on one of the setups that has consistently failed in the past.  I have not
put the update on any production boxes yet, but it looks promising.  From the
threads that I have read on the cygwin mailing lists, it would seem that a pipe
problem in the cygwin1.dll has been resolved (non-blocking pipes that blocked?).
The relevant cygcheck -s info:

$ cygcheck -s

Cygwin Configuration Diagnostics
Current System Time: Wed Apr 13 14:30:19 2005

Windows XP Home Edition Ver 5.1 Build 2600 Service Pack 2
.
.
    Cygwin DLL version info:
        DLL version: 1.5.14
        DLL epoch: 19
        DLL bad signal mask: 19005
        DLL old termios: 5
        DLL malloc env: 28
        API major: 0
        API minor: 126
        Shared data: 4
        DLL identifier: cygwin1
        Mount registry: 2
        Cygnus registry name: Cygnus Solutions
        Cygwin registry name: Cygwin
        Program options name: Program Options
        Cygwin mount registry name: mounts v2
        Cygdrive flags: cygdrive flags
        Cygdrive prefix: cygdrive prefix
        Cygdrive default prefix:
        Build date: Fri Apr 1 13:40:00 EST 2005
        Shared id: cygwin1S4
.
.
rsync                2.6.3-1
.
.
openssh              4.0p1-1


Regards,
Mike
Comment 12 cfinley 2005-04-19 10:28:09 UTC
(In reply to comment #11)
Have you had a chance to further test rsync with cygwin?
I just updated cygwin on a Windows XP Pro machine and tried several times to
initiate rsync over SSH from Debian Sarge; unfortunately, the transfer still
hangs. I can use an SSH tunnel initiated from Debian and then start a sync using
the rsyncd (daemon) running on WinXP.

    Cygwin DLL version info:
        DLL version: 1.5.14
        DLL epoch: 19
        DLL bad signal mask: 19005
        DLL old termios: 5
        DLL malloc env: 28
        API major: 0
        API minor: 126
        Shared data: 4
        DLL identifier: cygwin1
        Mount registry: 2
        Cygnus registry name: Cygnus Solutions
        Cygwin registry name: Cygwin
        Program options name: Program Options
        Cygwin mount registry name: mounts v2
        Cygdrive flags: cygdrive flags
        Cygdrive prefix: cygdrive prefix
        Cygdrive default prefix:
        Build date: Fri Apr 1 13:40:00 EST 2005
        Shared id: cygwin1S4

openssh              4.0p1-1

rsync                2.6.3-1

-- 
Chris Finley
Comment 13 Steve Graham 2005-06-15 01:36:38 UTC
(In reply to comment #0)

I have Windows XP Professional installed (new installation last week) with
rsync 2.6.3. Before the rebuild I was using rsync OK (sorry, don't know which
version!) from XP to a Solaris rsync daemon (version 2.5.5) with no problems.

I now get the same lock-up/hang as Chris describes, but the only way out
is to power off the XP machine. Killing rsync doesn't help, since it appears
that the network connection has been trashed.

The command is a simple: rsync -va <filename> <machine>::Dir/.

rsync works OK when I mount the remote drive locally (destination
is /cygdrive/h/Dir/.) or if I use <username>@<machine>:/home/<machine>/Dir/.
The latter goes through ssh with no problems.