Sometimes the backup completes, but frequently it does not. rsync daemon 2.6.6 is installed on Linux (PLD) with kernel 2.4.31. rsync client 1.6.6 is installed on Linux (PLD) with kernel 2.4.28. The client is connected to the server by a 100Mb LAN.
Created attachment 1357 [details] strace -f /usr/local/rsync-2.6.6/bin/rsync
Think I am experiencing the same here. @szczur: Did it work for you with another version of rsync? Are you using an stunnel between the machines or anything? Is the rsync-receiver or the sender the problem?
Created attachment 1464 [details] Backtrace and syscall of client+server This is the backtrace from rsync on client and server using rsync 2.6.6-2 as currently shipped by Fedora-development - also using the separate debuginfo-package they provide: http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/debug/
See attached backtrace. Client+server are hung in a select syscall. The sync is currently running through an stunnel for security reasons, but a colleague said the same occurred even without stunnel running. Killing the sync and restarting it a few times finally makes the sync go through step by step. It seems to have something to do with the number of files which need to be synced, or similar. The file which is being synced when the hang occurs is not "locked" or anything - it's a static image that was created hours ago in this case. PS: Please update the version number of this bug to 2.6.6!
Think we observe the same problem here. I implemented a compiler server since our compiler is dongled. Source files (mainly *.c, *.h and *.asm) are synced with the server (Win XP SP2, Cygwin, rsync 2.6.3) before the Makefile is invoked there via ssh. So far, everything works fine. Afterwards, the generated files (mainly *.o, *.d, *.lst and *.hex) are to be synced back to the client (Win XP SP2, Cygwin with rsync 2.6.6). I tried different versions, an ssh tunnel (single :) and an rsync server (double :); the problem remains the same. The server process hangs. I have to kill it with 'kill -9 <pid>', and after that the client prints the following error message:
    rsync: read error: Connection reset by peer (104)
    rsync error: error in rsync protocol data stream (code 12) at /home/lapo/packaging/tmp/rsync-2.6.6/io.c(584)
    rsync: connection unexpectedly closed (8157 bytes received so far) [generator]
    rsync error: error in rsync protocol data stream (code 12) at /home/lapo/packaging/tmp/rsync-2.6.6/io.c(434)
    Error: Communication channel to compilerserver crashed.
The sync finally goes through when I repeat the back-syncing sequence several times, each time killing the server process and invoking the sync again.
Created attachment 1495 [details] Simple shellscript for reproducing the bug This simple script reproduces the problem reliably (at least on our machines here). Once started, the script logs in to the server via ssh and generates 500 files there, each with a fixed size of 5000 bytes. After that, rsync is invoked to sync these files with the client machine. The sync process crashes when it is run for the second time, that is, when files with the same names already exist on the client. Strangely enough, the number of the file where the crash occurs is somewhere between 100 and 200; if I reduce the number of files to sync from 500 to 200, however, everything works fine.
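The attached script itself is not shown in this thread. As a rough sketch of the steps described above (not the reporter's actual script), the file generation could look like the following; `make_test_files`, `user@server` and `testdir` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: make_test_files, user@server and testdir are hypothetical
# names, not taken from the attached script.

# Create COUNT files of SIZE random bytes each in directory DIR.
make_test_files() {
    dir=$1 count=$2 size=$3
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # head -c copies exactly SIZE bytes from the random device
        head -c "$size" /dev/urandom > "$dir/file$i.bin" || return 1
        i=$((i + 1))
    done
}

# On the server:
#   make_test_files testdir 500 5000
# On the client, run twice (the hang was reported on the second run):
#   rsync -av -e ssh user@server:testdir/ testdir/
```

Regenerating the files on the server between the two runs gives the second sync changed content for file names that already exist on the client, which is the situation the report describes.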
Tried the shell script here: creating files, syncing, creating files with other content, syncing - all fine. Even with 2500 instead of 500 files. Too bad :-( Any possibility to track it down further with the backtrace or so?
Created attachment 1497 [details] rsync server configuration
Created attachment 1498 [details] rsync server log
Created attachment 1499 [details] Shellscript using rsync server
I tried the same script with our Linux server as rsync server (rsync 2.6.0). In this constellation, everything is ok. So maybe I am confronted with a pure Cygwin problem that is different from the original bug 2957, which was reported on a Linux system. Do you have the possibility to reproduce it with Cygwin, or shall I report the problem on the cygwin list? Anyway, the problem is not independent of timing. I wrote yesterday that the script has to be invoked twice (assuming testdir and its contents don't exist), so that the files have to be overwritten on the client. I just switched to rsync server mode (::) and activated debugging messages with '-vvvvvvvvvv'. Now it already gets stuck in the first run.
> Any possibility to track it down further with the backtrace or so?
I'm not familiar with cygwin debugging. The cygwin 'ps' command shows an 'O', which means that rsync gets stuck in an 'Output' operation.
    $ ps
          PID    PPID    PGID     WINPID  TTY  UID    STIME COMMAND
         3796    3196    3796       3364    1 1013 16:15:11 /usr/bin/bash
    O    1588    2436    1588       2944    ? 1013 16:16:48 /usr/bin/rsync
         3408    3796    3408       3200    1 1013 16:17:24 /usr/bin/ps
Invoking 'strace' then just shows
    $ strace -p 1588
    Attached to pid 1588 (windows pid 2944)
       11      11 [unknown (0x108)] rsync 1588 _cygtls::remove: wait 0x0
Don't know if this is useful. Please tell me what to do, if necessary.
(In reply to comment #1)
> Created an attachment (id=1357)
> strace -f /usr/local/rsync-2.6.6/bin/rsync
This strace shows that the sender is just waiting to send some data down the socket, which means that the receiver hasn't cleared the socket to allow more data to be written. Thus, we need to see what the receiver is/was doing. If the receiving side is the server, you should run the daemon like this:
    strace -f rsync --daemon --no-detach
If you run that inside a "script" session, all the strace output will be logged.
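As an alternative to a "script" session, strace can write the trace to a file itself through its -o option. A minimal sketch that builds such a command line follows; `daemon_trace_cmd` and the log file name are hypothetical and not part of the advice above.

```shell
#!/bin/sh
# Sketch: build the daemon tracing command with a log file.
# daemon_trace_cmd and the default log name are hypothetical.
daemon_trace_cmd() {
    log=${1:-rsyncd.strace.log}
    # -f follows the child processes forked for each connection;
    # -o writes the trace to a file instead of stderr.
    printf 'strace -f -o %s rsync --daemon --no-detach\n' "$log"
}

# Print the command; run it by hand on the receiving host.
daemon_trace_cmd
```

With -f and a single -o file, the trace lines of all traced processes end up in one log, each prefixed with its process ID.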
(In reply to comment #3)
> Created an attachment (id=1464)
> Backtrace and syscall of client+server
There are 3 rsync processes involved in a transfer: two on the receiving side, and one on the sending side. We'd need to see strace info for all 3 to get the full information for what's happening.
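One way to collect a trace per process is to attach strace to every running rsync process on both hosts while the transfer hangs. The sketch below only prints the commands to run; `trace_cmds_for_pids` and the log directory are hypothetical, and it assumes `pgrep` is available to list the process IDs.

```shell
#!/bin/sh
# Sketch: print one strace command per rsync process ID.
# trace_cmds_for_pids and the log directory are hypothetical.
trace_cmds_for_pids() {
    logdir=$1
    shift
    for pid in "$@"; do
        # -p attaches to an already running process; -tt adds timestamps
        printf 'strace -tt -p %s -o %s/rsync.%s.strace\n' "$pid" "$logdir" "$pid"
    done
}

# Example, run on each host while the transfer hangs:
#   trace_cmds_for_pids /tmp $(pgrep -x rsync)
```

Running it on the receiving host should yield two commands and on the sending host one, matching the three processes mentioned above.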
(In reply to comment #11)
> So maybe, I am confronted with a pure
> Cygwin problem that is different from the original bug 2957 that has been
> reported on a Linux system.
Anytime a remote shell is used in a Cygwin scenario, you are subject to Cygwin's pipe-data bug, where data can potentially be lost. As soon as they fix this, the hangs should stop. Until then, you can use daemon mode directly (not daemon mode over ssh).
Over here no cygwin is involved. Only rsync over an stunnel using xinetd. A colleague said he even tried it without stunnel, but with no better success. I did try the test script as well, but unfortunately I was not able to reproduce the problem "cleanly" with either 500 or 2500 generated random files :-( I already attached a backtrace and the syscall in which it hangs. Is more detailed information needed? Since the problem (at least here) is not cleanly reproducible with demo data, the _full_ strace log might unfortunately include sensitive data that I would not like to have online. So if you could narrow down the problem (strace output) a bit or provide some kind of test scripts, that would really help!
It seems that using an rsync daemon via TCP solves the problem for me. I have not encountered any of the aforementioned problems with it so far. Thank you very much for this information. Cheers, Peter
It really seems to be a cygwin bug. Sorry for not trying TCP before... Thanx again
I'm using it via ssltunnel here, so I suspect it has to do with neither tcp-or-not-tcp nor cygwin (I am running under Linux). Is there any test script I can help you with, or any further information?
(In reply to comment #18)
> Is there any test-script I can help you with,
> or any further information?
As mentioned in comment #13 (which was a reply to you), we need to see what all 3 processes were doing right before the hang in order to narrow the problem down. The 2 processes you traced were both doing what they were supposed to be doing, so we need to see what the 3rd process was doing.
Also, don't forget the advice from the first entry on the issues-and-debugging page about reporting the send/receive queue info from netstat: http://rsync.samba.org/issues.html That will also help.
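To pick out connections with data stuck in a queue, the netstat output can be filtered for non-zero queue columns. The following is a sketch under the assumption of the Linux `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `nonempty_queues` is a hypothetical helper name, and the issues page remains the reference for what to report.

```shell
#!/bin/sh
# Sketch: keep only TCP lines whose Recv-Q or Send-Q is non-zero.
# Assumes the Linux "netstat -tn" column order; nonempty_queues is a
# hypothetical helper name.
nonempty_queues() {
    awk '$1 ~ /^tcp/ && ($2 > 0 || $3 > 0) { print $2, $3, $4, $5, $6 }'
}

# Example, run on both hosts while the transfer hangs:
#   netstat -tn | nonempty_queues
```

Each printed line gives Recv-Q, Send-Q, local address, remote address and state, so a connection with a large Send-Q on one side can be compared against the Recv-Q on the other.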
Think I am having the same problem. Sometimes the backup is done, sometimes not. Restarting the rsync daemon seems to stop this error from appearing. I am using cwrsync server and client.
- cwrsync version 2.6.6 (also happens with 2.6.5).
- rsync daemon running without ssh. No tunnel.
- connected by 100Mb LAN
- rsyncd.conf, module readonly = false
Client script (.bat):
    #rsync.exe -va --progress --del --max-delete=50 --link-dest=/%yesterday%/ %sourceunx% rsync://%server%/%module%/%today%
Output:
    #rsync: read error: Connection reset by peer (104)
    #rsync error: error in rsync protocol data stream code (code 12) at io.c(584)
Hello, I was receiving a similar error. I have put the results of my debug at: http://www.awsolutions.info/rsync-26762.out Any help would be greatly appreciated. Thanks, Brian
Brian: You're getting a syntax error. This bug is about a hang. I'd suggest posting to the mailing list to get help with the command you're running.
Created attachment 3404
I believe I have this bug occurring between two particular Linux systems over SSH, pushing the backup, the client running 3.0.3, and the server running 2.6.9. The client performs much of the backup, then proceeds to hang completely. An strace in the client ends with the following:
    select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0})
    write(4, "\252{\5\204\361\300\371I?\202\25\322\370\355t#\301\375\222\303O\220\357\251\301<S?\360<\364\312"..., 2295) = 2295
    select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0})
    write(4, "\377\377\377\377", 4) = 4
    select(6, [5], [], NULL, {60, 0}) = 0 (Timeout)
    select(6, [5], [], NULL, {60, 0}) = 0 (Timeout)
    select(6, [5], [], NULL, {60, 0}) = 0 (Timeout)
    select(6, [5], [], NULL, {60, 0}) = 0 (Timeout)
    select(6, [5], [], NULL, {60, 0} <unfinished ...>
Is there anything I can do to assist in tracking down this bug? It's quite the irritant as it breaks my backup scripts for that server.
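The repeated select timeouts at the end of the trace are the visible symptom of the hang. A small helper for counting that trailing run in a saved strace log might look like this; `trailing_timeouts` is a hypothetical name, and the matching assumes strace's default text output as quoted above.

```shell
#!/bin/sh
# Sketch: count the run of select() timeouts at the end of a strace log.
# trailing_timeouts is a hypothetical helper name.
trailing_timeouts() {
    awk '
        /select\(.*\(Timeout\)/ { n++; next }   # another idle select
        /<unfinished/           { next }        # trace cut off mid-call
        { n = 0 }                               # any other call resets the run
        END { print n + 0 }
    ' "$@"
}

# Example:
#   trailing_timeouts strace.log
```

With the 60-second timeout seen in the trace, the count gives a rough lower bound, in minutes, on how long the process had been idle when the trace was cut off.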
John, it would be helpful to have a stack trace and strace for each of the three processes (sender, generator, and receiver).
I wasn't able to get a stack trace, as none of the rsync processes were compiled with debugging information, except the client, which lacked gdb (or an easy method of installing it). However, I have straces for each. The client is from startup, and the two server processes had strace attached as quickly as possible after startup, at roughly the same time. Client: http://fudgeman.org/pub/strace.log Parent process on server: http://fudgeman.org/pub/strace-parent.log Child process on server: http://fudgeman.org/pub/strace-child.log The server-side errors were after the client was terminated, after it hung.
There are a lot of bug reports related to rsync hanging mysteriously, some of which may be duplicates of each other:
https://bugzilla.samba.org/show_bug.cgi?id=1442
https://bugzilla.samba.org/show_bug.cgi?id=2957
https://bugzilla.samba.org/show_bug.cgi?id=9164
https://bugzilla.samba.org/show_bug.cgi?id=10035
https://bugzilla.samba.org/show_bug.cgi?id=10092
https://bugzilla.samba.org/show_bug.cgi?id=10518
https://bugzilla.samba.org/show_bug.cgi?id=10950
https://bugzilla.samba.org/show_bug.cgi?id=11166
https://bugzilla.samba.org/show_bug.cgi?id=12732
https://bugzilla.samba.org/show_bug.cgi?id=13109