I don't know what might be causing this, but when I run rsync with -vvv (my other options were `-aH --delete -F -Pi --dry-run -vvv`), the process hangs somewhere in the middle of my Wine virtual c_drive (I'm running this to mirror my /home). It works fine without the -vvv, or even with -vv.
Basically it just freezes. No IO, no CPU usage.
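For concreteness, the failing invocation would look something like this (the source and destination paths here are placeholders, since the report doesn't give them):

```
rsync -aH --delete -F -Pi --dry-run -vvv /home/ /mnt/mirror/home/
```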
There are a lot of bug reports related to rsync hanging mysteriously, some of which may be duplicates of each other:

https://bugzilla.samba.org/show_bug.cgi?id=1442
https://bugzilla.samba.org/show_bug.cgi?id=2957
https://bugzilla.samba.org/show_bug.cgi?id=9164
https://bugzilla.samba.org/show_bug.cgi?id=10035
https://bugzilla.samba.org/show_bug.cgi?id=10092
https://bugzilla.samba.org/show_bug.cgi?id=10518
https://bugzilla.samba.org/show_bug.cgi?id=10950
https://bugzilla.samba.org/show_bug.cgi?id=11166
https://bugzilla.samba.org/show_bug.cgi?id=12732
https://bugzilla.samba.org/show_bug.cgi?id=13109
While debugging a different rsync hang, I have also seen this behavior with -vvv, and used git bisect to find when it was introduced. d8587b4690b1987c02c71c136720f366abf250e6 is the first bad commit (introduced in 3.1.0pre1):

```
commit d8587b4690b1987c02c71c136720f366abf250e6
Author: Wayne Davison <wayned@samba.org>
Date:   Tue Sep 15 16:12:24 2009 -0700

    Change the msg pipe to use a real multiplexed IO mode for the data
    that goes from the receiver to the generator.
```

I hope that helps.

Regards,
--
Mark Vitale
mvitale@sinenomine.net
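For reference, a sketch of the kind of bisect session that can pinpoint such a commit. The repository URL, the good/bad tag names, the timeout value, and the src/ and dst/ paths are all assumptions for illustration, not a record of the actual session:

```
git clone https://github.com/WayneD/rsync.git && cd rsync
git bisect start
git bisect bad  v3.1.0pre1    # hang reproduces here (assumed tag name)
git bisect good v3.0.9        # assumed last release without the hang
# At each step git offers, build and try to reproduce, treating a
# timeout as a hang (exit code 124 comes from coreutils timeout):
./configure && make
timeout 60 ./rsync -avvv src/ dst/
test $? -eq 124 && git bisect bad || git bisect good
```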
When running on localhost with just the -avvv options, rsync spawns three processes, each stuck in select() inside the new perform_io function:

```
# ps -aux | grep rsync
root  98764 10.0  0.1  13108  3060 pts/2  S+  09:37  0:00 /usr/bin/rsync -avvv src/ dst/
root  98765  0.0  0.1  12816  1872 pts/2  S+  09:37  0:00 /usr/bin/rsync -avvv src/ dst/
root  98766 10.6  0.1  13076  2432 pts/2  S+  09:37  0:00 /usr/bin/rsync -avvv src/ dst/
```

```
# pstack 98764
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd4b76d in write_buf ()
#3  0x000055cd6fd50022 in send_token ()
#4  0x000055cd6fd3d12e in matched.isra ()
#5  0x000055cd6fd3d3c2 in match_sums ()
#6  0x000055cd6fd32dd9 in send_files ()
#7  0x000055cd6fd3c512 in client_run ()
#8  0x000055cd6fd1d9f1 in main ()

# pstack 98765
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd49d8f in send_msg ()
#3  0x000055cd6fd3f5fa in rwrite ()
#4  0x000055cd6fd4a982 in read_a_msg ()
#5  0x000055cd6fd49046 in perform_io ()
#6  0x000055cd6fd4a1a9 in io_flush ()
#7  0x000055cd6fd2f1d7 in generate_files ()
#8  0x000055cd6fd3b941 in do_recv ()
#9  0x000055cd6fd3c0d4 in start_server ()
#10 0x000055cd6fd3c20b in child_main ()
#11 0x000055cd6fd5c088 in local_child ()
#12 0x000055cd6fd1d99a in main ()

# pstack 98766
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd49d8f in send_msg ()
#3  0x000055cd6fd3f2c0 in rwrite ()
#4  0x000055cd6fd3fa07 in rprintf ()
#5  0x000055cd6fd28e11 in finish_transfer ()
#6  0x000055cd6fd314fa in recv_files ()
#7  0x000055cd6fd3b852 in do_recv ()
#8  0x000055cd6fd3c0d4 in start_server ()
#9  0x000055cd6fd3c20b in child_main ()
#10 0x000055cd6fd5c088 in local_child ()
#11 0x000055cd6fd1d99a in main ()
```

These are most likely waiting for the same buffers?

```
# strace -p 98764
strace: Process 98764 attached
select(5, [], [4], [], {tv_sec=8, tv_usec=659438}) = 0 (Timeout)
select(5, [], [4], [], {tv_sec=60, tv_usec=0}

# strace -p 98765
strace: Process 98765 attached
select(2, [], [1], [], {tv_sec=40, tv_usec=875659}) = 0 (Timeout)
select(2, [], [1], [], {tv_sec=60, tv_usec=0}

# strace -p 98766
strace: Process 98766 attached
select(5, [], [4], [], {tv_sec=55, tv_usec=452214}) = 0 (Timeout)
select(5, [], [4], [], {tv_sec=60, tv_usec=0}
```

So all the processes are waiting to write something? I am guessing that the client process and one of the server processes are waiting to write to the iobuf (fd=4), and the second server process is writing to stdout (assuming fd=1 is stdout)?
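As an aside, one way to confirm those fd guesses on a Linux box (assuming /proc is available and lsof is installed; the PIDs are the ones from the listing above):

```
ls -l /proc/98764/fd/4    # what the client's fd 4 actually points at
ls -l /proc/98765/fd/1    # verify the generator's fd 1 really is stdout
lsof -a -p 98764 -d 4     # the same check via lsof (-a ANDs -p and -d)
```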
If anyone from the rsync team could help, that would be awesome. I am trying to figure this out, but it is hard to tell what happened and who is actually waiting on whom.
The hang isn't very easy to fix because of the current way the 3 processes communicate when performing a push operation (which is what happens by default in a local transfer), combined with the huge slew of messages that get generated by -vvv. The messages have to go "around the horn" from the receiver to the generator to the sender to be output, and that forwarding can hang up in certain full-buffer situations. I have some ideas I'm looking into for improving this, but it will take a while.

In the meantime you have several options for working around the issue (example commands follow this list):

1. Use some --info=FOO and/or --debug=FOO options instead of -vvv to limit the messages rsync generates to just those you really need to see. (Specify a FOO of "help" to see a list of choices.)

2. Change a local copy from a push to a pull using the support/lsh script that comes with rsync. E.g., instead of running "rsync -aivvv src/ dest/" you'd run:

       rsync -aivvve lsh localhost:$PWD/src/ dest/

3. Use --msgs2stderr (which can be specified as {-M,}--msgs2stderr to affect both sides of a non-local transfer) so that the messages are not in the protocol stream. If you want to map them back from stderr to stdout, you can add "2>&1" on the command line. This choice has a small chance of causing some weirdness in the message stream, though, since all 3 processes output debug messages to stderr at the same time (although rsync changes stderr to be line-buffered to try to minimize that).
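For concreteness, hedged versions of those three workarounds as commands (src/ and dest/ are placeholder paths, and the particular --info/--debug selections are just one plausible choice):

```
# 1. Ask only for the output you actually need, instead of -vvv:
rsync -a --info=name2,stats2 src/ dest/
rsync --info=help      # list the available --info flags
rsync --debug=help     # list the available --debug flags

# 2. Turn the local push into a pull via the bundled support/lsh script:
rsync -aivvve lsh localhost:$PWD/src/ dest/

# 3. Keep the messages out of the protocol stream entirely:
rsync -aivvv --msgs2stderr src/ dest/ 2>&1
```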
I figured out a simple fix that allows the receiver to expand its message buffer size when things clog up, and committed that fix to git for 3.2.0. I still have a dream of fixing this in a better way in the future, but this will work OK for now. I still suggest that you avoid -vvv when possible, but it will at least work when you do use it (assuming you don't run out of memory). In my own local testing, the msg buffer expanded from 16KB to 512KB (in several steps) and then stayed steady at that size, even when I increased the number of files copied.

Also, my latest git version of "lsh" (and its bash "lsh.sh" alternative) in the support dir now accepts a hostname of "lh" for localhost with an implied --no-cd option, so that the "remote" side doesn't default to the user's home dir. This makes it easier to type a local-copy command that avoids the extra msg memory allocations:

    rsync -aivvve lsh lh:src/ dest/
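A quick way to check whether an installed rsync already includes that fix, and to retest (the version threshold comes from the comment above; src/ and dest/ are placeholders):

```
rsync --version | head -n1   # want 3.2.0 or newer
rsync -aivvv src/ dest/      # should now run to completion, with the msg
                             # buffer growing as needed instead of deadlocking
```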
This is awesome, thanks. No hang on a reproducer with circa 9k files (6.6G of data).
Hi, current rsync still hits this problem when transferring huge numbers of files, meaning tens of thousands. I tried this with some 700k files and it got stuck at the same place as before.

What worked for me was to create a bigger buffer from the beginning: I raised IO_BUFFER_SIZE from 32*1024 to 128*1024 (see the sketch below). I know this does not make sense for every transfer, but maybe it wouldn't be a bad idea to add a new option that sets a larger buffer size from the start. If I have a use case where I know I am going to be transferring a lot of files and for some reason need -vvv or more, I could just set the message buffer to a bigger value. Or do you have an idea how to rewrite the logging completely?

Thanks for any thoughts on this.
Michal
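For anyone wanting to reproduce that local workaround, a sketch of the build-time tweak. The exact file containing the IO_BUFFER_SIZE define may vary between rsync versions, so locate it first rather than assuming a path:

```
# In an rsync source tree: find the define, then raise it by hand
# from (32 * 1024) to (128 * 1024) and rebuild:
grep -rn 'IO_BUFFER_SIZE' *.c *.h
./configure && make
```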