Bug 11166 - running with -vvv causes a hang
Summary: running with -vvv causes a hang
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.1
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-16 16:14 UTC by pjump
Modified: 2020-07-01 13:36 UTC
CC List: 1 user

See Also:


Description pjump 2015-03-16 16:14:57 UTC
I don't know what might be causing this, but when I run rsync with -vvv (the full option set was `-aH --delete -F -Pi --dry-run -vvv`), the process hangs somewhere in the middle of my Wine virtual c_drive (I'm running this to mirror my /home).
It works fine without the -vvv or even with -vv.
Comment 1 pjump 2015-03-16 16:15:37 UTC
Basically it just freezes. No IO, no CPU usage.
Comment 3 Mark Vitale 2020-03-05 22:40:43 UTC
While debugging a different rsync hang, I have also seen this behavior with -vvv, and used git bisect to find when it was introduced:

d8587b4690b1987c02c71c136720f366abf250e6 is the first bad commit (introduced in 3.1.0pre1):
commit d8587b4690b1987c02c71c136720f366abf250e6
Author: Wayne Davison <wayned@samba.org>
Date:   Tue Sep 15 16:12:24 2009 -0700

    Change the msg pipe to use a real multiplexed IO mode
    for the data that goes from the receiver to the generator.
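
For anyone who wants to repeat the bisect, here is a rough sketch of the session I mean (the good/bad endpoints are placeholders, not necessarily the ones I used); at each step you build, try to reproduce the -vvv hang, and mark the commit accordingly until git names the first bad one:

    git bisect start
    git bisect bad master
    git bisect good v3.0.9
    git bisect good    # or: git bisect bad, after each build-and-test
    git bisect reset   # when finished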

I hope that helps.

Regards,
--
Mark Vitale
mvitale@sinenomine.net
Comment 4 Michal Ruprich 2020-06-04 13:44:43 UTC
When running on localhost with just the -avvv options, rsync spawns three processes, each stuck in select() inside the new perform_io function:

# ps -aux | grep rsync
root 98764 10.0  0.1  13108  3060 pts/2  S+ 09:37   0:00 /usr/bin/rsync -avvv src/ dst/
root 98765  0.0  0.1  12816  1872 pts/2  S+ 09:37   0:00 /usr/bin/rsync -avvv src/ dst/
root 98766 10.6  0.1  13076  2432 pts/2  S+ 09:37   0:00 /usr/bin/rsync -avvv src/ dst/
# pstack 98764
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd4b76d in write_buf ()
#3  0x000055cd6fd50022 in send_token ()
#4  0x000055cd6fd3d12e in matched.isra ()
#5  0x000055cd6fd3d3c2 in match_sums ()
#6  0x000055cd6fd32dd9 in send_files ()
#7  0x000055cd6fd3c512 in client_run ()
#8  0x000055cd6fd1d9f1 in main ()

# pstack 98765
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd49d8f in send_msg ()
#3  0x000055cd6fd3f5fa in rwrite ()
#4  0x000055cd6fd4a982 in read_a_msg ()
#5  0x000055cd6fd49046 in perform_io ()
#6  0x000055cd6fd4a1a9 in io_flush ()
#7  0x000055cd6fd2f1d7 in generate_files ()
#8  0x000055cd6fd3b941 in do_recv ()
#9  0x000055cd6fd3c0d4 in start_server ()
#10 0x000055cd6fd3c20b in child_main ()
#11 0x000055cd6fd5c088 in local_child ()
#12 0x000055cd6fd1d99a in main ()

# pstack 98766
#0  0x00007ff114d4a6bb in select () from /lib64/libc.so.6
#1  0x000055cd6fd48a8f in perform_io ()
#2  0x000055cd6fd49d8f in send_msg ()
#3  0x000055cd6fd3f2c0 in rwrite ()
#4  0x000055cd6fd3fa07 in rprintf ()
#5  0x000055cd6fd28e11 in finish_transfer ()
#6  0x000055cd6fd314fa in recv_files ()
#7  0x000055cd6fd3b852 in do_recv ()
#8  0x000055cd6fd3c0d4 in start_server ()
#9  0x000055cd6fd3c20b in child_main ()
#10 0x000055cd6fd5c088 in local_child ()
#11 0x000055cd6fd1d99a in main ()

These are most likely waiting for the same buffers?
# strace -p 98764
strace: Process 98764 attached
select(5, [], [4], [], {tv_sec=8, tv_usec=659438}) = 0 (Timeout)
select(5, [], [4], [], {tv_sec=60, tv_usec=0}

# strace -p 98765
strace: Process 98765 attached
select(2, [], [1], [], {tv_sec=40, tv_usec=875659}) = 0 (Timeout)
select(2, [], [1], [], {tv_sec=60, tv_usec=0}

# strace -p 98766
strace: Process 98766 attached
select(5, [], [4], [], {tv_sec=55, tv_usec=452214}) = 0 (Timeout)
select(5, [], [4], [], {tv_sec=60, tv_usec=0}

So all the processes are waiting to write something? I am guessing that the client process and one of the server processes are waiting to write to the iobuf (fd=4), and that the second server process is waiting to write to stdout (assuming fd=1 is stdout)?
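
One way to check those guesses (a sketch; the PID and fd numbers are taken from the run above and will differ on another run) is to look at what the stuck descriptors actually point to, e.g. with /proc and lsof:

# ls -l /proc/98764/fd
# lsof -p 98765 | grep -E 'pipe|unix'

fd 4 should show up as a pipe or socket, and lsof can then be used to match the endpoints to their peer processes.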
Comment 5 Michal Ruprich 2020-06-04 13:46:05 UTC
If anyone from rsync could help, that would be awesome. I am trying to figure this out, but it is hard to tell what happened and which process is actually waiting on which.
Comment 6 Wayne Davison 2020-06-04 17:24:58 UTC
The hang isn't very easy to fix because of the current way the 3 processes communicate when performing a push operation (which is what happens by default in a local transfer) combined with the huge slew of messages that get generated by -vvv. The messages have to go "around the horn" from the receiver to the generator to the sender to be output, and that forwarding can hang up in certain full-buffer situations. I have some ideas I'm looking into for improving this, but it will take a while.

In the meantime you have several options for working around the issue:

Use some --info=FOO and/or --debug=FOO options instead of -vvv to limit the number of messages that rsync is generating to just those that you really need to see. (Specify a FOO of "help" to see a list of choices.)
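
For example (the particular FOO values in the last line are only illustrative, so substitute the categories you actually care about):

    rsync --info=help
    rsync --debug=help
    rsync -ai --info=name2,progress2 src/ dest/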

Change a local copy from a push to a pull using the support/lsh script that comes with rsync. e.g., instead of running "rsync -aivvv src/ dest/" you'd run:

    rsync -aivvve lsh localhost:$PWD/src/ dest/

Finally, use the --msgs2stderr option (which can be specified as {-M,}--msgs2stderr to affect both sides of a non-local transfer) so that the messages are not in the protocol stream. If you want to map them back from stderr to stdout, you can add "2>&1" on the command line. This choice has a small chance of causing some weirdness in the message stream, though, since all 3 processes output debug messages to stderr at the same time (though rsync changes stderr to be line buffered to try to minimize that).
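
For a local copy that would look something like this (just a sketch; the -M form only matters when a remote side is involved):

    rsync -aivvv --msgs2stderr src/ dest/ 2>&1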
Comment 7 Wayne Davison 2020-06-04 21:46:25 UTC
I figured out a simple fix that allows the receiver to expand its message-buffer size when things clog up, and I committed that fix to git for 3.2.0. I still have a dream of fixing this in a better way in the future, but this will work OK for now.

I still suggest that you avoid -vvv when possible, but it will at least work when you do use it (assuming you don't run out of memory). In my own local testing, the msg buffer expanded from 16KB to 512KB (in several steps) and then stayed steady at that size, even when I increased the number of files copied.
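
If you want to keep an eye on that growth yourself, a rough way to do it (RSS only loosely tracks the msg buffer, so treat the numbers as a ballpark) is:

    watch -n1 'ps -o pid,rss,args -C rsync'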

Also, my latest git version of "lsh" (& its bash "lsh.sh" alternative) in the support dir now accepts a hostname of "lh" for localhost with an implied --no-cd option so that the "remote" side doesn't default to the user's home dir. This makes it easier to type a local-copy command that will avoid the extra msg memory allocations:

    rsync -aivvve lsh lh:src/ dest/
Comment 8 Michal Ruprich 2020-06-09 06:12:22 UTC
This is awesome, thanks. No hang on a reproducer with circa 9k files (6.6G in total).
Comment 9 Michal Ruprich 2020-07-01 13:36:35 UTC
Hi,

Current rsync still hits this problem when transferring huge numbers of files. I am talking tens of thousands. I tried this with some 700k files and it got stuck at the same place as before. What worked for me was to create a bigger buffer from the beginning: I changed IO_BUFFER_SIZE from 32*1024 to 128*1024.
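
A sketch of the change I mean (I am assuming the define lives in io.c; adjust the location if your tree differs): find the define, change (32 * 1024) to (128 * 1024), and rebuild:

    grep -n 'define IO_BUFFER_SIZE' io.c
    ./configure && make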

I know that this does not make sense for every transfer, but maybe it wouldn't be a bad idea to add a new option that could increase the size of the buffer from the beginning. If I have a use case where I know I am going to be transferring a lot of files and for some reason need -vvv or more, I could just set the message buffer to a bigger value.

Or do you have an idea of how to rewrite the logging completely?

Thanks for any thoughts on this.

Michal