Bug 12527 - Sender waits for timeout when fuzzy basis file found
Sender waits for timeout when fuzzy basis file found
Status: NEW
Product: rsync
Classification: Unclassified
Component: core
3.1.2
All All
: P5 critical
: ---
Assigned To: Wayne Davison
Rsync QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2017-01-18 15:16 UTC by Ben RUBSON
Modified: 2017-02-14 11:26 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ben RUBSON 2017-01-18 15:16:59 UTC
Hello,

Trying to make the sender get the last error messages from the generator/receiver (https://bugzilla.samba.org/show_bug.cgi?id=12522), I found one case where sender exits with timeout instead of correctly exiting.
It is when a fuzzy basis file has been selected.



Here is an example where we send 2 files, but as the remote FS gets full, transfer properly exits. Perfect.

sending incremental file list
[sender] change_dir(/home/prd/bkp)
send_files starting
server_recv(2) starting pid=45749
get_local_name count=3 /home/prd/bkp/
[Receiver] change_dir(/home/prd/bkp)
generator starting pid=45749
delta-transmission enabled
recv_generator(.,0)
set modtime of . to (1484751636) Wed Jan 18 16:00:36 2017
recv_generator(.,1)
recv_generator(bf.iso,2)
recv_generator(sf2.dmg,3)
recv_files(3) starting
generate_files phase=1
send_files(0, /home/prd/bkp/.)
./
send_files(2, /home/prd/bkp/bf.iso)
bf.iso
    549,158,912  74%   12.12MB/s    0:00:14  
rsync: [sender] write error: Broken pipe (32)
[sender] got msg=2, len=14
recv_files(.)
[sender] got msg=2, len=19
recv_files(bf.iso)
[sender] got msg=2, len=27
[receiver] send_msg(1, 94)
[sender] got msg=2, len=28
[generator] send_msg(1, 94)
[sender] got msg=1, len=94
rsync: write failed on "/home/prd/bkp/bf.iso": Disc quota exceeded (69)
[sender] got msg=2, len=33
[generator] converted msg len=94
[sender] got msg=2, len=27
[receiver] send_msg(3, 76)
[sender] got msg=2, len=28
[generator] send_msg(3, 76)
[sender] got msg=3, len=76
rsync error: error in file IO (code 11) at receiver.c(400) [receiver=3.1.2]
[sender] got msg=2, len=33
[generator] converted msg len=76
[sender] got msg=2, len=85
[receiver] _exit_cleanup(code=11, file=receiver.c, line=400): about to call exit(11
[sender] got msg=2, len=32
[receiver] send_msg_int(86, 11)
[sender] got msg=2, len=33
[generator] send_msg_int(86, 11)
[sender] got msg=86, len=4
[sender] send_msg(86, 0)
[sender] _exit_cleanup(code=11, file=io.c, line=1633): about to call exit(11)



Exactly same command launched, but adding -y so that fuzzy basis may be found.

sending incremental file list
[sender] change_dir(/home/prd/bkp)
send_files starting
server_recv(2) starting pid=48003
get_local_name count=3 /home/prd/bkp/
[Receiver] change_dir(/home/prd/bkp)
generator starting pid=48003
delta-transmission enabled
recv_generator(.,0)
set modtime of . to (1484751636) Wed Jan 18 16:00:36 2017
recv_generator(.,1)
recv_generator(bf.iso,2)
recv_generator(sf2.dmg,3)
fuzzy basis selected for sf2.dmg: sf1.dmg
send_files(0, /home/prd/bkp/.)
./
send_files(2, /home/prd/bkp/bf.iso)
bf.iso
    519,077,888  70%    8.14MB/s    0:00:25  
[sender] io timeout after 30 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(195) [sender=3.1.2]
[sender] _exit_cleanup(code=30, file=io.c, line=195): about to call exit(30)



As you can see, a fuzzy basis has been selected for the second file.
However, as the remote FS gets full, the transfer of the first file stops, but instead of correctly exiting, the process hangs and waits for the configured timeout (30 seconds here), or forever if no timeout configured.
Same behaviour with protocol 30 and 31.

I went through the code but was not able to find out why.

Wayne, could you (or someone else of course) help me solving this issue ?

Would be perfect if this could be solved without having to modify the sender code.

Thank you very much for your support !

Ben
Comment 1 Ben RUBSON 2017-01-18 20:34:07 UTC
Same issue with rsync-HEAD-20161010-1854GMT.

My quick previous test uses these local files :
-rw-------  1 root  root  1177136858 Jan 10  2017 bf.iso
-rw-------  1 root  root    68558591 Jan 10  2017 sf2.dmg

And this remote file :
-rw-------  1 root  root    68558591 Jan 10  2017 sf1.dmg

bf.iso (local) is big enough to make the remote FS full.
sf2.dmg (local) and sf1.dmg (remote) are equal.
Comment 2 Ben RUBSON 2017-02-12 17:51:53 UTC
I'm troubleshooting and found that this happens when generate_and_send_sums() writes too many bufs.

If I replace :
for (i = 0; i < sum.count; i++) {
by :
for (i = 0; i < 4500; i++) {
(sum.count > 15000 in this test)

It works as expected, sender exits correctly writing down all error messages from the server.

If I replace :
for (i = 0; i < sum.count; i++) {
by :
for (i = 0; i < 4900; i++) {

It fails as explained above.
Comment 3 Ben RUBSON 2017-02-14 11:26:30 UTC
So sounds like that when the generator sends too much info (too many checksums) to the sender, the sender does not receive / see (important) messages anymore.

Perhaps we could solve this by making the generator wait before calling generate_and_send_sums(), until the sender really needs this info, i.e. until the sender is ready to send this file.

I think however I'll need your help Kevin / Wayne for such a modification.

Perhaps there's also a better way to solve this issue ?

Thank you again,

Ben