Bug 3358 - rsync chokes on large files
Summary: rsync chokes on large files
Status: CLOSED INVALID
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.6
Hardware: PPC Mac OS X
Importance: P3 major
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-28 10:49 UTC by david-bo
Modified: 2006-03-12 02:56 UTC
CC List: 0 users

See Also:


Description david-bo 2005-12-28 10:49:04 UTC
I am trying to rsync a 25-50 GB AES128-encrypted disk image called 'test' between two Mac OS X machines. This is with rsync 2.6.6 (is there a 2.6.7? The front page only mentions 2.6.6).


% rsync -av --progress --stats --rsh=ssh /test 2nd-machine:/test
Warning: No xauth data; using fake authentication data for X11 forwarding.
tcsh: TERM: Undefined variable.
building file list ... 
1 file to consider
test
rsync: writefd_unbuffered failed to write 4 bytes: phase "unknown" [sender]: Broken pipe (32)
rsync: write failed on "/test": No space left on device (28)
rsync error: error in file IO (code 11) at /SourceCache/rsync/rsync-20/rsync/receiver.c(312)
rsync: connection unexpectedly closed (92 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at /SourceCache/rsync/rsync-20/rsync/io.c(359)
rsync: connection unexpectedly closed (1240188 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(434)


The receiving machine has free space left (2+ GB). Before I upgraded to 2.6.6 I had 2.6.2 on the sending machine and 2.6.3 on the receiving machine. With that combination I got a different error message:

% rsync -av --progress --stats --rsh=ssh test 2nd-machine:/test
Warning: No xauth data; using fake authentication data for X11 forwarding.
tcsh: TERM: Undefined variable.
building file list ... 
1 file to consider
test
rsync: writefd_unbuffered failed to write 4 bytes: phase "unknown": Broken pipe
rsync error: error in rsync protocol data stream (code 12) at /SourceCache/rsync/rsync-14/rsync/io.c(836)



The files _should_ be identical: I first transferred them with sftp without problems, but they will change in the future, and then I want to use rsync to keep them identical. This was just a test to verify my plan - a test that didn't seem to work out that well.

I don't know if this matters, but here is some more information about my setup:

Powerbook G3 with 10.3.9
Powerbook G4 with 10.4.3

Wireless 802.11G-network between router and G4, wired network between G3 and router. The router is a Linksys WRT54GS.

Both the older versions and the most recent version work very well when I work with smaller files (for example, I synchronized 40 GB of mp3s without problems).
Comment 1 Wayne Davison 2005-12-28 11:21:49 UTC
The pertinent error is this:

rsync: write failed on "/test": No space left on device (28)

That is an error from your OS that indicates that there was no room to write out the destination file.  Keep in mind that when rsync updates a file, it creates a new version of the file (unless --inplace was specified), so your destination directory needs to have enough free space available to hold the largest updated file.
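
For example (using the same paths as your original command), something like this should update the destination file in place, avoiding the temporary copy and the extra space requirement, at the cost of leaving the destination file in an inconsistent state if the transfer is interrupted:

% rsync -av --inplace --progress --stats --rsh=ssh /test 2nd-machine:/test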

As for why the file is updating, if the modified time and size don't match, rsync will update the file (efficiently).  You can use the --checksum option to avoid this unneeded update at the expense of a lot of extra disk I/O to compute each file's checksum before figuring out if a transfer is needed.
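
For instance, adding -c/--checksum to your original command makes rsync compare file contents instead of size and modification time when deciding what to transfer:

% rsync -avc --progress --stats --rsh=ssh /test 2nd-machine:/test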
Comment 2 david-bo 2005-12-29 13:47:14 UTC
Interesting, I didn't know that rsync worked that way - I thought the default behaviour was to only replace the parts of the file that had changed. Anyway, this motivates a follow-up question:

If I understand it correctly: if you have file 1 on computer A and file 2 on computer B, and some minor changes have been made to 1 that you want to sync to B, rsync basically makes a copy of 2 and works with that. If 1/2 are big, like in my example where they were 25-50 GB, the copy operation from 2.0 to 2.1 generates a lot of disk activity.

In my case, when I rsync between two laptops, all this disk activity is a little unfortunate since laptop drives are so slow. Now to my question: is there a way to reduce disk activity? Does the --inplace switch work around this?

Thanks.
Comment 3 david-bo 2005-12-29 13:48:22 UTC
Btw, I am now trying your suggestions. First I will try the --inplace switch, and second I will test syncing with twice the amount of space required for the file available.
Comment 4 david-bo 2005-12-29 13:54:12 UTC
Sorry for spamming, but I just realised what you meant when you wrote:

You can use the --checksum option to avoid this unneeded update at the expense of a lot of extra disk I/O to compute each file's checksum before figuring out if a transfer is needed.


If rsync is _not_ checksumming files, why does rsync remain in this state:

building file list ... 
1 file to consider


for maybe 30 minutes when it transfers my big file?
Comment 5 Wayne Davison 2006-01-02 09:49:24 UTC
(In reply to comment #4)
> If rsync is _not_ checksumming files, why does rsync remain in this state:
> [...]
> for maybe 30 minutes when it transfers my big file?

Because it is transferring the file.  Yes, this involves file-transfer checksumming, but I was talking about pre-transfer checksum generation (and its use in determining which files get transferred), which is what --checksum enables.
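
To see just that pre-transfer decision without moving any data, --checksum can be combined with a dry run (command adapted from the one in your original report); rsync checksums both copies and only reports which files it would update:

% rsync -avcn --progress --stats --rsh=ssh /test 2nd-machine:/test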
Comment 6 david-bo 2006-01-02 10:21:26 UTC
This is weird: there is no network activity during this 'building file list' phase. However, as soon as it is finished, rsync saturates my network.

I thought that, if the file's size and modification date don't match, rsync worked by creating a binary tree, checksumming the parts between every node recursively up to the root of the tree, and then only transferring the parts where the checksums didn't match.
Comment 7 Wayne Davison 2006-01-02 11:02:15 UTC
(In reply to comment #6)
> This is weird, there is no network activity during this building file list
> phase. However, as soon as it is finished, rsync saturates my network.

What is weird about that?  As soon as rsync outputs the "1 file to consider" message, the file-list-building stage is over, and rsync then starts to transfer the file if it is in need of an update.  (If --checksum was specified, the receiving rsync would first be busily checksumming the file to decide if the file was actually changed before (possibly) starting the transfer.)

> I thought rsync worked, if the file's size and modification date doesn't
> match, by creating a binary tree and then checksumming the parts between
> every node, recursively to the root of the tree, and then only transferring
> the parts where the checksum didn't match.

There are no b-trees involved -- rsync immediately starts to send checksum info from the receiving side to the sender, who then diffs the remote checksums with the sending-side file and sends instructions to the receiver on how to recreate the file using as much of the local data as possible (this new file is built in a separate temp-file unless the --inplace option was specified).
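
As a rough analogy only (this is not rsync's implementation; the block size, paths, and the pre-made remote checksum list are invented for illustration): split a small sample file into fixed-size blocks on each machine, checksum every block, and compare the two lists; the mismatched lines mark the data that would have to be sent. rsync does the equivalent over the network, and its rolling weak checksum additionally lets it match blocks at arbitrary byte offsets rather than only on block boundaries.

% mkdir /tmp/chunks && split -b 131072 test /tmp/chunks/part.
% md5 -r /tmp/chunks/part.* > /tmp/local.sums
% diff /tmp/local.sums /tmp/remote.sums

Here /tmp/remote.sums is assumed to have been produced the same way from the other machine's copy of the file; each line of diff output then corresponds to a 128 KB block that differs between the two files.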
Comment 8 david-bo 2006-01-02 11:42:52 UTC
> What is weird about that?

You wrote the following in a previous comment, when I asked why rsync considers a file for 30 minutes if it is not checksumming it:

> Because it is transferring the file. 

To which I replied that there is no noticeable network activity when rsync is in this state. However, when it is finished with the 'consideration phase', the network is saturated.

I think it is weird that transferring a 25 GB file doesn't generate any network activity while rsync is in the 'consideration phase', but transferring the same file when rsync is in another phase saturates the network.