Bug 3925 - rsync is unable to sync large (approx 4G) sparse files
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.0.5
Hardware: x86 Linux
Importance: P3 major
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
 
Reported: 2006-07-12 15:11 UTC by Bernard Johnson
Modified: 2009-03-06 14:42 UTC


Attachments
Tweak spare_seek and a few other size_t vars (1.05 KB, patch)
2008-11-11 20:04 UTC, Wayne Davison

Description Bernard Johnson 2006-07-12 15:11:26 UTC
receiving machine: rsync-2.6.8-1.FC5.1 (Fedora Core 5)
sending machine: rsync-2.6.3-1 (Fedora Core 1)

The following command line parameters are used on the receiving machine:
rsync --rsh=ssh \
      --archive \
      --compress \
      --update \
      --recursive \
      --sparse \
      --progress \
      --exclude-from=excludes.txt \
      --partial \
      --delete \
      --delete-excluded \
    user@sender:'/dest1 /dest2' dest-dir

rsync complains about two files.  Both are sparse files, approximately 4G in size.  When rsyncing, the following messages are produced:

test1/cow
  4296024064 100%    5.55MB/s    0:12:18 (xfer#22, to-check=56572/57050)
WARNING: test1/cow failed verification -- update retained (will try again).
test3/cow
  4296028161 100%    5.50MB/s    0:12:24 (xfer#24, to-check=56569/57050)
WARNING: test3/cow failed verification -- update retained (will try again).

then later in the backup:
test1/cow
  4296024064 100%    5.97MB/s    0:11:26 (xfer#30, to-check=56572/57050)
ERROR: test1/cow failed verification -- update retained.
test3/cow
  4296028161 100%    5.75MB/s    0:11:52 (xfer#31, to-check=56569/57050)
ERROR: test3/cow failed verification -- update retained.

Checking the md5sums of the files in question shows that they are not the same.

sending machine:
093426c81424183de9162cc412e46eaf  test1/cow
04f4c4f4cd49160ba696423dd37fe7d2  test3/cow

receiving machine:
5e1cef93f8e5542a097c178bab6b3688  test1/cow
8573a3d344fadd61a38aefdc94e027f5  test3/cow

A second run of rsync does not even attempt to synchronize the files:
user@sender's password:
receiving file list ...
57050 files to consider

sent 145 bytes  received 952528 bytes  20937.87 bytes/sec
total size is 63950448489  speedup is 67127.39

If I remove the files in question from the receiver, and rsync again, the rsync completes normally.  The md5sums also match.

I see in the changes for 2.6.7 that --inplace and --sparse can't be used together because "the sparse-output algorithm doesn't work when overwriting existing data".  I'm not using --inplace, so I don't think this affects me.  Also, the receiver is 2.6.8, which I believe makes this irrelevant.
Comment 1 oscar 2006-11-16 10:21:28 UTC
I noticed something really similar.
I'm using the rsync daemon to transfer a bunch of movie files.
Now and then a file fails with this error.
Rsync is the same version on both sides, 2.6.8, from backports.debian.org.
command line parameters:

-a 
--stats 
--delete-excluded 
--delete-after 
--partial
--password-file=/etc/rsync/aaa.passwd 
-W 
--size-only
--exclude "/*/*.zip"
rsync://syncuser@$HOST:/content/ /home/www/${HOST}.com/

ERROR: marie/movies/marie_320.mpg failed verification -- update retained.

The resulting files' checksums are different. Sizes are the same.


Comment 2 oscar 2006-11-16 10:24:52 UTC
On 2 systems, removal and retransfer helped.
On one it did not.
Comment 3 Tony Ketteringham 2006-12-20 23:40:47 UTC
Another Example (with slight differences):
receiving machine: cwRsync 2.0.10 (W2003)
sending machine:  rsync 2.5.7 (RHEL 3)

The bug occurs only with the -z flag, but it occurs consistently with large files.
(I'm also using the -e "ssh" and -r options.)
Comment 4 Bas van Schaik 2007-02-23 10:58:59 UTC
Hmmm... I'm having the same problems here: rsync 2.6.9 (Debian Sarge) server, version 2.6.3 at the sender. 

You should take a look at bug #2187, which describes exactly the same problem. However, Wayne Davison states that the bug was fixed in CVS sometime in February 2005. I don't know which version of rsync is the first to include this patch, but I think my version (2.6.3) is too old and still has bug #2187.
Comment 5 Pedro Velasco 2008-11-11 19:39:28 UTC
In file "fileio.c", line 31:
static size_t sparse_seek = 0;

size_t is 4 bytes (so it can count up to 4GB), at least on my six-year-old P4; I don't know about 64-bit systems.

So if there are more than 4GB consecutive zeros, the "sparse_seek" variable overflows, and things go wrong... but the size of the file changes, so I'm not sure if this is related to this thread.

just change it to:
static off_t sparse_seek = 0;

and it works, or at least it looks like it works :). Maybe a more in-depth look should be taken at the "sparse" code (this also happens with l1 and l2, but I don't think anyone is going to have a +4GB buffer...)

Oh! this is in, at least, rsync 3.0.4 and on my gentoo 3.0.3... maybe much earlier too...
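
The wraparound described in this comment can be reproduced with a small sketch. This is not rsync's code: accumulate_zero_run() is a hypothetical helper, uint32_t stands in for a 4-byte size_t so the effect also shows up on 64-bit machines, and int64_t stands in for a 64-bit off_t.

```c
#include <stdint.h>

/* Hypothetical helper (not from rsync): feed `chunks` all-zero chunks of
 * `chunk_len` bytes into two pending-seek counters.
 *   *narrow models a 4-byte size_t, as on a 32-bit build.
 *   *wide   models a 64-bit off_t. */
static void accumulate_zero_run(uint32_t chunk_len, uint32_t chunks,
                                uint32_t *narrow, int64_t *wide)
{
    uint32_t i;

    *narrow = 0;
    *wide = 0;
    for (i = 0; i < chunks; i++) {
        *narrow += chunk_len;           /* unsigned: wraps modulo 2^32 */
        *wide += (int64_t)chunk_len;    /* keeps the full byte count */
    }
}
```

With a 32768-byte chunk and 131073 chunks (one chunk more than 4GB of zeros), the narrow counter ends at 32768 while the wide counter holds 4295000064, so a seek based on the narrow counter would land about 4GB short.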
Comment 6 Wayne Davison 2008-11-11 20:04:26 UTC
Created attachment 3728
Tweak spare_seek and a few other size_t vars

Thanks, Pedro!  I agree that the sparse_seek variable should be an off_t (which is an OFF_T in rsync, to support some systems where off_t isn't as big as it should be).

In looking at the size_t args of write_file() and write_sparse() and how they interact with int vars, it looks to me like some of the size_t values should also be ints.

This patch both changes the sparse_seek definition and makes some size_t vars ints.
Comment 7 Pedro Velasco 2008-11-13 20:27:52 UTC
Hello again!

I'm not sure that the "nice" way to define them is as ints.

I would check what exactly they mean, and leave them as size_t or make them OFF_T. If they relate to a buffer position or to a function handling size_ts, leave them as size_t. However, if they relate to file positions, I think they should be OFF_Ts.

Anyway, I think they definitely should not be ints, as they are not going to take any negative value and you are effectively halving the range in exchange for nothing.

However this is just my point of view, and I'm quite sure nobody needs a programming course here, so if you think ints are the way to go I'm not going to cry :)
Comment 8 Wayne Davison 2008-11-14 12:39:54 UTC
The vars need to be consistent.  Since the callers are passing int length values, and the return must be able to either return the input length or go negative (for an error), I think int is the right choice.  The chunk size of a write will not overflow 31 bits (or indeed, even get close to that), so we should be fine.
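
The convention argued for here (return the input length on success, a negative value on error) can be sketched as follows. write_chunk() is a hypothetical helper for illustration, not rsync's write_file(); it assumes the caller never passes more than INT_MAX bytes, in line with the remark that chunk sizes stay far below 31 bits.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not rsync's write_file()): write exactly `len` bytes.
 * Returns `len` on success or -1 on error, so the return type has to be
 * signed; an unsigned size_t could not carry the error value. */
static int write_chunk(int fd, const char *buf, int len)
{
    int done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, (size_t)(len - done));
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any byte was written */
            return -1;
        }
        done += (int)n;         /* partial write: loop for the remainder */
    }
    return len;
}
```

A caller can then test the result with a plain `if (write_chunk(fd, buf, len) < 0)`, which would silently never fire if the return type were unsigned.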
Comment 9 Stéphane BERTHELOT 2009-03-06 11:15:23 UTC
Using 3.0.4 here to copy a huge (40GB) sparse file on the same machine, it stops at exactly 32G.

I'm using rsync -S src dst. Without the -S option it completes at the correct size. I get neither a warning nor an error, even though the destination file has an incorrect size!

Timing with cp on the same hardware and same conditions:

time rsync -S /Storage1/mail1.diskimg /Storage2/mail1.diskimg.backup 

real    15m12.434s
user    5m42.237s
sys     7m8.176s
(with wrong file size)

time cp --sparse=always /Storage1/mail1.diskimg /Storage2/mail1.diskimg.backup

real    6m32.121s
user    0m8.414s
sys     2m45.551s

This is a KVM disk image so the file should have quite long sparse zones.
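
Since no warning is printed in this case, a size check after the copy has to be done separately. The following is a minimal sketch of such a check, with hypothetical helper names; it assumes st_blocks counts 512-byte units, which holds on Linux but is not guaranteed by POSIX.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: compare the apparent sizes of two files.
 * Returns 1 if they match, 0 if they differ, -1 if either stat() fails. */
static int same_apparent_size(const char *src, const char *dst)
{
    struct stat a, b;

    if (stat(src, &a) != 0 || stat(dst, &b) != 0)
        return -1;
    return a.st_size == b.st_size;
}

/* Hypothetical helper: bytes actually allocated on disk, or -1 on error.
 * Assumes 512-byte st_blocks units (Linux). A sparse file reports far
 * fewer allocated bytes than its apparent size. */
static long long allocated_bytes(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

Matching sizes do not prove the contents are identical (the md5sums earlier in this report differ for files of equal size), so this only catches truncation such as the 32G result above.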
Comment 10 Wayne Davison 2009-03-06 14:42:01 UTC
You need to use 3.0.5 for the fixed sparse variable.