The Samba-Bugzilla – Bug 3925
rsync is unable to sync large (approx 4G) sparse files
Last modified: 2009-03-06 14:42:01 UTC
receiving machine: rsync-2.6.8-1.FC5.1 (Fedora Core 5)
sending machine: rsync-2.6.3-1 (Fedora Core 1)
The following command line parameters are used on the receiving machine:
rsync --rsh=ssh \
user@sender:'/dest1 /dest2' dest-dir
rsync complains about two files. Both are sparse and approximately 4G in size. During the transfer, the following messages are produced:
4296024064 100% 5.55MB/s 0:12:18 (xfer#22, to-check=56572/57050)
WARNING: test1/cow failed verification -- update retained (will try again).
4296028161 100% 5.50MB/s 0:12:24 (xfer#24, to-check=56569/57050)
WARNING: test3/cow failed verification -- update retained (will try again).
Then, later in the backup:
4296024064 100% 5.97MB/s 0:11:26 (xfer#30, to-check=56572/57050)
ERROR: test1/cow failed verification -- update retained.
4296028161 100% 5.75MB/s 0:11:52 (xfer#31, to-check=56569/57050)
ERROR: test3/cow failed verification -- update retained.
Checking the md5sums of the files in question shows that they are not the same.
A second run of rsync does not even attempt to synchronize the files:
receiving file list ...
57050 files to consider
sent 145 bytes received 952528 bytes 20937.87 bytes/sec
total size is 63950448489 speedup is 67127.39
If I remove the files in question from the receiver, and rsync again, the rsync completes normally. The md5sums also match.
I see in the changes for 2.6.7 that --inplace and --sparse can't be used together because "the sparse-output algorithm doesn't work when overwriting existing data". I'm not using --inplace, so I don't think this affects me. Also, the receiver is 2.6.8, which I believe makes this irrelevant.
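The release-note quote above hints at why sparse output and overwriting do not mix. Below is a minimal in-memory sketch, assuming the sparse writer simply skips over zero bytes instead of storing them. The names sparse_copy and sparse_copy_matches are hypothetical, and this models the idea rather than rsync's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst but, like a sparse writer, skip over zero bytes
 * instead of storing them. In-memory model only; not rsync code. */
static void sparse_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] != 0)
            dst[i] = src[i];   /* real data is written */
        /* zero byte: "seek" past it, leaving dst[i] untouched */
    }
}

/* Returns 1 when dst ends up identical to src after the sparse copy. */
static int sparse_copy_matches(unsigned char *dst, const unsigned char *src, size_t len)
{
    sparse_copy(dst, src, len);
    return memcmp(dst, src, len) == 0;
}
```

Copying {1, 0, 0, 2} into a fresh all-zero destination reproduces the source, but copying it over a destination that already holds {9, 9, 9, 9} leaves {1, 9, 9, 2}, because the skipped positions keep their old contents.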
I noticed something really similar.
I am using the rsync daemon to transfer a bunch of movie files.
Now and then a file fails with this error.
Rsync is the same version on both sides, 2.6.8, from backports.debian.org.
command line parameters:
ERROR: marie/movies/marie_320.mpg failed verification -- update retained.
The resulting files' checksums are different. Sizes are the same.
On two systems, removing the file and retransferring it helped.
On one it did not.
Another example (with slight differences):
receiving machine: cwRsync 2.0.10 (W2003)
sending machine: rsync 2.5.7 (RHEL 3)
The bug occurs only with the -z flag, but then it occurs consistently with large files.
(I am also using the -e "ssh" and -r options.)
Hmmm... I'm having the same problems here: rsync 2.6.9 (Debian Sarge) server, version 2.6.3 at the sender.
You should take a look at bug #2187, which describes exactly the same problem. However, Wayne Davison states that the bug was fixed in CVS sometime in February 2005. I don't know which version of rsync was the first to include this patch, but I think my version (2.6.3) is too old and still contains bug #2187.
In the file "fileio.c", line 31:
static size_t sparse_seek = 0;
size_t is 4 bytes (so it can only count up to 4GB), at least on my six-year-old P4; I don't know about 64-bit systems.
So if there are more than 4GB of consecutive zeros, the "sparse_seek" variable overflows, and things go wrong... but the size of the file changes, so I'm not sure if this is related to this thread.
Just change it to:
static off_t sparse_seek = 0;
and it works, or at least it looks like it works :). Maybe a more in-depth look should be taken at the "sparse" code (the same applies to l1 and l2, but I don't think anyone is going to have a 4GB+ buffer...)
Oh! This is in rsync 3.0.4 at least, and in the 3.0.3 on my Gentoo... maybe much earlier too...
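The overflow described above can be reproduced in isolation. Below is a minimal sketch, assuming a 4-byte size_t (modelled here with uint32_t) and a 64-bit off_t (modelled with int64_t). The type pending_seek and the function accumulate_zeros are hypothetical names for illustration, not rsync code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not rsync code: uint32_t models a 4-byte
 * size_t, int64_t models a 64-bit off_t. */
typedef struct {
    uint32_t narrow; /* wraps modulo 2^32 */
    int64_t  wide;   /* keeps the full count */
} pending_seek;

/* Accumulate `total` bytes of zeros, delivered `chunk` bytes at a time,
 * the way a sparse writer defers a seek over a run of zeros.
 * Assumes `total` is a multiple of `chunk`. */
static pending_seek accumulate_zeros(int64_t total, uint32_t chunk)
{
    pending_seek p = { 0, 0 };
    assert(chunk > 0);  /* a zero chunk would never make progress */
    for (int64_t done = 0; done < total; done += chunk) {
        p.narrow += chunk;
        p.wide   += chunk;
    }
    return p;
}
```

For 5 GiB of zeros delivered in 32 KiB chunks, the narrow counter ends at 1073741824 (1 GiB) while the wide counter holds 5368709120, so a seek based on the narrow value would land 4 GiB short.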
Created attachment 3728
Tweak sparse_seek and a few other size_t vars
Thanks, Pedro! I agree that the sparse_seek variable should be an off_t (which is an OFF_T in rsync, to support some systems where off_t isn't as big as it should be).
Looking at the size_t args of write_file() and write_sparse() and how they interact with int vars, it looks to me like some of the size_t values should also be ints.
This patch both changes the sparse_seek definition and makes some size_t vars ints.
I'm not sure if the "nice" way to define them is as ints.
I would check exactly what they mean, and either leave them as size_t or make them OFF_T. If they relate to a buffer position or to a function that handles size_t values, leave them as size_t. However, if they relate to file positions, I think they should be OFF_Ts.
Anyway, I think they definitely should not be ints, as they are never going to hold a negative value and you are effectively halving the range in exchange for nothing.
However this is just my point of view, and I'm quite sure nobody needs a programming course here, so if you think ints are the way to go I'm not going to cry :)
The vars need to be consistent. Since the callers are passing int length values, and the return must be able to either return the input length or go negative (for an error), I think int is the right choice. The chunk size of a write will not overflow 31 bits (or indeed, even get close to that), so we should be fine.
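The point about negative returns can be illustrated with a small sketch. The functions write_chunk, write_all and error_is_lost_in_size_t are hypothetical and only model the calling convention described above (return the input length on success, a negative value on error). They are not the rsync functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writer, not an rsync function: returns len on success
 * and -1 on error, so the return type has to be signed. */
static int write_chunk(const char *buf, int len)
{
    if (buf == NULL || len < 0)
        return -1;
    return len;
}

/* Caller pattern: a negative return means failure. */
static int write_all(const char *buf, int len)
{
    int ret = write_chunk(buf, len);
    if (ret < 0)
        return -1;
    assert(ret == len);  /* this sketch never writes short */
    return ret;
}

/* The same error stored in an unsigned type is no longer negative:
 * -1 converts to SIZE_MAX, so a "ret < 0" check could never fire. */
static int error_is_lost_in_size_t(void)
{
    size_t stored = (size_t)write_chunk(NULL, 3);
    return stored == SIZE_MAX;
}
```

With an int return, the caller's "ret < 0" test catches the failure. If the value were kept in a size_t, the caller would have to compare against (size_t)-1 explicitly instead.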
I'm using 3.0.4 here to copy a huge (40GB) sparse file within the same machine, and it stops at exactly 32G.
I'm using rsync -S src dst. Without the -S option it completes at the correct size. I get neither a warning nor an error, even though the destination file has an incorrect size!
Timing with cp on the same hardware and same conditions:
time rsync -S /Storage1/mail1.diskimg /Storage2/mail1.diskimg.backup
(with wrong file size)
time cp --sparse=always /Storage1/mail1.diskimg /Storage2/mail1.diskimg.backup
This is a KVM disk image so the file should have quite long sparse zones.
You need to use 3.0.5 for the fixed sparse variable.