Bug 2187 - rsync large file getting verification failed using -z
Summary: rsync large file getting verification failed using -z
Status: CLOSED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 2.6.3
Hardware: x86 All
: P3 normal (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-12-23 03:18 UTC by qiucheng
Modified: 2005-04-01 11:22 UTC (History)
1 user (show)

See Also:


Attachments
error log (585 bytes, text/plain)
2004-12-27 01:15 UTC, qiucheng
no flags Details
Fix problem with implicit (unsent) data (1.24 KB, patch)
2005-02-14 01:35 UTC, Wayne Davison
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description qiucheng 2004-12-23 03:18:38 UTC
it seems Bug# 1529 has not been resolved totally. When I rsync a large file 
greater than 4G with -z, I keep getting verification failed. If I remove -z 
option, everything becomes ok.
My envirnment is windows 2000 adv server with latest cygwin.
Comment 1 qiucheng 2004-12-27 01:15:28 UTC
Created attachment 869 [details]
error log
Comment 2 qiucheng 2004-12-27 01:44:57 UTC
by the way, I'm using rsync in cygwin 1.5.12-1.
Comment 3 Wayne Davison 2004-12-28 00:29:35 UTC
I'd suggest adding "--partial" to the options and comparing the copied file
(which will not be discarded) with a valid version of the file.  If the
corruption is minor, seeing what is different might clue us in as to what is
going wrong.  If the corruption is major, try adding a couple more files to the
command-line after the failing file -- if rsync experiences some kind of a
protocol failure, then there is probably a problem with the bytes making it over
the socket.  If it transfers the following files OK, we'll need to do more
debugging.

Is this problem unique to the one file you mentioned?  Or does it fail for all
files?
Comment 4 qiucheng 2004-12-28 03:31:23 UTC
I think it's a common problem. In fact I have been transfered a list of files 
which become larger everyday for a long time using rsync. When XYSHAGLS.BAK was 
not larger than 4GB, everything was ok. But when the size beyond 4GB, I found 
my script got the  verification failed error, which I considered as Bug# 1529.

After Bug# 1529 was resolved, I found I still got that error. So I try to 
remove -z in the parameters. It works. Now, I transfer other files which are 
all small than 4GB with -z, transfer this file without -z. If without -z, 
Transfering speed is very very slow.
Comment 5 qiucheng 2004-12-28 20:27:33 UTC
I added --partial to the options and got two files same in size,but not same in 
md5 checksum.

And then I did the following test.
First, I made two test files, test and test3.dat. Test is bigger than 4GB. 
Test3.dat is smaller than 4GB.
Second, I copy the two files to destinction.
Then in the source I added a file at the end of the two files by this command: 
copy /B test+somefile test
Then I rsync them to the destinction:
$ rsync -az -vvv backup@jf-deptest::backup/test* ./
opening tcp connection to jf-deptest port 873
Password:
receiving file list ...
recv_file_name(test)
recv_file_name(test3.dat)
received 2 names
done
recv_file_list done
get_local_name count=2 ./
generator starting pid=1784 count=2
delta transmission enabled
recv_generator(test,0)
generating and sending sums for 0
count=75314 rem=8792 blength=75304 s2length=4 flength=5671378944
recv_files(2) starting
recv_generator(test3.dat,1)
generating and sending sums for 1
count=53255 rem=20480 blength=53248 s2length=4 flength=2835689472
recv_files(test)
test
recv mapped test of size 5671378944
got file_sum
WARNING: test failed verification -- update discarded (will try again).
generate_files phase=1
recv_generator(test,0)
generating and sending sums for 0
count=75314 rem=8792 blength=75304 s2length=16 flength=5671378944
recv_files(test3.dat)
test3.dat
recv mapped test3.dat of size 2835689472
got file_sum
set modtime of .test3.dat.002916 to (1104284751) Wed Dec 29 09:45:51 2004
renaming .test3.dat.002916 to test3.dat
recv_files phase=1
generate_files phase=2
generate_files finished
recv_files(test)
test
recv mapped test of size 5671378944
got file_sum
ERROR: test failed verification -- update discarded.
recv_files finished

sent 2535000 bytes  received 4094667 bytes  1429.89 bytes/sec
total size is 8526412800  speedup is 1286.10
rsync error: some files could not be transferred (code 23) at /home/lapo/packagi
ng/tmp/rsync-2.6.3/main.c(1146)
_exit_cleanup(code=23, file=/home/lapo/packaging/tmp/rsync-2.6.3/main.c, line=11
46): about to call exit(23)

I got the same error. Hope this can help.
Comment 6 qiucheng 2004-12-30 03:40:04 UTC
I did another test.

I got a file which is about 180M. See the following test log:

$ rsync -az -vvv -B 65535 backup@jf-deptest::backup/cscnc1.dmp ./
opening tcp connection to jf-deptest port 873
Password:
receiving file list ...
recv_file_name(cscnc1.dmp)
received 1 names
done
recv_file_list done
get_local_name count=1 ./
generator starting pid=2720 count=1
delta transmission enabled
recv_generator(cscnc1.dmp,0)
generating and sending sums for 0
count=2846 rem=47953 blength=65535 s2length=3 flength=186495028
recv_files(1) starting
generate_files phase=1
recv_files(cscnc1.dmp)
cscnc1.dmp
recv mapped cscnc1.dmp of size 186495028
got file_sum
set modtime of .cscnc1.dmp.002512 to (1104401967) Thu Dec 30 18:19:27 2004
renaming .cscnc1.dmp.002512 to cscnc1.dmp
recv_files phase=1
generate_files phase=2
generate_files finished
recv_files finished

sent 20063 bytes  received 6491 bytes  689.71 bytes/sec
total size is 186495054  speedup is 7023.24
_exit_cleanup(code=0, file=/home/lapo/packaging/tmp/rsync-2.6.3/main.c, line=114
6): about to call exit(0)

administrator@jf-deptest ~
$ rsync -az -vvv -B 65536 backup@jf-deptest::backup/cscnc1.dmp ./
opening tcp connection to jf-deptest port 873
Password:
receiving file list ...
recv_file_name(cscnc1.dmp)
received 1 names
done
recv_file_list done
get_local_name count=1 ./
generator starting pid=2408 count=1
delta transmission enabled
recv_generator(cscnc1.dmp,0)
generating and sending sums for 0
count=2846 rem=45134 blength=65536 s2length=3 flength=186495054
recv_files(1) starting
generate_files phase=1
recv_files(cscnc1.dmp)
cscnc1.dmp
recv mapped cscnc1.dmp of size 186495054
got file_sum
WARNING: cscnc1.dmp failed verification -- update discarded (will try again).
recv_generator(cscnc1.dmp,0)
generating and sending sums for 0
count=2846 rem=45134 blength=65536 s2length=16 flength=186495054
recv_files phase=1
generate_files phase=2
generate_files finished
recv_files(cscnc1.dmp)
cscnc1.dmp
recv mapped cscnc1.dmp of size 186495054
got file_sum
ERROR: cscnc1.dmp failed verification -- update discarded.
recv_files finished

sent 77003 bytes  received 12189 bytes  1382.82 bytes/sec
total size is 186495067  speedup is 2090.94
rsync error: some files could not be transferred (code 23) at /home/lapo/packagi
ng/tmp/rsync-2.6.3/main.c(1146)
_exit_cleanup(code=23, file=/home/lapo/packaging/tmp/rsync-2.6.3/main.c, line=11
46): about to call exit(23)

Please notise when block size is set beyond to 64K, it failed. I don't know 
whether it's the cause of the problem.

I also did the other experience. see the log:

$ rsync -a -vvv -B 65536 backup@jf-deptest::backup/cscnc1.dmp ./
opening tcp connection to jf-deptest port 873
Password:
receiving file list ...
recv_file_name(cscnc1.dmp)
received 1 names
done
recv_file_list done
get_local_name count=1 ./
generator starting pid=864 count=1
delta transmission enabled
recv_generator(cscnc1.dmp,0)
generating and sending sums for 0
count=2846 rem=45134 blength=65536 s2length=3 flength=186495054
recv_files(1) starting
generate_files phase=1
recv_files(cscnc1.dmp)
cscnc1.dmp
recv mapped cscnc1.dmp of size 186495054
got file_sum
set modtime of .cscnc1.dmp.002716 to (1104402055) Thu Dec 30 18:20:55 2004
renaming .cscnc1.dmp.002716 to cscnc1.dmp
recv_files phase=1
generate_files phase=2
generate_files finished
recv_files finished

sent 20062 bytes  received 56714 bytes  642.48 bytes/sec
total size is 186495067  speedup is 2429.08
_exit_cleanup(code=0, file=/home/lapo/packaging/tmp/rsync-2.6.3/main.c, line=114
6): about to call exit(0)

I removed the -z option and used 65536 as block size, it passed this time.
Comment 7 Wayne Davison 2004-12-30 15:03:03 UTC
Thanks for the additional details -- it has been helpful.  I have found a
problem with the sending of compressed tokens when the blocksize is larger than
the CHUNK_SIZE, and am working on an appropriate fix.  When that is done, we can
see if it fixes the original verification problem you reported (which I haven't
been able to duplicate).
Comment 8 Paul Ripke 2005-01-21 05:36:13 UTC
I can confirm this bug under both AIX 5.2 and NetBSD 2.0, with 2.6.3 and CVS-HEAD as of 2005-
01-21. I also have a somewhat manageable test case which results in "verification failed" - a pair 
of 10 MB files available at:

ftp://stix.homeunix.net/pub/rsyncbug2187/file1.gz
ftp://stix.homeunix.net/pub/rsyncbug2187/file2.gz

Test blocksize is as reported by rsync on an actual transfer of a file a little over 4 GB. Test 
command was:

ksh$ rsync -B 66552 -acvvz --stats --inplace -e /usr/bin/rsh /tmp/z/file1 localhost:/tmp/z/file2 
opening connection using /usr/bin/rsh localhost rsync --server -vvlogDtprcz -B66552 --inplace . 
/tmp/z/file2 
building file list ... 
[sender] expand file_list to 131072 bytes, did move
done
delta transmission enabled
file1
WARNING: /tmp/z/file2 failed verification -- update retained (will try again).
file1
total: matches=306  tag_hits=1898  false_alarms=0 data=665520
ERROR: /tmp/z/file2 failed verification -- update retained.

Number of files: 1
Number of files transferred: 2
Total file size: 10485760 bytes
Total transferred file size: 20971520 bytes
Literal data: 665520 bytes
Matched data: 20306000 bytes
File list size: 61
Total bytes sent: 290611
Total bytes received: 4168

sent 290611 bytes  received 4168 bytes  196519.33 bytes/sec
total size is 10485760  speedup is 35.57
rsync error: some files could not be transferred (code 23) at main.c(702)
ksh$ cmp /tmp/z/file?
/tmp/z/file1 /tmp/z/file2 differ: char 3726913, line 11513
Comment 9 Wayne Davison 2005-02-14 01:35:00 UTC
Created attachment 965 [details]
Fix problem with implicit (unsent) data

I was able to reproduce your test case (much appreciated!) and the result is
the fix in the attached patch.	Apparently the zlib data handling is sensitive
to how the implicit (unsent) data is broken up.  The receiving side is kluging
up a literal-data zlib header for unsent data, and this must be 65535 bytes or
less in length.  The sending side was throwing a whole block of data at the
deflate() call, which apparently creates a potential mismatch in the data
streams if the blocksize is larger than 65535 bytes.  I changed the code to
break up the blocks on the sending side in the same increments that the
receiving side uses, and it fixed the problem.	I hope the fix works for all
large block sizes that were having a problem.
Comment 10 Wayne Davison 2005-02-14 01:35:37 UTC
Checked fix into CVS.