Bug 7778 - Extra writes with --inplace due to misaligned block matching
Summary: Extra writes with --inplace due to misaligned block matching
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.0.7
Hardware: Other
OS: Linux
Importance: P3 minor
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-05 15:38 UTC by Ildar Muyukov
Modified: 2011-06-04 19:57 UTC
CC List: 0 users

See Also:


Attachments

Description Ildar Muyukov 2010-11-05 15:38:13 UTC
Even if a block's contents in dst are the same as in src, the block gets written anyway.
That is fine without --inplace.
But with --inplace it is:
1. Excessive.
2. Unexpected. Very troublesome if dst is (partially) sparse.

Is it possible to fix this? (I guess it's trivial.)
Comment 1 Wayne Davison 2010-11-06 10:23:52 UTC
What makes you think matching locations are being written?  In the verbose output, a matching offset means that a seek happened rather than a write, e.g.:

chunk[391] of size 920 at 359720 offset=359720

I'm adding a " (seek)" suffix to that line for 3.1.0, just to make it clearer.
Comment 2 Ildar Muyukov 2010-11-06 14:05:14 UTC
I am sorry if I was wrong.

Here's my testcase:
$ dd bs=1M seek=1 count=0 of=f1
$ dd bs=1M seek=1 count=0 of=f2
$ du f?
0	f1
0	f2
$ rsync --inplace f1 f2
$ du f?
0	f1
1,1M	f2

Since the files are identical, I expect nothing to be written. But the sparse file got filled, so something is going wrong. Any idea what?
Comment 3 Matt McCutchen 2010-11-06 17:04:09 UTC
--inplace only avoids rewriting unchanged blocks when the delta-transfer algorithm is on, and it is off by default for a local run.  You can turn it on explicitly with --no-whole-file.  I'm not sure whether adding another mode where the receiver checks for unchanged blocks would be worth the effort.
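
For illustration only, here is a minimal sketch of what such a receiver-side check could look like -- written in Python rather than rsync's C, with a made-up helper name and a fixed block layout; rsync does not implement this:

def write_inplace_if_changed(path, offset, data):
    # Hypothetical sketch, not rsync code: write a block at its offset only
    # when the existing bytes differ, so unchanged (and sparse) regions of an
    # --inplace destination stay untouched.
    with open(path, "r+b") as f:
        f.seek(offset)
        existing = f.read(len(data))
        if existing == data:
            return False              # identical block: skip the write entirely
        f.seek(offset)
        f.write(data)
        return True                   # block differed and was rewritten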
Comment 4 Ildar Muyukov 2010-11-07 12:56:04 UTC
Hey, Matt! I think you're right. I never knew rsync had two different transfer methods.
But I still have a problem: if I do this:
$ echo > f1
$ dd bs=1M seek=1 count=0 of=f1
$ dd bs=1M seek=1 count=0 of=f2
$ du -h f?
4,0K	f1
0	f2
$ rsync --inplace --no-whole-file f1 f2
$ du -h f?
4,0K    f1
1,1M    f2

I still get the target filled.
That is, rsync writes 1M of data into the target when 1 byte would be enough.
Comment 5 Matt McCutchen 2010-11-07 14:34:41 UTC
(In reply to comment #4)
> But I still have a problem: if I do this:
> $ echo > f1
> $ dd bs=1M seek=1 count=0 of=f1
> $ dd bs=1M seek=1 count=0 of=f2
> $ du -h f?
> 4,0K    f1
> 0       f2
> $ rsync --inplace --no-whole-file f1 f2
> $ du -h f?
> 4,0K    f1
> 1,1M    f2
> 
> I still get the target filled.

I see what is happening.  As the sender goes through the source file, it always matches the data against a basis file block as soon as possible and then skips the entire matched region of the source file.  So in this case, it skips the '\n' and makes matches at offsets 1, 1025, ... of the source file against arbitrary basis file blocks; it never gets to an aligned offset k*1024 where it could match against basis data at the same offset.  To fix this, when updating_basis_file is on, the sender would have to postpone making a nonaligned match until it checks whether the next "block" of the source file matches the basis file at the same offset.
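
To see the effect concretely, here is a toy model of that greedy matching (a Python sketch using whole-block comparison instead of rsync's rolling checksums; the block size and file contents mirror the test case above, and none of this is rsync's actual code):

BLOCK = 1024
src   = b"\n" + bytes(1024 * 1024)      # f1: a newline followed by zeros
basis = bytes(1024 * 1024)              # f2: all zeros

# In this test case every basis block happens to be identical (all zeros).
basis_blocks = {basis[i:i + BLOCK] for i in range(0, len(basis), BLOCK)}

pos, match_offsets, literal_bytes = 0, [], 0
while pos + BLOCK <= len(src):
    if src[pos:pos + BLOCK] in basis_blocks:
        match_offsets.append(pos)       # greedy: match as soon as possible...
        pos += BLOCK                    # ...then skip the whole matched region
    else:
        literal_bytes += 1
        pos += 1

print(match_offsets[:3])                # [1, 1025, 2049] -- never a k*1024 offset
print(literal_bytes)                    # 1 (just the leading '\n')

Because every match lands one byte past a block boundary, an --inplace receiver has to copy each of those blocks into place instead of simply seeking past them.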
Comment 6 Ildar Muyukov 2010-11-07 14:47:41 UTC
(In reply to comment #5)
OK, but from the second block onward it should see just blocks of zeros in both src and tgt. I can accept the 1st and 2nd 1k blocks being written, but it writes the whole file (while the difference was just 1 byte)!

Is it possible to limit the affected area?
Comment 7 Wayne Davison 2010-11-08 20:24:20 UTC
This is caused by the repetition of the file's data.  When rsync checks at offset 1 in the receiving file for a matching block, it finds a match (because all the blocks are identical after the first byte), and rsync never gets back to the 1024-byte aligned blocks on the sender to notice that the data is identical again.  If your data was not so repetitive, rsync would quickly sync up and skip the rest of the writes.  (You can see what it is doing via either the 3.1.0dev option --debug=deltasum3 or via -vvvv.)

I'm not sure how best the code could be improved to try to avoid this.  Matt's idea of block-aligned checks could be made to work (given enough read-ahead), but I'm not sure it's worth it, since it only affects very repetitive files.
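
As a rough illustration of that postponed, block-aligned check (a Python sketch under simplifying assumptions -- whole-block comparison, one block of read-ahead, an invented function name -- not a proposed patch):

BLOCK = 1024

def prefer_aligned_match(src, basis, pos):
    # Before accepting a match at a nonaligned offset, peek at the next block
    # boundary: if the source data there equals the basis data at the same
    # offset, an --inplace receiver could handle it with a seek instead of a write.
    boundary = -(-pos // BLOCK) * BLOCK          # round pos up to a multiple of BLOCK
    if (boundary + BLOCK <= min(len(src), len(basis))
            and src[boundary:boundary + BLOCK] == basis[boundary:boundary + BLOCK]):
        return ("aligned", boundary)
    return ("unaligned", pos)                    # keep the current greedy behaviour

The bytes between the nonaligned position and the boundary would presumably have to go out as literals, and the sender needs enough read-ahead to examine the next block before committing to a match, which is the cost weighed above.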

I do note that the code that looks for a (preferential) identical-position block wastes time when the receiving-side block is not aligned with the sending side's blocks.  That is something that should be optimized.
Comment 8 Ildar Muyukov 2010-11-11 13:30:26 UTC
(In reply to comment #7)
> I'm not sure how best the code could be improved to try to avoid this.  Matt's
> idea of block-aligned checks could be made to work (given enough read-ahead),
> but I'm not sure it's worth it, since it only affects very repetitive files.

Sorry, I can't agree.
This issue may be crucial in some cases. Overwriting blocks is bad: not just for sparse targets but also for filesystems like JFFS2, where "overwriting" means enlarging the space the target occupies. This means a 1M target overwritten 10 times takes 11M of space.
Comment 9 Wayne Davison 2011-01-18 01:10:08 UTC
The latest 3.1.0dev version now re-aligns for sequences of zeros.  I toyed with generalizing it for any repetitive blocks, but that would have caused extra (useless) checksumming for any inplace file update where the data moved toward the start of the file -- it doesn't seem worthwhile to slow things down in the more common cases to try to optimize the more rare data cases.

So, the current code will re-sync for non-repetitive data (as it always would) and also (now) for zeros (the most common repetitive data).  Further improvements may yet be possible, but I'm not looking for any at the moment.
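
Conceptually, the re-alignment amounts to something like the following (a Python sketch of the idea only, with an invented helper name -- not the actual 3.1.0dev change): when the data at a nonaligned offset is a run of zeros reaching the next block boundary, skip to that boundary so later matches land on offsets the --inplace receiver can handle as seeks.

BLOCK = 1024

def realign_on_zeros(src, pos):
    # If everything from pos up to the next block boundary is zeros, jump to
    # the boundary so later matches line up with the receiver's basis blocks.
    boundary = -(-pos // BLOCK) * BLOCK          # round pos up to a multiple of BLOCK
    if boundary > pos and src[pos:boundary] == bytes(boundary - pos):
        return boundary          # the skipped zeros go out as a short literal run
    return pos                   # non-zero data: keep the normal matching behaviour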

I've also optimized away the search loop that used to be there to find the right sum record for the current position in the file.  That will especially help files that have a lot of identical sum records in a particular hash chain.
Comment 10 Wayne Davison 2011-06-04 19:57:56 UTC
Closing due to already deployed fixes.