Bug 6946 - Cache checksums
Summary: Cache checksums
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: Other Linux
Importance: P3 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Depends on:
Reported: 2009-12-01 04:07 UTC by James Pharaoh
Modified: 2009-12-05 00:32 UTC
Description James Pharaoh 2009-12-01 04:07:55 UTC
Currently, if multiple users update the same folder using group write permissions, they can't always update file timestamps, because a file may be owned by a different user. Furthermore, changing the timestamp may be undesirable if files are being updated from various sources, since timestamps moving backwards can confuse some software.

Disabling timestamp updates does not impair rsync's functionality, but it can severely reduce performance: rsync is forced to fully read each file, on both sides of the transfer, in order to obtain a checksum.

I propose caching these checksums, against a file's timestamp, in order to avoid this performance problem. This method does not require an update of the file's timestamp. It would require a database of some sort linking files and timestamps to the calculated checksum. Whenever a file's timestamp changed, the checksum would need recalculating.

To be efficient, the cache would need to operate on both sides of the connection. Basically, instead of reading a file to determine its checksum, the cache would first be checked. If the cache did not contain a checksum for the file's current timestamp, the checksum would be calculated and stored in the cache for future reference.
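
For illustration only, the lookup described above could be sketched roughly as follows. This is not rsync code and not how any of the maintained patches work. The names open_cache, file_digest and cached_digest, the SQLite table layout, and the choice of SHA-256 are all assumptions made for this sketch. It also compares file size in addition to the timestamp, which goes slightly beyond the proposal.

```python
import hashlib
import os
import sqlite3


def open_cache(db_path):
    """Open (or create) the hypothetical checksum cache database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        " path TEXT PRIMARY KEY,"
        " mtime_ns INTEGER NOT NULL,"
        " size INTEGER NOT NULL,"
        " digest TEXT NOT NULL)"
    )
    return conn


def file_digest(path, chunk_size=1 << 20):
    """Slow path: read the whole file and return its hex digest."""
    h = hashlib.sha256()  # arbitrary choice for this sketch
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_digest(conn, path):
    """Return (digest, was_cached) for path.

    The cached value is reused only if the stored timestamp (and size,
    an extra assumption of this sketch) still match the file on disk.
    """
    st = os.stat(path)
    key = os.path.abspath(path)
    row = conn.execute(
        "SELECT mtime_ns, size, digest FROM checksums WHERE path = ?",
        (key,),
    ).fetchone()
    if row is not None and row[0] == st.st_mtime_ns and row[1] == st.st_size:
        return row[2], True  # cache hit: file contents are not read

    # Cache miss or stale entry: recalculate and store for next time.
    digest = file_digest(path)
    conn.execute(
        "INSERT OR REPLACE INTO checksums (path, mtime_ns, size, digest)"
        " VALUES (?, ?, ?, ?)",
        (key, st.st_mtime_ns, st.st_size, digest),
    )
    conn.commit()
    return digest, False
```

Each side of the transfer would keep its own database, so a cache hit costs one stat call and one indexed lookup instead of a full read of the file.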

I'd suggest specifying the cache location for local and remote connections via a parameter initially.

I intend to have a go at adding this functionality myself, although I'd be very interested to hear any comments, particularly if anyone has done anything similar or if I am missing some simpler solution to this problem.
Comment 1 Matt McCutchen 2009-12-05 00:32:10 UTC
Note that several of the maintained patches have to do with checksum caching: the ones with "checksum" in the name as well as "db.diff".  See the bottom of http://rsync.samba.org/download.html for how to access these patches.