10379 – rsync metadata files

Summary: rsync metadata files

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	rsync
Classification:	Unclassified
Component:	core
Version:	3.1.0
Hardware:	All All

Importance:	P5 enhancement
Target Milestone:	---
Assignee:	Wayne Davison
QA Contact:	Rsync QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2014-01-14 17:18 UTC by Haravikk
Modified:	2014-01-20 12:42 UTC
CC List:	0 users

See Also:

Description Haravikk 2014-01-14 17:18:39 UTC

This proposal is for an optional feature within rsync that will allow it to create special metadata files within directories, in order to keep track of additional information. The feature would be activated by setting a --metadata-file or similar parameter, specifying the name of the metadata file.

Whenever rsync encounters a directory (or a file within it) that is eligible for special or optimised treatment, it will add suitable information to a metadata file within that directory, using the provided file name. When rsync operates on that directory in future, it will read the metadata file, if present, and use it to obtain hints about how to handle the directory (and the files within it). If the modification time of the metadata file differs from that of the directory containing it, the hints will be re-validated (to detect any that are no longer true).
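
A minimal Python sketch of the proposed mechanism, assuming a hypothetical metadata file name `.rsync-meta` (standing in for whatever --metadata-file would name), a JSON layout, and per-file size/mtime signatures used for re-validation; none of this is existing rsync behaviour. Because creating the metadata file itself bumps the directory's mtime, the sketch copies the directory's mtime onto the metadata file right after writing it, so a later mismatch means the directory has changed since.

```python
import json
import os

# Hypothetical default; the proposal leaves the name to a --metadata-file option.
META_NAME = ".rsync-meta"


def _file_sig(path):
    """Size and mtime used to decide whether a per-file hint is still true."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def write_hints(directory, hints, meta_name=META_NAME):
    """Store {filename: hint}, stamping each hint with the file's current signature."""
    records = {}
    for name, hint in hints.items():
        records[name] = {"hint": hint, **_file_sig(os.path.join(directory, name))}
    meta_path = os.path.join(directory, meta_name)
    with open(meta_path, "w") as f:
        json.dump(records, f)
    # Creating the file changed the directory's mtime; copy that value onto
    # the metadata file to mark the pair as "in sync".
    dir_mtime = os.stat(directory).st_mtime_ns
    os.utime(meta_path, ns=(dir_mtime, dir_mtime))


def load_hints(directory, meta_name=META_NAME):
    """Return {filename: hint}, re-validating only when the mtimes disagree."""
    meta_path = os.path.join(directory, meta_name)
    try:
        with open(meta_path) as f:
            records = json.load(f)
    except FileNotFoundError:
        return {}
    # Fast path: directory untouched since the hints were written.
    if os.stat(meta_path).st_mtime_ns == os.stat(directory).st_mtime_ns:
        return {name: rec["hint"] for name, rec in records.items()}
    # Slow path: keep only hints whose file still exists with the same signature.
    valid = {}
    for name, rec in records.items():
        path = os.path.join(directory, name)
        if os.path.isfile(path) and _file_sig(path) == {
            "size": rec["size"],
            "mtime_ns": rec["mtime_ns"],
        }:
            valid[name] = rec["hint"]
    return valid
```

One caveat of comparing against the directory's mtime: rewriting a file's contents in place does not change its directory's mtime, so the fast path alone would not notice that kind of change.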

This isn't a useful feature in its own right, but it would make room for future features by letting rsync track useful information, particularly information that can be used to optimise comparisons.

Comment 1 Wayne Davison 2014-01-19 22:20:03 UTC

See the checksum patches and the db patch for some examples that are already being worked on. The db one is the one I'm most fond of, since it has the potential to support checksum caching, fuzzy matching by checksum, more efficient renaming, etc. The current patch supports only checksum caching, and the sqlite support makes it very lightweight for those who don't want to stand up a DB server.
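
For illustration, a rough sketch of checksum caching in SQLite using Python's standard `sqlite3` module. The table name, schema, and the choice of MD5 are assumptions made for this example, not the db patch's actual design: a cached digest is reused only while the file's size and mtime match what was recorded; otherwise the file is re-hashed and its row replaced.

```python
import hashlib
import os
import sqlite3


def open_cache(db_path):
    """Open (or create) a hypothetical checksum-cache database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest TEXT)"
    )
    return conn


def _md5_file(path):
    """Hash the whole file in 64 KiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_checksum(conn, path):
    """Return (digest, was_cached); re-hash only if size or mtime changed."""
    st = os.stat(path)
    row = conn.execute(
        "SELECT size, mtime_ns, digest FROM checksums WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == st.st_size and row[1] == st.st_mtime_ns:
        return row[2], True
    digest = _md5_file(path)
    conn.execute(
        "INSERT OR REPLACE INTO checksums (path, size, mtime_ns, digest)"
        " VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, digest),
    )
    conn.commit()
    return digest, False
```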

I do need to change the db patch into a run-time-loaded library idiom for each DB type. That would let us integrate db support into the main code w/o bloating the dependencies for the main rsync package (e.g. the rsync executable is in the main package with no db dependencies, the sqlite lib is in a separate sqlite-dependent package, etc.).
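
In C, a run-time-loaded library per database type would typically be built on `dlopen()`/`dlsym()`. The Python sketch below shows the same idea with `importlib`: the driver for a DB type is imported only when it is requested, so a missing optional driver produces a clear error rather than a hard dependency. The `BACKENDS` table, the `BackendUnavailable` error, and the mapping to driver modules other than `sqlite3` are hypothetical.

```python
import importlib

# Hypothetical mapping from a user-facing DB type to the module implementing it.
# Only sqlite3 ships with Python; the others would come from optional packages.
BACKENDS = {
    "sqlite": "sqlite3",
    "mysql": "MySQLdb",
    "postgres": "psycopg2",
}


class BackendUnavailable(RuntimeError):
    """Raised when a DB type is unknown or its driver is not installed."""


def load_backend(db_type):
    """Import the driver for db_type only when it is actually requested."""
    try:
        module_name = BACKENDS[db_type]
    except KeyError:
        raise BackendUnavailable(f"unknown database type: {db_type!r}") from None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise BackendUnavailable(
            f"{db_type} support is not installed (missing module {module_name!r})"
        ) from exc
```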

If anyone wants to help out in this area, feel free to contact me about what you're interested in.

I'll resolve this as "works-for-me" just because of the existing meta-data patches.

Comment 2 Haravikk 2014-01-20 12:42:32 UTC

(In reply to comment #1)
That sounds great, actually. How does it work with regard to where the DB is stored; do you just specify a path to it? Does it create one for both the sender and the receiver, or just the receiver (I suppose the receiver is most useful)? I'll try to check it out when I get a chance; I have actually been using SQLite recently.