Bug 14529 - Please add option to save metadata to single file to speed up backups
Summary: Please add option to save metadata to single file to speed up backups
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.2.0
Hardware: All All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
 
Reported: 2020-10-11 14:51 UTC by Andras Korn
Modified: 2020-10-11 15:06 UTC

Description Andras Korn 2020-10-11 14:51:55 UTC
There are compelling reasons to use rsync as a backup tool in the following way: back up to a destination, snapshot the destination filesystem to preserve the current backup, and then save the next backup to the same destination, again using rsync.

In this scenario, the data in the backup filesystem is only ever changed by rsync.

If there are many files, a backup run will take a very long time, and most of the I/O will go to reading file metadata to check whether the source differs from the destination:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.85    0.627125          31     20222           lstat
 30.61    0.418682          20     20222           lgetxattr
 13.54    0.185181          79      2338           getdents64
  3.23    0.044241          22      1982      1982 getxattr
  2.34    0.032001          16      1982           stat
  1.78    0.024293          20      1169           openat
  1.25    0.017112          14      1169           close
  1.05    0.014389          12      1169           fstat
  0.27    0.003737          19       187           brk
  0.04    0.000503          45        11           write
  0.02    0.000306          27        11           read
  0.01    0.000159          14        11           select
------ ----------- ----------- --------- --------- ----------------
100.00    1.367729                 50473      1982 total
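
As a rough check of the summary above, the share of syscall time spent on per-file metadata calls can be computed from the table's own numbers. This is only a sketch: the figures are copied from the table, and which syscalls count as "metadata" (here only lstat and lgetxattr) is an assumption made for illustration.

```python
# Seconds per syscall, copied from the summary table above.
seconds = {
    "lstat": 0.627125,
    "lgetxattr": 0.418682,
    "getdents64": 0.185181,
    "getxattr": 0.044241,
    "stat": 0.032001,
    "openat": 0.024293,
    "close": 0.017112,
    "fstat": 0.014389,
    "brk": 0.003737,
    "write": 0.000503,
    "read": 0.000306,
    "select": 0.000159,
}

# Assumption: treat only the two dominant per-file calls as "metadata".
METADATA_CALLS = ("lstat", "lgetxattr")


def metadata_share(table, calls=METADATA_CALLS):
    """Return the fraction of total syscall time spent in the given calls."""
    total = sum(table.values())
    return sum(table[name] for name in calls) / total


print(f"{metadata_share(seconds):.1%} of syscall time is lstat + lgetxattr")
```

With these numbers, lstat and lgetxattr together account for roughly three quarters of the measured syscall time.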

If rsync could be told to save all metadata to some "database" in addition to the filesystem, the load on the backup server during subsequent backups of the same source data to the same destination could be much lower. The "database" could be read into RAM, perhaps in chunks if it's very large, and checking metadata for changes would be almost free.
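
To make the idea concrete, here is a minimal sketch in Python (not rsync's C, and not an existing rsync feature). It assumes the "database" is a single JSON file mapping each relative path to its size, mtime and mode. The file format and all function names are hypothetical. The point is that the destination tree is walked once when the database is written, and later comparisons need no per-file lstat on the destination.

```python
import json
import os


def snapshot_metadata(root):
    """Walk root once and return {relative_path: [size, mtime_ns, mode]}."""
    meta = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            meta[rel] = [st.st_size, st.st_mtime_ns, st.st_mode]
    return meta


def save_db(meta, db_path):
    """Write the whole database to one file, replacing it atomically."""
    tmp_path = db_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(meta, f)
    os.replace(tmp_path, db_path)


def load_db(db_path):
    """Read the whole database into RAM with a single sequential read."""
    with open(db_path) as f:
        return json.load(f)


def files_to_transfer(source_meta, dest_db):
    """Return paths whose source metadata differs from the cached copy."""
    return sorted(
        path for path, meta in source_meta.items() if dest_db.get(path) != meta
    )
```

In this sketch the receiving side would call load_db once per run and save_db after the transfer completes; a real implementation would also have to handle deletions, directories, and databases too large to hold in memory at once.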

Of course, if a tool other than rsync changes data in the actual filesystem, the "database" gets out of sync, because only rsync would keep it updated; that can't be helped.

This could also be an enhancement of "fake super" -- instead of saving metadata in an xattr for each file separately, all metadata could be saved in a single file, in a location outside the root of the rsync module (or, to support chroot, inside it, but hidden from rsync transfers).
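
A rough sketch of the single-file variant, again in Python and purely illustrative: it assumes an SQLite file as the store and keeps only mode, uid and gid per path. The table name, the schema and the function names are hypothetical and do not reflect how rsync stores fake-super data today.

```python
import sqlite3


def open_store(db_path):
    """Open (or create) the single-file store at a hypothetical location."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fake_super ("
        "path TEXT PRIMARY KEY, mode INTEGER, uid INTEGER, gid INTEGER)"
    )
    return conn


def record(conn, path, mode, uid, gid):
    """Store or update the privileged attributes for one path."""
    conn.execute(
        "INSERT OR REPLACE INTO fake_super VALUES (?, ?, ?, ?)",
        (path, mode, uid, gid),
    )
    conn.commit()


def lookup(conn, path):
    """Return (mode, uid, gid) for path, or None if nothing is recorded."""
    return conn.execute(
        "SELECT mode, uid, gid FROM fake_super WHERE path = ?", (path,)
    ).fetchone()
```

One lookup against such a store would stand in for one lgetxattr call on the destination file, and the store could live outside the module root as described above.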
Comment 1 Andras Korn 2020-10-11 15:06:57 UTC
It's completely fine if using this "database" in writable modules implies or requires `max connections = 1` to avoid concurrency/locking issues.