Bug 8529 - Extend --batch to a local cache for backups
Summary: Extend --batch to a local cache for backups
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.9
Hardware: All
OS: All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-14 19:59 UTC by samba-bugzilla
Modified: 2011-10-14 19:59 UTC
CC List: 0 users

See Also:


Description samba-bugzilla 2011-10-14 19:59:36 UTC
I'm backing up my computer to a server.  I've got almost a terabyte of data in about 100k files, and they don't change much from day to day.  When rsync runs, it traverses the destination directory on the server to find changes, which chews up a lot of network bandwidth, CPU time, and disk seeks relative to the amount of actual data to send, if any.

After reading the manual page, the --write-batch/--read-batch options look promising.  As I'm the only one writing to the destination directory, I could cache the current state of the destination directory in a local file, then generate a batch file against that cached state.  This would cut out 99% of the overhead we're seeing now, reducing a 6-hour rsync to maybe 15 minutes (i.e. the time to traverse the source filesystem).
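
For comparison, the batch workflow already documented in the man page works today if I keep a full local replica of the destination to write the batch against, which is exactly the disk cost I'd like to avoid.  A rough sketch (the host "server" and the paths/filenames are placeholders):

   # update the local replica and record the changes in a batch file
   rsync -av --write-batch=daily /source-directory/ /local-replica-of-dest/
   # ship the batch; rsync writes "daily" plus a helper script "daily.sh"
   scp daily daily.sh server:
   # apply the recorded changes on the server
   ssh server rsync -av --read-batch=daily /dest-directory/

What I'm asking for is a way to replace that terabyte-sized replica with a small cached copy of the destination's metadata.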

Alternatively I could do:
   find /source-directory -type f -printf '%P\t%s\t%T@\n' | sort > new.txt
   comm -13 old.txt new.txt | cut -f1 > todo.txt
   rsync -av --files-from=todo.txt /source-directory /remote-dest-directory && mv new.txt old.txt
and pray there are no filenames with funky characters in them.
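
A variant that sidesteps most of the funky-filename problem, at the cost of trusting mtimes, might look like the following (just a sketch: the /var/tmp/last-backup marker is my invention and has to be seeded with touch before the first run, and this misses deletions and files moved in with old timestamps):

   touch /var/tmp/backup-started            # mark the start of this run
   find /source-directory -type f -newer /var/tmp/last-backup -printf '%P\0' > todo.txt
   rsync -av --from0 --files-from=todo.txt /source-directory /remote-dest-directory \
      && mv /var/tmp/backup-started /var/tmp/last-backup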

Is this caching feature simple enough to implement?  

Thanks!