Bug 8529 - Extend --batch to a local cache for backups
Summary: Extend --batch to a local cache for backups
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.9
Hardware: All
OS: All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-14 19:59 UTC by samba-bugzilla
Modified: 2011-10-14 19:59 UTC
CC List: 0 users

See Also:


Description samba-bugzilla 2011-10-14 19:59:36 UTC
I'm backing up my computer to a server.  I've got almost a terabyte of data in about 100k files, and they don't change much from day to day.  When rsync runs, it traverses the destination directory on the server to find changes, which chews up a lot of network bandwidth, CPU time, and disk seeks relative to the amount of actual data to send, if any.

After reading the manual page, the --write-batch/--read-batch options look promising.  As I'm the only one writing to the destination directory, I could cache the current state of the destination directory in a local file, then generate a batch file against that cached state.  This would cut out 99% of the overhead we're seeing now, reducing a 6-hour rsync to maybe 15 minutes (i.e. the time to traverse the source filesystem).
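
For comparison, the batch workflow already documented in the man page works today if I keep a full local replica of the destination to write the batch against, which is exactly the disk cost I'd like to avoid.  A rough sketch (the host "server" and the paths/filenames are placeholders):

   # update the local replica and record the changes in a batch file
   rsync -av --write-batch=daily /source-directory/ /local-replica-of-dest/
   # ship the batch; rsync writes "daily" plus a helper script "daily.sh"
   scp daily daily.sh server:
   # apply the recorded changes on the server
   ssh server rsync -av --read-batch=daily /dest-directory/

What I'm asking for is a way to replace that terabyte-sized replica with a small cached copy of the destination's metadata.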

Alternatively I could do:
   find /source-directory -type f -printf '%P\t%s\t%T@\n' | sort > new.txt
   comm -13 old.txt new.txt | cut -f1 > todo.txt
   rsync -av --files-from=todo.txt /source-directory /remote-dest-directory && mv new.txt old.txt
and pray there are no filenames with funky characters in them.
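
A variant that sidesteps most of the funky-filename problem, at the cost of trusting mtimes, might look like the following (just a sketch: the /var/tmp/last-backup marker is my invention and has to be seeded with touch before the first run, and this misses deletions and files moved in with old timestamps):

   touch /var/tmp/backup-started            # mark the start of this run
   find /source-directory -type f -newer /var/tmp/last-backup -printf '%P\0' > todo.txt
   rsync -av --from0 --files-from=todo.txt /source-directory /remote-dest-directory \
      && mv /var/tmp/backup-started /var/tmp/last-backup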

Is this caching feature simple enough to implement?  

Thanks!