Bug 4693 - Amazon S3 storage interface for rsync
Summary: Amazon S3 storage interface for rsync
Status: RESOLVED WONTFIX
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.0.0
Hardware: Other Linux
Importance: P3 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-06-13 10:19 UTC by Brad Dixon
Modified: 2010-08-17 09:08 UTC

Description Brad Dixon 2007-06-13 10:19:14 UTC
Amazon last year launched a "Simple Storage Service":

---

Amazon S3 is intentionally built with a minimal feature set.

    * Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects you can store is unlimited.
    * Each object is stored and retrieved via a unique, developer-assigned key.
    * Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
    * Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
    * Built to be flexible so that protocol or functional layers can easily be added.  Default download protocol is HTTP.  A BitTorrent(TM) protocol interface is provided to lower costs for high-scale distribution.  Additional interfaces will be added in the future. 

---

I would like to see rsync support S3 as a storage target. There are utilities that perform rsync-like functionality with S3 but they are inferior, IMHO, to rsync.

I'm willing to provide funded S3 access credentials and a small incentive payment upon completion (i.e., integration with the rsync standard release) to recognized, qualified rsync developers. Obviously this all has to be negotiated.

Since S3 is not a filesystem, conventions will need to be created for how filesystem metadata (permissions, etc.) is stored on S3. Whole-file checksums using the MD5 algorithm are supported by S3.
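
A minimal sketch of what such a convention might look like, in Python. It assumes the metadata is carried in S3 user-defined metadata headers (the `x-amz-meta-` prefix is S3's; the key names `mode`, `uid`, `gid` and `mtime` are hypothetical), and that the whole-file MD5 reported by S3 is compared against a locally computed one to decide whether a file needs to be sent. The helper names are made up for illustration; this is not rsync code.

```python
import hashlib
import os
import stat


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a whole file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def metadata_headers(path):
    """Encode POSIX metadata as user-metadata headers (hypothetical key names)."""
    st = os.stat(path)
    return {
        "x-amz-meta-mode": format(stat.S_IMODE(st.st_mode), "04o"),
        "x-amz-meta-uid": str(st.st_uid),
        "x-amz-meta-gid": str(st.st_gid),
        "x-amz-meta-mtime": str(int(st.st_mtime)),
    }


def needs_upload(path, remote_md5):
    """True if the object is missing remotely (None) or its MD5 differs."""
    return remote_md5 is None or file_md5(path) != remote_md5
```

Since only a whole-file checksum is available, a changed file would have to be re-sent in full rather than patched block by block.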

This is a project that I'd like to see completed for personal use... not a corporate-funded effort, so don't get starry-eyed. I think it would be a wildly popular feature based upon the uptake S3 is getting. Something like ~5 billion objects are stored on S3, so there is plenty of use going on out there.

Usage:

rsync s3.amazonaws.com::
 List buckets on S3

rsync s3.amazonaws.com::testrsync
 List contents of the testrsync bucket

rsync --create-bucket s3.amazonaws.com::testrsync
 Create the bucket if it does not exist

rsync s3.amazonaws.com::testrsync/testfile ./testfile
 Transfer testfile from the testrsync bucket

rsync --create-bucket ./testfile s3.amazonaws.com::testrsync/testfile
 Transfer testfile to the testrsync bucket

rsync -avz $HOME s3.amazonaws.com::testrsync
 Transfer contents of $HOME recursively to the testrsync bucket preserving everything.

rsync -avz s3.amazonaws.com::testrsync $HOME
 Bring it all back.
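
To make the addressing in the examples above concrete, here is a rough Python sketch of how a `host::bucket/key` argument might be split. The `S3Spec` type and `parse_s3_spec` function are hypothetical names; the only assumption taken from the usage list is that `::` separates the host from the bucket and the first `/` separates the bucket from the object key.

```python
from typing import NamedTuple, Optional


class S3Spec(NamedTuple):
    host: str
    bucket: Optional[str]  # None means "list all buckets"
    key: Optional[str]     # None means "the whole bucket"


def parse_s3_spec(arg):
    """Split 'host::bucket/key'; return None for a plain local path."""
    host, sep, rest = arg.partition("::")
    if not sep or not host:
        return None
    bucket, _, key = rest.partition("/")
    return S3Spec(host, bucket or None, key or None)


for arg in ("s3.amazonaws.com::",
            "s3.amazonaws.com::testrsync",
            "s3.amazonaws.com::testrsync/testfile",
            "./testfile"):
    print(arg, "->", parse_s3_spec(arg))
```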

There will also need to be some S3 permission modifiers that apply to bucket and object creation:

--s3-public-read
--s3-public-read-write
--s3-private

That doesn't cover all of the ACL options S3 can do but those are the ones I use.
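
As an illustration only, the three proposed options line up with S3's canned ACLs of the same names, which are sent in the `x-amz-acl` request header. A Python sketch of that mapping follows; the function name `acl_header` and the choice of `private` as the default are assumptions.

```python
# Proposed rsync options mapped to S3 canned ACL names.
CANNED_ACLS = {
    "--s3-private": "private",
    "--s3-public-read": "public-read",
    "--s3-public-read-write": "public-read-write",
}


def acl_header(flags, default="private"):
    """Return the x-amz-acl header for the given command-line flags."""
    chosen = {CANNED_ACLS[f] for f in flags if f in CANNED_ACLS}
    if len(chosen) > 1:
        raise ValueError("conflicting S3 permission options: %s" % sorted(chosen))
    return {"x-amz-acl": chosen.pop() if chosen else default}
```

For example, `acl_header(["-avz", "--s3-public-read"])` would yield `{"x-amz-acl": "public-read"}`, and passing two different permission options raises an error instead of silently picking one.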

Contact me if you are interested in working on this.
Comment 1 Matt McCutchen 2007-06-13 12:11:57 UTC
That sounds interesting.  However, I think it would be senseless to dump a lot of functionality specific to one commercial storage service (including a completely new protocol) into the standard release of a generic tool like rsync, and I'm pretty sure Wayne will agree.  Two possible alternatives are to create a separate specialized program "s3rsync" or to run plain rsync on a FUSE filesystem that exposes S3.  In any event, I think work on getting rsync to access S3 is not germane to the main rsync bug database.
Comment 2 Matt McCutchen 2008-02-06 21:20:36 UTC
Marking WONTFIX as per comment #1.
Comment 3 Boris Churzin 2010-08-17 09:08:26 UTC
It sounds bad to include this functionality, but there are some good sides to it.
S3 can be used with programs that use rsync, e.g. rsnapshot etc.
Almost impossible to use it with s3sync, too much hacking to do.
S3 is growing and incredibly cheap.