The Samba-Bugzilla – Bug 4693
Amazon S3 storage interface for rsync
Last modified: 2010-08-17 09:08:26 UTC
Last year Amazon launched its "Simple Storage Service" (S3):
Amazon S3 is intentionally built with a minimal feature set.
* Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects you can store is unlimited.
* Each object is stored and retrieved via a unique, developer-assigned key.
* Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
* Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
* Built to be flexible so that protocol or functional layers can easily be added. Default download protocol is HTTP. A BitTorrent(TM) protocol interface is provided to lower costs for high-scale distribution. Additional interfaces will be added in the future.
I would like to see rsync support S3 as a storage target. There are utilities that provide rsync-like functionality with S3, but IMHO they are inferior to rsync.
I'm willing to provide funded S3 access credentials and a small incentive payment upon completion (i.e., integration into the standard rsync release) to recognized, qualified rsync developers. Obviously, all of this would have to be negotiated.
Since S3 is not a filesystem, conventions will need to be created for how filesystem metadata (permissions, etc.) is stored on S3. S3 supports whole-file checksums using the MD5 algorithm.
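One possible convention, sketched in Python under stated assumptions: store the stat fields rsync cares about as S3 user-defined metadata (request headers prefixed with x-amz-meta-), send a Content-MD5 header with each upload so S3 can verify the body, and compare the local MD5 against the object's ETag as a whole-file check before re-uploading. The x-amz-meta-rsync-* header names and the helper functions are hypothetical, not an agreed format. The ETag equals the hex MD5 of the data only for objects stored with a single PUT (plaintext or SSE-S3), not for multipart uploads. The HTTP request itself is omitted.

```python
import base64
import hashlib
import os
import stat

# Hypothetical naming convention for rsync metadata stored on each object.
META_PREFIX = "x-amz-meta-rsync-"


def file_md5(path, chunk_size=1 << 20):
    """Return the raw 16-byte MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def metadata_headers(path):
    """Build upload headers carrying a file's checksum and stat metadata."""
    st = os.stat(path)
    digest = file_md5(path)
    return {
        # Content-MD5 is the base64 encoding of the binary MD5 digest;
        # S3 rejects the upload if the received body does not match it.
        "Content-MD5": base64.b64encode(digest).decode("ascii"),
        # Permission bits in octal, e.g. "644".
        META_PREFIX + "mode": format(stat.S_IMODE(st.st_mode), "o"),
        META_PREFIX + "uid": str(st.st_uid),
        META_PREFIX + "gid": str(st.st_gid),
        META_PREFIX + "mtime": str(int(st.st_mtime)),
    }


def needs_upload(path, remote_etag):
    """Whole-file check: True if the object is missing or its MD5 differs.

    remote_etag is the ETag returned by S3 (hex MD5 wrapped in double
    quotes for single-PUT objects), or None if the object does not exist.
    """
    if remote_etag is None:
        return True
    return remote_etag.strip('"') != file_md5(path).hex()
```

Storing ownership as numeric uid/gid mirrors rsync's --numeric-ids behaviour; a real design would still have to decide how names are mapped across machines.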
This is a project that I'd like to see completed for personal use... not a corporate-funded effort, so don't get starry-eyed. I think it would be a wildly popular feature, judging by the uptake S3 is getting. Something like 5 billion objects are stored on S3, so there is plenty of use going on out there.
rsync s3.amazonaws.com::
List buckets on S3
rsync s3.amazonaws.com::testrsync
List contents of the testrsync bucket
rsync --create-bucket s3.amazonaws.com::testrsync
Create the bucket if it does not exist
rsync s3.amazonaws.com::testrsync/testfile ./testfile
Transfer testfile from the testrsync bucket
rsync --create-bucket ./testfile s3.amazonaws.com::testrsync/testfile
Transfer testfile to the testrsync bucket
rsync -avz $HOME s3.amazonaws.com::testrsync
Transfer the contents of $HOME recursively to the testrsync bucket, preserving everything.
rsync -avz s3.amazonaws.com::testrsync $HOME
Bring it all back.
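The proposed syntax reuses rsync's daemon notation host::module/path, with an S3 bucket in place of the module; --create-bucket is part of this proposal, not an existing rsync option. A rough Python sketch of how such an argument might be split, and how a local tree could be mapped onto S3's flat key namespace (directories exist only as '/'-separated key prefixes), is given here. parse_s3_target and keys_for_tree are hypothetical helpers, and keys_for_tree maps the contents of the root directory, matching the "contents of $HOME" description rather than rsync's exact trailing-slash rules.

```python
import os
import posixpath

S3_HOST = "s3.amazonaws.com"


def parse_s3_target(spec):
    """Split a proposed 'host::bucket/key' argument into (bucket, key).

    Returns None for arguments that are not S3 targets (local paths).
    An empty bucket means "list buckets"; an empty key means the bucket itself.
    """
    prefix = S3_HOST + "::"
    if not spec.startswith(prefix):
        return None
    bucket, _, key = spec[len(prefix):].partition("/")
    return bucket, key


def keys_for_tree(root, key_prefix=""):
    """Yield (local_path, object_key) pairs for every file under root.

    Keys are the file paths relative to root, always '/'-separated,
    optionally placed under key_prefix.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            local = os.path.join(dirpath, name)
            rel = os.path.relpath(local, root).replace(os.sep, "/")
            yield local, posixpath.join(key_prefix, rel)
```

For example, parse_s3_target("s3.amazonaws.com::testrsync/testfile") gives ("testrsync", "testfile"), while "./testfile" is treated as a local path.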
There will also need to be some S3 permission modifiers that apply to bucket and object creation:
That doesn't cover all of the ACL options S3 supports, but those are the ones I use.
Contact me if you are interested in working on this.
That sounds interesting. However, I think it would be senseless to dump a lot of functionality specific to one commercial storage service (including a completely new protocol) into the standard release of a generic tool like rsync, and I'm pretty sure Wayne will agree. Two possible alternatives are to create a separate specialized program "s3rsync" or to run plain rsync on a FUSE filesystem that exposes S3. In any event, I think work on getting rsync to access S3 is not germane to the main rsync bug database.
Marking WONTFIX as per comment #1.
Including this functionality may sound like a bad idea, but there are some upsides:
* S3 could then be used with programs built on rsync, e.g. rsnapshot.
* It is almost impossible to use such programs with s3sync; there is too much hacking to do.
* S3 is growing and incredibly cheap.