The Samba-Bugzilla – Bug 9744
Support Git, Mercurial, Subversion ignore lists
Last modified: 2013-03-28 14:01:36 UTC
rsync includes the built-in --filter=:C which is great for archiving source trees containing CVS checkouts, but CVS is an obsolete VCS. There is no built-in equivalent for the more modern Git, Mercurial, or Subversion VCSs.
As http://stackoverflow.com/questions/13713101/rsync-exclude-according-to-gitignore-like-cvs-exclude notes, --filter=':- .gitignore' seems to work pretty well for Git (though there are probably more advanced points of syntax that Git and rsync interpret differently); but --filter=':- .hgignore' does not work well (since Mercurial allows regular expressions by default), and there is nothing at all for Subversion’s svn:ignore.
This means the feature should probably take the form of external helpers/plugins, plus an option that just specifies which helper to use.
I'm guessing, now that rsync no longer builds its entire job list up front, that a helper would have to work like a filter where rsync feeds it filenames one at a time as it encounters them and gets back a yea/nay answer for each one?
You write the helper in any language you want to do whatever weird job you want.
Alternatively, there is something you can do right now: build your file list up front and hand rsync that file list instead of a pattern. You'd still have to write your own tool to generate the file list, but you wouldn't have to convince anyone to modify rsync and then wait for them to do it.
A filter coprocess would suffice, I think; either accepting or rejecting individual files, or emitting rsync-format patterns. For performance reasons it would probably not work well to fork a new process for each check.
Building the file list up front has the disadvantage that you would have to make a complete pass over the (source) directory tree, and the result could be enormous (I hope rsync is quick at skipping over non-matching absolute pathnames). On the other hand, if you _did_ need to fork some subcommands to get information, it would be more straightforward to minimize the number of such forks needed, e.g. by running `svn pg -R --xml svn:ignore` once in the topmost dir to contain a .svn subdir.
(In reply to comment #2)
> A filter coprocess would suffice, I think; either accepting or rejecting
> individual files, or emitting rsync-format patterns. For performance reasons it
> would probably not work well to fork a new process for each check.
No of course not. It would indeed be a single continuous co-process.
If the source ignore info is simple to parse (no regex), then use any language you like to read the ignore list into an array and check each input filename against it. If the source ignore info includes regex, then use a language that has regex built in (awk, perl, whatever), so that you are not calling sed a zillion times.
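A minimal sketch of such a co-process, in Python, might look like the following. Note that rsync has no option to run a helper like this today; the line-based yea/nay protocol, the function names, and the ignore-file convention used here (plain lines are globs matched against the basename, lines starting with `re:` are regexes searched in the whole path) are all assumptions made for illustration, not anything rsync or a VCS defines.

```python
import fnmatch
import re
import sys


def load_rules(lines):
    """Parse ignore lines into (kind, compiled_regex) pairs.

    Hypothetical format: 're:<regex>' is a regex, anything else is a glob.
    """
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("re:"):
            rules.append(("re", re.compile(line[3:])))
        else:
            # Compile the glob once so each check is a single regex match.
            rules.append(("glob", re.compile(fnmatch.translate(line))))
    return rules


def is_ignored(path, rules):
    """Globs are tested against the basename, regexes against the full path."""
    name = path.rsplit("/", 1)[-1]
    for kind, rx in rules:
        if kind == "glob" and rx.match(name):
            return True
        if kind == "re" and rx.search(path):
            return True
    return False


def answers(lines, rules):
    """Yield one 'yea' (keep) or 'nay' (skip) per input filename."""
    for line in lines:
        yield "nay" if is_ignored(line.rstrip("\n"), rules) else "yea"


def serve(rules, infile=sys.stdin, outfile=sys.stdout):
    """Run as a single long-lived process: one answer per line, flushed."""
    for answer in answers(infile, rules):
        outfile.write(answer + "\n")
        outfile.flush()  # the caller is waiting on every single answer
```

The rules are compiled once at startup, so the per-file cost is a few regex matches rather than a fork.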
> Building the file list up front has the disadvantage that you would have to
> make a complete pass over the (source) directory tree and result could be
> enormous (I hope rsync is quick at skipping over non-matching absolute
> pathnames). On the other hand if you _did_ need to fork some subcommands to get
> information, it would be more straightforward to minimize the number of such
> forks needed, e.g. by running `svn pg -R --xml svn:ignore` once in the topmost
> dir to contain a .svn subdir.
Maybe the way to go is to read the ignore info and just translate it into find syntax, regex and all, then use find to do the work of actually walking the tree and generating the file list efficiently. Then pipe that right into "|rsync --files-from=-"
Remember to use find -print0 and rsync -0.
Done. No huge temp file, no wasted double scan through something possibly huge, and the tree walk, including skipping whatever you already know you can skip, is exactly as efficient as "find" is.
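For comparison, here is a rough Python sketch of the same pipeline, with os.walk standing in for find. The function names and the idea of passing ignore rules as simple basename globs are assumptions for illustration; real .hgignore or svn:ignore handling would need more than this. The rsync options used (-a, -0, --files-from=-) are the ones mentioned above.

```python
import fnmatch
import os
import subprocess


def _ignored(name, globs):
    """True if a single path component matches any ignore glob."""
    return any(fnmatch.fnmatch(name, g) for g in globs)


def list_files(root, ignore_globs):
    """Yield file paths relative to root, pruning ignored dirs as we walk."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if not _ignored(d, ignore_globs))
        for name in sorted(filenames):
            if not _ignored(name, ignore_globs):
                yield os.path.relpath(os.path.join(dirpath, name), root)


def sync(root, dest, ignore_globs):
    """Stream a NUL-separated file list into rsync's stdin, no temp file."""
    cmd = ["rsync", "-a", "-0", "--files-from=-", root, dest]
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    for path in list_files(root, ignore_globs):
        proc.stdin.write(os.fsencode(path) + b"\0")
    proc.stdin.close()
    return proc.wait()  # rsync's exit status
```

Something like `sync("project/", "backup:/srv/project/", [".git", "*.o"])` would then transfer only the files that survive the ignore rules; the host and paths in that call are placeholders.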
(In reply to comment #3)
> pipe that right into "|rsync --files-from=-"
Will this work with --delete-excluded --delete-during? If not, then creating an rsync-format excludes file as a preliminary step would be the better choice.
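As a sketch of that preliminary step for Subversion, the following Python turns the output of `svn pg -R --xml svn:ignore` into rsync exclude rules. It assumes the XML consists of `target` elements carrying a `path` attribute relative to the directory where svn was run (with `.` for that directory itself), each holding a `property` element whose text is the newline-separated pattern list; the function name is made up for this example. Because svn:ignore applies only to the directory it is set on, each pattern is anchored to that directory with a leading slash.

```python
import xml.etree.ElementTree as ET


def svn_ignore_to_rsync(xml_text):
    """Convert `svn pg -R --xml svn:ignore` output to rsync '- ' rules."""
    rules = []
    for target in ET.fromstring(xml_text).iter("target"):
        directory = target.get("path", ".").strip("/")
        # A leading slash anchors the rule at the root of the transfer.
        prefix = "/" if directory in ("", ".") else "/" + directory + "/"
        for prop in target.iter("property"):
            if prop.get("name") != "svn:ignore":
                continue
            for pattern in (prop.text or "").splitlines():
                pattern = pattern.strip()
                if pattern:
                    rules.append("- " + prefix + pattern)
    return rules
```

The resulting lines could be written to a file and handed to rsync as a merge file, e.g. `--filter='merge rules.txt'`, which keeps them ordinary exclude rules as far as the --delete options are concerned.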
Sounds like there is no clearly superior way to do this from within rsync itself. When I get a chance I will try to write a script to “drive” rsync with VCS-specific ignore lists, and maybe put it on GitHub if it seems reasonably reusable.