Bug 4764 - Wrong include/exclude descriptions
Summary: Wrong include/exclude descriptions
Status: CLOSED INVALID
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.8
Hardware: Other Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-04 13:41 UTC by Thomas Schweikle
Modified: 2007-07-05 20:16 UTC
CC List: 1 user

See Also:


Attachments

Description Thomas Schweikle 2007-07-04 13:41:49 UTC
The manpage states:
 "- foo/" would exclude any directory named foo

But:
+ */
- i386/
This should include all directories and then exclude any i386 directory. Here is what rsync actually does:
i386/debug/xorg-x11-drv-i810-debuginfo-1.6.5-10.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-drv-mga-debuginfo-1.4.5-2.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-drv-mouse-debuginfo-1.2.1-1.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-drv-s3-debuginfo-0.5.0-1.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-drv-savage-debuginfo-2.1.2-3.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-drv-tdfx-debuginfo-1.3.0-2.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-drv-trident-debuginfo-1.2.3-1.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-server-debuginfo-1.1.1-47.10.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-server-debuginfo-1.1.1-47.8.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-server-debuginfo-1.1.1-47.9.fc6.i386.rpm is uptodate
i386/debug/xorg-x11-xinit-debuginfo-1.0.2-15.fc6.i386.rpm is uptodate

The i386 directories are included.


+ */
- */
This should include all directories, then exclude them all. rsync:
x86_64/debug/repodata/filelists.xml.gz
x86_64/debug/repodata/other.xml.gz
x86_64/debug/repodata/primary.xml.gz
x86_64/debug/repodata/repomd.xml
x86_64/repodata/filelists.xml.gz
x86_64/repodata/other.xml.gz
x86_64/repodata/primary.xml.gz
x86_64/repodata/repomd.xml
x86_64/repodata/updateinfo.xml.gz

All directories are included and finally transferred.


+ */
- *
This should include all directories, then exclude all files or directories. rsync:

This works as expected.


+ i386
- *
This should include all i386 directories, then exclude all files. rsync:

No files transferred.


+ i386/
- *
This should include all i386 directories and then exclude all files. rsync:

No files transferred; only the directory is created.


+ i386**
- *
This should include all i386 directories and their files, excluding everything else. rsync:
i386/debug/repodata/other.xml.gz is uptodate
i386/debug/repodata/primary.xml.gz is uptodate
i386/debug/repodata/repomd.xml is uptodate
i386/repodata/filelists.xml.gz is uptodate
i386/repodata/other.xml.gz is uptodate
i386/repodata/primary.xml.gz is uptodate

This is OK, but only if the i386 directory is at the transfer root. Any other i386 directory is not transferred at all!


+ i386/
+ i386/*
- *
This should include the i386 directory and its files. rsync:
i386/xsane-0.994-2.fc6.i386.rpm is uptodate
i386/xsane-gimp-0.994-2.fc6.i386.rpm is uptodate
i386/xterm-225-1.fc6.i386.rpm is uptodate
i386/yelp-2.16.0-13.fc6.i386.rpm is uptodate
i386/ypbind-1.19-7.fc6.i386.rpm is uptodate
i386/yum-3.0.6-1.fc6.noarch.rpm is uptodate
i386/yum-metadata-parser-1.0.3-1.fc6.i386.rpm is uptodate

This is OK.


Take the description and the file hierarchy given below (found on rsync://mirrors.kernel.org/):
fedora
fedora/core
fedora/core/updates
fedora/core/updates/6
fedora/core/updates/6/ppc
fedora/core/updates/6/ppc/debug
fedora/core/updates/6/ppc/debug/repodata
fedora/core/updates/6/ppc/repodata
fedora/core/updates/6/x86_64
fedora/core/updates/6/x86_64/debug
fedora/core/updates/6/x86_64/debug/repodata
fedora/core/updates/6/x86_64/repodata
fedora/core/updates/6/i386
fedora/core/updates/6/i386/debug
fedora/core/updates/6/i386/debug/repodata
fedora/core/updates/6/i386/repodata
fedora/core/updates/6/SRPMS
fedora/core/updates/6/SRPMS/repodata
fedora/core/6
fedora/core/6/ppc
fedora/core/6/ppc/iso
fedora/core/6/ppc/debug
fedora/core/6/ppc/debug/repodata
fedora/core/6/ppc/os
fedora/core/6/ppc/os/ppc
fedora/core/6/ppc/os/ppc/iSeries
fedora/core/6/ppc/os/ppc/chrp
fedora/core/6/ppc/os/ppc/mac
fedora/core/6/ppc/os/ppc/ppc32
fedora/core/6/ppc/os/ppc/ppc64
fedora/core/6/ppc/os/Fedora
fedora/core/6/ppc/os/Fedora/base
fedora/core/6/ppc/os/Fedora/RPMS
fedora/core/6/ppc/os/stylesheet-images
fedora/core/6/ppc/os/images
fedora/core/6/ppc/os/images/iSeries
fedora/core/6/ppc/os/images/netboot
fedora/core/6/ppc/os/etc
fedora/core/6/ppc/os/repodata
fedora/core/6/x86_64
fedora/core/6/x86_64/iso
fedora/core/6/x86_64/debug
fedora/core/6/x86_64/debug/repodata
fedora/core/6/x86_64/os
fedora/core/6/x86_64/os/Fedora
fedora/core/6/x86_64/os/Fedora/base
fedora/core/6/x86_64/os/Fedora/RPMS
fedora/core/6/x86_64/os/stylesheet-images
fedora/core/6/x86_64/os/isolinux
fedora/core/6/x86_64/os/images
fedora/core/6/x86_64/os/images/xen
fedora/core/6/x86_64/os/images/pxeboot
fedora/core/6/x86_64/os/repodata
fedora/core/6/source
fedora/core/6/source/iso
fedora/core/6/source/SRPMS
fedora/core/6/source/SRPMS/repodata
fedora/core/6/i386
fedora/core/6/i386/iso
fedora/core/6/i386/debug
fedora/core/6/i386/debug/repodata
fedora/core/6/i386/os
fedora/core/6/i386/os/Fedora
fedora/core/6/i386/os/Fedora/base
fedora/core/6/i386/os/Fedora/RPMS
fedora/core/6/i386/os/stylesheet-images
fedora/core/6/i386/os/isolinux
fedora/core/6/i386/os/images
fedora/core/6/i386/os/images/xen
fedora/core/6/i386/os/images/pxeboot
fedora/core/6/i386/os/repodata

The exclude file:
+ */
- ppc/
- ppc64/
- x86_64/
- source/

should, at the very least, result in creating these directories and transferring only the files within them:
fedora/core/6/i386
fedora/core/6/i386/iso
fedora/core/6/i386/debug
fedora/core/6/i386/debug/repodata
fedora/core/6/i386/os
fedora/core/6/i386/os/Fedora
fedora/core/6/i386/os/Fedora/base
fedora/core/6/i386/os/Fedora/RPMS
fedora/core/6/i386/os/stylesheet-images
fedora/core/6/i386/os/isolinux
fedora/core/6/i386/os/images
fedora/core/6/i386/os/images/xen
fedora/core/6/i386/os/images/pxeboot
fedora/core/6/i386/os/repodata

This does not happen. All directories and all files are transferred, regardless of whether they are excluded or not.

Trying:
+ i386
- *

doesn't transfer anything, but it should transfer all directories named i386.

Trying:
+ i386**
- *

only transfers directories named i386 within the transfer root.

Trying:
+ i386/**
- *

transfers nothing, but it should include any directory named i386 and all of its subdirectories within the transfer root.

Trying:
+ **/i386
- *

Only creates the directory i386 if it is at the transfer root, but it should have created every directory named i386, since '**/' is stated to match any directory, regardless of depth.

Trying:
+ **/i386
+ **/i386/*
- *

Only transfers the i386 directory and the files it contains; directories named i386 deep inside the hierarchy are not recognized.


In short: the matching algorithm described in the rsync documentation is not the one that is implemented! It is badly broken and nearly unusable, since you have to experiment to find out what will be matched and what will not.


All tests done using:
rsync -avvP -n --exclude-from=exclude.lst rsync://mirrors.kernel.org/fedora/ /tmp/test/
Comment 1 Thomas Schweikle 2007-07-04 13:51:34 UTC
The same happens with rsync 2.6.9.
Comment 2 Wayne Davison 2007-07-04 14:05:39 UTC
Please re-read the exclude section of the man page.  It describes how the first match is the one that takes effect, so a directory exclude that follows a */ include will never be seen.  Just change the order of the rules so that the exceptions come before the general rules and it will work fine.
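Wayne's point about rule order can be illustrated with a small model of first-match-wins evaluation. This is a simplified sketch under stated assumptions, not rsync's code: it only handles patterns without an inner slash (which the man page says are matched against the last path component), treats a trailing "/" as "directories only", and uses Python's fnmatch for the wildcards. The function name first_match and the (sign, pattern) rule tuples are hypothetical.

```python
from fnmatch import fnmatchcase

# Each rule is a hypothetical (sign, pattern) pair, e.g. ("-", "i386/").
# Simplified model: only patterns without an inner "/" are supported,
# so each pattern is compared against the last path component only.


def first_match(rules, name, is_dir):
    """Return the sign of the first rule that matches `name`, or None."""
    for sign, pattern in rules:
        if pattern.endswith("/"):
            if not is_dir:
                continue  # a trailing "/" only matches directories
            pattern = pattern[:-1]
        if fnmatchcase(name, pattern):
            return sign  # first match wins; later rules are never consulted
    return None  # no rule matched: the entry is included by default


# The reporter's order: "+ */" matches every directory first.
reported = [("+", "*/"), ("-", "i386/")]
# Exceptions first, general rule last.
reordered = [("-", "i386/"), ("+", "*/")]

print(first_match(reported, "i386", True))   # prints +  (the exclude is never reached)
print(first_match(reordered, "i386", True))  # prints -  (i386 is now excluded)
print(first_match(reordered, "ppc", True))   # prints +
```

With the reported order, "- i386/" is dead code because "+ */" already claims every directory; moving the exception above the general rule is all that changes.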
Comment 3 Thomas Schweikle 2007-07-05 17:13:17 UTC
I have tried that too, but without any luck:

INCLUDE/EXCLUDE PATTERN RULES
       You can include and exclude files by specifying patterns using
       the "+", "-", etc. filter rules (as introduced in the FILTER
       RULES section above).  The include/exclude rules each specify a
       pattern that is matched against the names of the files that are
       going to be transferred.  These patterns can take several forms:

       o      if the pattern starts with a / then it is anchored to
              a particular spot in the hierarchy of files, otherwise
              it is matched against the end of the pathname.
              This is similar to a leading ^ in regular expressions.
              Thus "/foo" would match a file named "foo" at
              either the "root of the transfer" (for a global
              rule) or in the merge-file’s directory (for a
              per-directory rule).  An unqualified "foo" would match any
              file or directory named "foo" anywhere in the tree
              because the algorithm is applied recursively from the
              top down; it behaves as if each path component gets
              a turn at being the end of the file name.  Even the
              unanchored "sub/foo" would match at any point in the
              hierarchy where a "foo" was found within a directory
              named "sub".  See the section on ANCHORING
              INCLUDE/EXCLUDE PATTERNS for a full discussion of how
              to specify a pattern that matches at the root of
              the transfer.

       o      if the pattern ends with a / then it will only match
              a directory, not a file, link, or device.

If this were true, an --exclude-from file

+ 6/
- *

should only match directories named '6'. For example

core/6
update/6

and so on. Could you please explain to me why I see the following with the rule above?

$ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
[...]
receiving file list ... done
drwxr-xr-x          77 2007/05/21 23:37:48 .

sent 101 bytes  received 787 bytes  118.40 bytes/sec
total size is 0  speedup is 0.00

The directories named '6' are not matched at all, and there are several of them within this hierarchy! At least these I expect to be listed!

Changing the rule to
+ 6/
+ 6/*
- *

should, according to the description, match all files within any directory named '6'. But again:
$ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
[...]
receiving file list ... done
drwxr-xr-x          77 2007/05/21 23:37:48 .

sent 110 bytes  received 787 bytes  94.42 bytes/sec
total size is 0  speedup is 0.00


       o      rsync chooses between doing a simple string match and
              wildcard matching by checking if the pattern contains
              one of these three wildcard characters: ’*’, ’?’, and ’[’ .

       o      a ’*’ matches any non-empty path component (it stops
              at slashes).

If this were true, the rule

+ */*/*
- *

would match any file or directory like

core/6/...

But:
$ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
[...]
receiving file list ... done
drwxr-xr-x          77 2007/05/21 23:37:48 .

sent 104 bytes  received 787 bytes  93.79 bytes/sec
total size is 0  speedup is 0.00

No match at all!

       o      use ’**’ to match anything, including slashes.

       o      a ’?’ matches any character except a slash (/).

       o      a ’[’ introduces a character class, such as
              [a-z] or [[:alpha:]].

       o      in a wildcard pattern, a backslash can be used to
              escape a wildcard character, but it is matched literally
              when no wildcards are present.

       o      if the pattern contains a / (not counting a trailing
              /) or a "**", then it is matched against the full
              pathname, including any leading directories. If
              the pattern doesn’t contain a / or a "**", then it
              is matched only against the final component of the
              filename. (Remember that the algorithm is
              applied recursively so "full filename" can
              actually be any portion of a path from the starting
              directory on down.)

If this were true, the rule

+ /**/6
- *

would match all files or directories named '6'. But:
$ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
[...]
receiving file list ... done
drwxr-xr-x          77 2007/05/21 23:37:48 .

sent 104 bytes  received 787 bytes  93.79 bytes/sec
total size is 0  speedup is 0.00

No match at all!

       o      a trailing "dir_name/***" will match both the directory
              (as if "dir_name/" had been specified)  and  all
              the  files in the directory (as if "dir_name/**"
              had been specified).  (This behavior is new for version
              2.6.7.)
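To separate "does the pattern match?" from "is the path ever examined?", the quoted pattern rules can be modelled in isolation. This is a simplified sketch, not rsync's wildmatch: it assumes "**" matches anything including "/", "*" and "?" stay within one path component, a leading "/" anchors the pattern at the transfer root, an unanchored pattern is matched against the end of the path at a component boundary, a trailing "/" restricts the match to directories, and a trailing "/***" behaves like both "dir/" and "dir/**". Character classes and backslash escapes are not handled. The names glob_to_regex and rule_matches are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate the simplified wildcard syntax into a compiled regex."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # may cross "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # stops at slashes
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")     # one non-slash character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def rule_matches(pattern, path, is_dir):
    """Does `pattern` match `path` (relative to the transfer root)?"""
    if pattern.endswith("/***"):
        # "dir/***" acts like both "dir/" and "dir/**".
        return (rule_matches(pattern[:-3], path, is_dir)
                or rule_matches(pattern[:-1], path, is_dir))
    if pattern.endswith("/"):
        if not is_dir:
            return False           # trailing "/": directories only
        pattern = pattern[:-1]
    if pattern.startswith("/"):    # anchored at the transfer root
        return glob_to_regex(pattern[1:]).fullmatch(path) is not None
    # Unanchored: try the path and every suffix starting after a "/".
    regex = glob_to_regex(pattern)
    parts = path.split("/")
    return any(regex.fullmatch("/".join(parts[k:])) for k in range(len(parts)))


checks = [
    ("6/", "core/6", True),                       # directories named 6
    ("*/*/*", "core/6/i386", True),               # last three components
    ("/**/6", "core/updates/6", True),            # "**" spans core/updates
    ("i386/***", "core/6/i386/os/a.rpm", False),  # inside an i386 tree
    ("6/", "core", True),                         # top-level dir is not a 6
    ("*", "core", True),                          # ...but "- *" matches it
]
for pattern, path, is_dir in checks:
    print(pattern, path, rule_matches(pattern, path, is_dir))
```

In this model every pattern from the comment does match the deep paths it was meant for. What it cannot show is traversal order: the top-level directory ("core" in these hypothetical paths) is tested first, fails "+ 6/", and matches "- *", so its subdirectories are never offered to the matcher at all.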

Comment 4 Matt McCutchen 2007-07-05 17:39:37 UTC
(In reply to comment #3)
> if this were true, an --exclude-from file
> 
> + 6/
> - *
> 
> should only match directories named '6'. For example
> 
> core/6
> update/6
> 
> and so on. Could you please explain me, why I am seeing with this rule above
> 
> $ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
> [...]
> receiving file list ... done
> drwxr-xr-x          77 2007/05/21 23:37:48 .
> 
> sent 101 bytes  received 787 bytes  118.40 bytes/sec
> total size is 0  speedup is 0.00
> 
> The directories '6' are not matched at all --- and there are some of them
> within this hierarchy! At least these I expect to be listed!

In all of your examples, rsync's behavior is correct.  The "core" and "update" directories are excluded by the "- *", so rsync never even goes inside to check for "6" directories.  The man page explains this:

``Note that, when using the --recursive (-r) option (which is implied by -a), every subcomponent of every path is visited from the top down, so include/exclude patterns get applied recursively to each subcomponent's full name (e.g. to include "/foo/bar/baz" the subcomponents "/foo" and "/foo/bar" must not be excluded).  The exclude patterns actually short-circuit the directory traversal stage when rsync finds the files to send.  If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync did not descend through that excluded section of the
hierarchy.  This is particularly important when using a trailing '*' rule.  For instance, this won't work:

+ /some/path/this-file-will-not-be-found
+ /file-is-included
- *

This fails because the parent directory "some" is excluded by the '*' rule, so rsync never visits any of the files in the "some" or "some/path" directories.''
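The short-circuit described in the quoted man page text can be reproduced with a top-down walk that refuses to descend into an excluded directory. This is a sketch under simplifying assumptions: the tree is the hypothetical two-entry example from the man page, a pattern with a leading "/" is compared literally against the path from the transfer root (no wildcards), other patterns are compared against the last path component with fnmatch, and an entry that matches no rule is included. The names TREE, matches, and walk are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical source tree: dicts are directories, None marks a file.
TREE = {
    "some": {"path": {"this-file-will-not-be-found": None}},
    "file-is-included": None,
}


def matches(pattern, path, is_dir):
    """Leading "/": literal match from the transfer root (no wildcards).
    Otherwise: fnmatch against the last path component."""
    if pattern.endswith("/"):
        if not is_dir:
            return False
        pattern = pattern[:-1]
    if pattern.startswith("/"):
        return path == pattern[1:]
    return fnmatchcase(path.rsplit("/", 1)[-1], pattern)


def walk(tree, rules, prefix=""):
    """Visit entries top-down; an excluded directory is never entered."""
    found = []
    for name in sorted(tree):
        path = prefix + name
        is_dir = isinstance(tree[name], dict)
        sign = next((s for s, p in rules if matches(p, path, is_dir)), "+")
        if sign == "-":
            continue  # short-circuit: nothing below this entry is examined
        found.append(path)
        if is_dir:
            found.extend(walk(tree[name], rules, path + "/"))
    return found


man_page_rules = [
    ("+", "/some/path/this-file-will-not-be-found"),
    ("+", "/file-is-included"),
    ("-", "*"),
]
print(walk(TREE, man_page_rules))  # ['file-is-included']

# Including each parent directory explicitly lets the walk reach the file.
fixed_rules = [("+", "/some/"), ("+", "/some/path/")] + man_page_rules
print(walk(TREE, fixed_rules))
```

The first rule is never consulted for the deep file because "some" is rejected by "- *" before the walk gets there; adding includes for "some/" and "some/path/" is one way to keep the path open.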
Comment 5 Wayne Davison 2007-07-05 18:06:39 UTC
It would be better to ask questions about how to understand includes on the mailing list.
Comment 6 Thomas Schweikle 2007-07-05 18:20:49 UTC
> It would be better to ask questions about how to
> understand includes on the mailing list.

If the description of how includes work is so misleading that I have to ask about it on the mailing list, wouldn't you agree that it should be overhauled to make it clearer?
Comment 7 Thomas Schweikle 2007-07-05 19:33:53 UTC
OK:
+ /some/path/this-file-will-not-be-found
+ /file-is-included
- *

This fails because the parent directory "some" is excluded by the '*' rule, so
rsync never visits any of the files in the "some" or "some/path" directories.''

If I interpret this correctly, it means that '/**/6/' is useless when combined with '- *': no directory will ever be matched, because you do not expand the tree first to find a match for '/**/6/'? This is not clear.

The same is with

+ 6/
- *

Are you traversing all directories, or just reading the contents of the one we are in? The manpage states that '+ 6/' finds any directory named '6'. So I assume you traverse all available directories first, looking for one matching '6'? Or, if that is not true, do you test only the entries of the directory we are in, discard them if none of them matches (none is named '6'), and never traverse further down the tree?

This is not clearly stated. I am missing a clear description of what is matched by all those rules that span more than one directory:

+ **/
+ /**/
+ 6/ (implicitly something like **/6/)


If you are assuming the path

/fedora/core/6

you read only

/fedora

and compare it against '/**/6/'. Since this does not match, the '- *' rule is applied. That rule matches, so the directory is removed?! I'd assume that with

+ /**/6/
- *

you would keep expanding the tree to look for a match until no candidates are left. Meaning:

/fedora has the subdirectory /core, which has the subdirectory /6; this matches /**/6 because /**/ matches any path, so /fedora/core/6 is matched. If this isn't done, state it clearly: used this way, '/**/6' will never match anything if it is followed by a rule like '- *'.

If you need to include only certain deeply nested subdirectories, the include rules are of no use in most cases; the only way to transfer deeply nested directories would be to capture a full directory listing from rsync, feed the output into grep, sed, or any other tool capable of handling strings efficiently, and build an include file yourself.
Comment 8 Wayne Davison 2007-07-05 19:44:57 UTC
Please leave this closed.
Comment 9 Wayne Davison 2007-07-05 20:00:40 UTC
Also, check out the description of short-circuiting the descent in the man page:

http://rsync.samba.org/ftp/rsync/rsync.html

This appears near the start of the "INCLUDE/EXCLUDE PATTERN RULES" section (right after the list of pattern types).
Comment 10 Matt McCutchen 2007-07-05 20:16:00 UTC
(In reply to comment #7)
> OK:
> + /some/path/this-file-will-not-be-found
> + /file-is-included
> - *
> 
> This fails because the parent directory "some" is excluded by the '*' rule, so
> rsync never visits any of the files in the "some" or "some/path" directories.''
> 
> If I interpret correctly, what this states, it means '/**/6/' is useless
> combined with '- *', since no directory will ever be matched, since you are not
> expanding first to find a match for '/**/6/'? It is not clear.

Did you read the previous paragraph?  It says, "If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync did not descend through that excluded section of the hierarchy."  To me, this makes it clear that an exclude pattern cuts rsync off altogether from scanning a subtree of the source so that files inside that subtree are never even considered for transmission, whether or not they would be included according to your rules.  There is nothing in the man page to justify the interpretation that rsync "expand[s] first to find a match for '/**/6/'".

> The same is with
> 
> + 6/
> - *
> 
> are you traversing all directories, or just reading the contents of the one we
> are in?  The manpage states that '+ 6/ finds any directory named '6'.

No, nowhere does the manpage state that include patterns "find" anything.  '+ 6/' matches and includes any directory named '6' *if and when* such a directory is encountered during the traversal.
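The "if and when encountered" behavior can be seen with a small walk over a hypothetical, trimmed-down version of the hierarchy listed in the report (paths relative to the transfer root). This sketch assumes first-match-wins evaluation, that slash-free patterns are compared only against the entry's own name (fnmatch wildcards), that a trailing "/" means directories only, and that an entry matching no rule is included. TREE, included, and walk are hypothetical names.

```python
from fnmatch import fnmatchcase

# Hypothetical, trimmed-down tree; dicts are directories, None is a file.
TREE = {
    "core": {
        "6": {"i386": {"a.rpm": None}, "ppc": {"b.rpm": None}},
        "updates": {"6": {"i386": {"c.rpm": None}}},
    },
}


def included(rules, name, is_dir):
    """First matching rule wins; slash-free patterns see only the name."""
    for sign, pattern in rules:
        if pattern.endswith("/") and not is_dir:
            continue  # directory-only rule, entry is a file
        if fnmatchcase(name, pattern.rstrip("/")):
            return sign == "+"
    return True


def walk(tree, rules, prefix=""):
    """Return the paths that are visited and kept, in traversal order."""
    kept = []
    for name in sorted(tree):
        is_dir = isinstance(tree[name], dict)
        if not included(rules, name, is_dir):
            continue  # excluded directories are not descended into
        path = prefix + name
        kept.append(path)
        if is_dir:
            kept.extend(walk(tree[name], rules, path + "/"))
    return kept


# "+ 6/" is only tested against entries that are actually reached.
print(walk(TREE, [("+", "6/"), ("-", "*")]))  # []: "core" is excluded first
print(walk(TREE, [("+", "core/"), ("+", "6/"), ("-", "*")]))
# ['core', 'core/6']: core/updates is excluded, so core/updates/6 is never seen
```

Including "core/" lets the walk reach "core/6", yet "core/updates/6" is still invisible because "updates" fails every include and hits "- *".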

> Thus I
> assume you traversing all available directories first, looking for one matching
> '6'? If this is not true, meaning you are testing within the directory we are
> in if there is no match (one named '6') discarding them, never traversing down
> the tree?
> 
> This is not clearly stated.

I'm ignoring this because it is based on a statement that isn't in the manpage.

> I am missing a clear description what is matched by
> all those rules spawning more than just one directory:
> 
> + **/
> + /**/
> + 6/ (implicitly something like (**/6/)
> 
> 
> If you are assuming
> 
> /fedora/core/6
> 
> you are reading only
> 
> /fedora
> 
> comparing against '/**/6/'. Since this does not match the '- *' is applied.
> This matches. The directory is removed?!

Correct.

> I'd assume with
> 
> + /**/6/
> - *
> 
> you'd try to find a match expanding until there is none. Meaning:
> 
> /fedora has subdirectory /core has subdirectory /6, this matches /**/6, since
> /**/ matches any path, thus /fedora/core/6 is matched.

There is nothing in the manpage to justify this assumption.  If you are misled
by your own faulty assumption, that's your fault, not rsync's.

> If this isn't done,
> state it clearly. '/**/6' this way will never match anything if followed by a
> rule like '- *'.

The example I quoted from the manpage is intended to explain just that:

``For instance, this won't work:

+ /some/path/this-file-will-not-be-found
+ /file-is-included
- *

This fails because the parent directory "some" is excluded by the '*' rule, so
rsync never visits any of the files in the "some" or "some/path" directories.''

Granted, that example isn't as explicit as it possibly could be that the first
rule has no effect, but it ought to be enough to catch the attention of people
using a similar filter file (like you) and get them to read the rest of the
description so they understand why it won't work.

> If you are in need of including only certain deeply nested subdirectories the
> includes are not of use in most cases and the only way transferring deeply
> nested directories would be to analyze a whole directory traversal by rsync
> feeding the output into grep, sed, or any tool capable of handling strings
> efficiently, creating an include file yourself.

No, the man page goes on to explain another alternative:

``One solution is to ask for all directories in the hierarchy to be included by using a single rule: "+ */" (put it somewhere before the "- *" rule), and perhaps use the --prune-empty-dirs option.''
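The "+ */" approach can be sketched end to end: include every directory so the walk reaches every level, include whatever lives under a directory named i386, exclude everything else, then drop directories that ended up with no files (the effect --prune-empty-dirs is described as having). This is a simplified model, not rsync's implementation: the tree is a hypothetical, trimmed-down version of the listed hierarchy; "**" matches anything, "*" stays within a component; unanchored patterns match the end of the path at a component boundary; character classes are not handled. All function names are hypothetical.

```python
import re

# Hypothetical, trimmed-down tree; dicts are directories, None is a file.
TREE = {
    "core": {
        "6": {
            "i386": {"os": {"a.rpm": None}},
            "ppc": {"os": {"b.rpm": None}},
        },
        "updates": {"6": {"i386": {"c.rpm": None}, "x86_64": {"d.rpm": None}}},
    },
}


def glob_to_regex(pattern):
    """'**' crosses slashes; '*' and '?' stay within one path component."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] in "*?":
            out.append("[^/]*" if pattern[i] == "*" else "[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern, path, is_dir):
    if pattern.endswith("/"):
        if not is_dir:
            return False
        pattern = pattern[:-1]
    if pattern.startswith("/"):  # anchored at the transfer root
        return glob_to_regex(pattern[1:]).fullmatch(path) is not None
    regex = glob_to_regex(pattern)  # unanchored: match the end of the path
    parts = path.split("/")
    return any(regex.fullmatch("/".join(parts[k:])) for k in range(len(parts)))


def walk(tree, rules, prefix=""):
    """Top-down, first match wins; excluded directories are not entered."""
    entries = []
    for name in sorted(tree):
        path, is_dir = prefix + name, isinstance(tree[name], dict)
        sign = next((s for s, p in rules if matches(p, path, is_dir)), "+")
        if sign == "-":
            continue
        entries.append((path, is_dir))
        if is_dir:
            entries.extend(walk(tree[name], rules, path + "/"))
    return entries


def transfer(tree, rules, prune_empty_dirs=False):
    """Optionally drop directories that contain no transferred file."""
    entries = walk(tree, rules)
    files = [p for p, d in entries if not d]
    return [p for p, d in entries
            if not (d and prune_empty_dirs)
            or any(f.startswith(p + "/") for f in files)]


rules = [("+", "*/"), ("+", "i386/**"), ("-", "*")]
for path in transfer(TREE, rules, prune_empty_dirs=True):
    print(path)
```

Without pruning, the ppc and x86_64 directories are still created (empty), because "+ */" admits every directory; the pruning pass is what trims them away, leaving only the i386 subtrees at every depth.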

If you wanted rsync to provide an easier way to include only certain subdirectories, that would be a legitimate feature request.