The manpage states: "- foo/" would exclude any directory named foo.

But:

    + */
    - i386/

This should include all directories, then exclude any i386 directory. Having a look at what rsync does:

    i386/debug/xorg-x11-drv-i810-debuginfo-1.6.5-10.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-drv-mga-debuginfo-1.4.5-2.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-drv-mouse-debuginfo-1.2.1-1.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-drv-s3-debuginfo-0.5.0-1.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-drv-savage-debuginfo-2.1.2-3.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-drv-tdfx-debuginfo-1.3.0-2.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-drv-trident-debuginfo-1.2.3-1.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-server-debuginfo-1.1.1-47.10.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-server-debuginfo-1.1.1-47.8.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-server-debuginfo-1.1.1-47.9.fc6.i386.rpm is uptodate
    i386/debug/xorg-x11-xinit-debuginfo-1.0.2-15.fc6.i386.rpm is uptodate

The i386 directories are included.

    + */
    - */

This should include all directories, then exclude them all. rsync:

    x86_64/debug/repodata/filelists.xml.gz
    x86_64/debug/repodata/other.xml.gz
    x86_64/debug/repodata/primary.xml.gz
    x86_64/debug/repodata/repomd.xml
    x86_64/repodata/filelists.xml.gz
    x86_64/repodata/other.xml.gz
    x86_64/repodata/primary.xml.gz
    x86_64/repodata/repomd.xml
    x86_64/repodata/updateinfo.xml.gz

All directories are included and finally transferred.

    + */
    - *

This should include all directories, then exclude all files or directories. rsync: This works as expected.

    + i386
    - *

This should include all i386 directories, then exclude all files. rsync: No files transferred.

    + i386/
    - *

This should include all i386 directories, then exclude all files. rsync: No files transferred, directory created.

    + i386**
    - *

This should include all i386 directories and files, excluding all others. rsync:

    i386/debug/repodata/other.xml.gz is uptodate
    i386/debug/repodata/primary.xml.gz is uptodate
    i386/debug/repodata/repomd.xml is uptodate
    i386/repodata/filelists.xml.gz is uptodate
    i386/repodata/other.xml.gz is uptodate
    i386/repodata/primary.xml.gz is uptodate

This is OK, but only if the i386 directory is at the transfer root. Any other i386 directory will not be transferred at all!

    + i386/
    + i386/*
    - *

This should include the i386 directory and its files. rsync:

    i386/xsane-0.994-2.fc6.i386.rpm is uptodate
    i386/xsane-gimp-0.994-2.fc6.i386.rpm is uptodate
    i386/xterm-225-1.fc6.i386.rpm is uptodate
    i386/yelp-2.16.0-13.fc6.i386.rpm is uptodate
    i386/ypbind-1.19-7.fc6.i386.rpm is uptodate
    i386/yum-3.0.6-1.fc6.noarch.rpm is uptodate
    i386/yum-metadata-parser-1.0.3-1.fc6.i386.rpm is uptodate

This is OK.

Taking the description and a file hierarchy as given below (found on rsync://mirrors.kernel.org/):

    fedora
    fedora/core
    fedora/core/updates
    fedora/core/updates/6
    fedora/core/updates/6/ppc
    fedora/core/updates/6/ppc/debug
    fedora/core/updates/6/ppc/debug/repodata
    fedora/core/updates/6/ppc/repodata
    fedora/core/updates/6/x86_64
    fedora/core/updates/6/x86_64/debug
    fedora/core/updates/6/x86_64/debug/repodata
    fedora/core/updates/6/x86_64/repodata
    fedora/core/updates/6/i386
    fedora/core/updates/6/i386/debug
    fedora/core/updates/6/i386/debug/repodata
    fedora/core/updates/6/i386/repodata
    fedora/core/updates/6/SRPMS
    fedora/core/updates/6/SRPMS/repodata
    fedora/core/6
    fedora/core/6/ppc
    fedora/core/6/ppc/iso
    fedora/core/6/ppc/debug
    fedora/core/6/ppc/debug/repodata
    fedora/core/6/ppc/os
    fedora/core/6/ppc/os/ppc
    fedora/core/6/ppc/os/ppc/iSeries
    fedora/core/6/ppc/os/ppc/chrp
    fedora/core/6/ppc/os/ppc/mac
    fedora/core/6/ppc/os/ppc/ppc32
    fedora/core/6/ppc/os/ppc/ppc64
    fedora/core/6/ppc/os/Fedora
    fedora/core/6/ppc/os/Fedora/base
    fedora/core/6/ppc/os/Fedora/RPMS
    fedora/core/6/ppc/os/stylesheet-images
    fedora/core/6/ppc/os/images
    fedora/core/6/ppc/os/images/iSeries
    fedora/core/6/ppc/os/images/netboot
    fedora/core/6/ppc/os/etc
    fedora/core/6/ppc/os/repodata
    fedora/core/6/x86_64
    fedora/core/6/x86_64/iso
    fedora/core/6/x86_64/debug
    fedora/core/6/x86_64/debug/repodata
    fedora/core/6/x86_64/os
    fedora/core/6/x86_64/os/Fedora
    fedora/core/6/x86_64/os/Fedora/base
    fedora/core/6/x86_64/os/Fedora/RPMS
    fedora/core/6/x86_64/os/stylesheet-images
    fedora/core/6/x86_64/os/isolinux
    fedora/core/6/x86_64/os/images
    fedora/core/6/x86_64/os/images/xen
    fedora/core/6/x86_64/os/images/pxeboot
    fedora/core/6/x86_64/os/repodata
    fedora/core/6/source
    fedora/core/6/source/iso
    fedora/core/6/source/SRPMS
    fedora/core/6/source/SRPMS/repodata
    fedora/core/6/i386
    fedora/core/6/i386/iso
    fedora/core/6/i386/debug
    fedora/core/6/i386/debug/repodata
    fedora/core/6/i386/os
    fedora/core/6/i386/os/Fedora
    fedora/core/6/i386/os/Fedora/base
    fedora/core/6/i386/os/Fedora/RPMS
    fedora/core/6/i386/os/stylesheet-images
    fedora/core/6/i386/os/isolinux
    fedora/core/6/i386/os/images
    fedora/core/6/i386/os/images/xen
    fedora/core/6/i386/os/images/pxeboot
    fedora/core/6/i386/os/repodata

The exclude file:

    + */
    - ppc/
    - ppc64/
    - x86_64/
    - source/

should result, at least, in creating these directories and transferring only the files within them:

    fedora/core/6/i386
    fedora/core/6/i386/iso
    fedora/core/6/i386/debug
    fedora/core/6/i386/debug/repodata
    fedora/core/6/i386/os
    fedora/core/6/i386/os/Fedora
    fedora/core/6/i386/os/Fedora/base
    fedora/core/6/i386/os/Fedora/RPMS
    fedora/core/6/i386/os/stylesheet-images
    fedora/core/6/i386/os/isolinux
    fedora/core/6/i386/os/images
    fedora/core/6/i386/os/images/xen
    fedora/core/6/i386/os/images/pxeboot
    fedora/core/6/i386/os/repodata

This does not happen. All directories and all files are transferred, regardless of being excluded or not.

Trying:

    + i386
    - *

doesn't transfer anything, but should transfer all directories named i386.

Trying:

    + i386**
    - *

only transfers directories named i386 within the transfer root.

Trying:

    + i386/**
    - *

transfers nothing, but should include any directory named i386 and all subdirectories within the transfer root.

Trying:

    + **/i386
    - *

only creates the directory i386 if it is at the transfer root, but should have created any directory named i386, since '**/' is stated to match any directory, regardless of depth.

Trying:

    + **/i386
    + **/i386/*
    - *

only transfers the i386 directory and the files in it; directories named i386 deep inside the hierarchy are not recognized.

In short: the matching algorithm described in the rsync documentation is not the one implemented! It is badly broken and nearly unusable, since you have to experiment to find out what is matched and what is not.

All tests were done using:

    rsync -avvP -n --exclude-from=exclude.lst rsync://mirrors.kernel.org/fedora/ /tmp/test/
The same is true for rsync 2.6.9.
Please re-read the exclude section of the man page. It describes how the first match is the one that takes effect, so a directory exclude that follows a */ include will never be seen. Just change the order of the rules so that the exceptions come before the general rules and it will work fine.
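To illustrate the ordering point, here is a minimal Python sketch of first-match-wins rule evaluation. It is a toy model, not rsync's code: the function name first_match is hypothetical, only patterns without an interior slash are handled, and each pattern is compared against a single path component using Python's fnmatch.

```python
from fnmatch import fnmatchcase


def first_match(rules, name, is_dir):
    """Return '+' or '-' from the first rule that matches, else '+'.

    Toy model (an assumption, not rsync's implementation): patterns have
    no interior slash and are compared against one path component.
    """
    for rule in rules:
        action, pattern = rule.split(None, 1)
        # A trailing slash restricts the rule to directories.
        if pattern.endswith("/") and not is_dir:
            continue
        if fnmatchcase(name, pattern.rstrip("/")):
            return action  # first matching rule wins; later rules are ignored
    return "+"  # nothing matched: the entry is included by default


general_first = ["+ */", "- i386/"]    # the exclude is never reached
exception_first = ["- i386/", "+ */"]  # the exclude is seen first

print(first_match(general_first, "i386", is_dir=True))    # +
print(first_match(exception_first, "i386", is_dir=True))  # -
```

With the general rule first, the directory i386 is already claimed by "+ */" before "- i386/" is ever consulted, which is the behavior reported in the first test above.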
I have tried that too, but without any luck:

INCLUDE/EXCLUDE PATTERN RULES

You can include and exclude files by specifying patterns using the "+", "-", etc. filter rules (as introduced in the FILTER RULES section above). The include/exclude rules each specify a pattern that is matched against the names of the files that are going to be transferred. These patterns can take several forms:

    o  if the pattern starts with a / then it is anchored to a particular spot in the hierarchy of files, otherwise it is matched against the end of the pathname. This is similar to a leading ^ in regular expressions. Thus "/foo" would match a file named "foo" at either the "root of the transfer" (for a global rule) or in the merge-file's directory (for a per-directory rule). An unqualified "foo" would match any file or directory named "foo" anywhere in the tree because the algorithm is applied recursively from the top down; it behaves as if each path component gets a turn at being the end of the file name. Even the unanchored "sub/foo" would match at any point in the hierarchy where a "foo" was found within a directory named "sub". See the section on ANCHORING INCLUDE/EXCLUDE PATTERNS for a full discussion of how to specify a pattern that matches at the root of the transfer.

    o  if the pattern ends with a / then it will only match a directory, not a file, link, or device.

If this were true, an --exclude-from file

    + 6/
    - *

should only match directories named '6'. For example

    core/6
    update/6

and so on. Could you please explain to me why I am seeing the following with the rule above:

    $ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
    [...]
    receiving file list ... done
    drwxr-xr-x 77 2007/05/21 23:37:48 .

    sent 101 bytes received 787 bytes 118.40 bytes/sec
    total size is 0 speedup is 0.00

The directories '6' are not matched at all, and there are some of them within this hierarchy! At the least, I expect these to be listed!

Changing the rule to

    + 6/
    + 6/*
    - *

should, if following the description, make it match all files within any directory named '6'. But again:

    $ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
    [...]
    receiving file list ... done
    drwxr-xr-x 77 2007/05/21 23:37:48 .

    sent 110 bytes received 787 bytes 94.42 bytes/sec
    total size is 0 speedup is 0.00

    o  rsync chooses between doing a simple string match and wildcard matching by checking if the pattern contains one of these three wildcard characters: '*', '?', and '[' .

    o  a '*' matches any non-empty path component (it stops at slashes).

If this were true, the rule

    + */*/*
    - *

would match any file or directory like core/6/... But:

    $ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
    [...]
    receiving file list ... done
    drwxr-xr-x 77 2007/05/21 23:37:48 .

    sent 104 bytes received 787 bytes 93.79 bytes/sec
    total size is 0 speedup is 0.00

No match at all!

    o  use '**' to match anything, including slashes.

    o  a '?' matches any character except a slash (/).

    o  a '[' introduces a character class, such as [a-z] or [[:alpha:]].

    o  in a wildcard pattern, a backslash can be used to escape a wildcard character, but it is matched literally when no wildcards are present.

    o  if the pattern contains a / (not counting a trailing /) or a "**", then it is matched against the full pathname, including any leading directories. If the pattern doesn't contain a / or a "**", then it is matched only against the final component of the filename. (Remember that the algorithm is applied recursively so "full filename" can actually be any portion of a path from the starting directory on down.)

If this were true, the rule

    + /**/6
    - *

would match all files or directories '6'. But:

    $ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
    [...]
    receiving file list ... done
    drwxr-xr-x 77 2007/05/21 23:37:48 .

    sent 104 bytes received 787 bytes 93.79 bytes/sec
    total size is 0 speedup is 0.00

No match at all!

    o  a trailing "dir_name/***" will match both the directory (as if "dir_name/" had been specified) and all the files in the directory (as if "dir_name/**" had been specified). (This behavior is new for version 2.6.7.)
(In reply to comment #3)
> If this were true, an --exclude-from file
>
> + 6/
> - *
>
> should only match directories named '6'. For example
>
> core/6
> update/6
>
> and so on. Could you please explain to me why I am seeing the following
> with the rule above:
>
> $ rsync -av --exclude-from=exclude.txt rsync://mirrors.kernel.org/fedora/
> [...]
> receiving file list ... done
> drwxr-xr-x 77 2007/05/21 23:37:48 .
>
> sent 101 bytes received 787 bytes 118.40 bytes/sec
> total size is 0 speedup is 0.00
>
> The directories '6' are not matched at all, and there are some of them
> within this hierarchy! At the least, I expect these to be listed!

In all of your examples, rsync's behavior is correct. The "core" and "update" directories are excluded by the "- *", so rsync never even goes inside to check for "6" directories. The man page explains this:

``Note that, when using the --recursive (-r) option (which is implied by -a), every subcomponent of every path is visited from the top down, so include/exclude patterns get applied recursively to each subcomponent's full name (e.g. to include "/foo/bar/baz" the subcomponents "/foo" and "/foo/bar" must not be excluded). The exclude patterns actually short-circuit the directory traversal stage when rsync finds the files to send. If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync did not descend through that excluded section of the hierarchy. This is particularly important when using a trailing '*' rule. For instance, this won't work:

    + /some/path/this-file-will-not-be-found
    + /file-is-included
    - *

This fails because the parent directory "some" is excluded by the '*' rule, so rsync never visits any of the files in the "some" or "some/path" directories.''
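The short-circuit described in the quoted passage can be sketched with a toy top-down walker in Python. This is only an illustration under stated assumptions: the names decide and walk, and the small tree (a slice of the hierarchy listed in the original report, relative to the transfer root), are hypothetical, and the matcher handles only patterns without an interior slash, compared against one path component. It is not rsync's code.

```python
from fnmatch import fnmatchcase


def decide(rules, name, is_dir):
    """First matching rule wins; unmatched entries are included (toy model)."""
    for rule in rules:
        action, pattern = rule.split(None, 1)
        if pattern.endswith("/") and not is_dir:
            continue  # trailing slash: the rule applies to directories only
        if fnmatchcase(name, pattern.rstrip("/")):
            return action
    return "+"


def walk(tree, rules, prefix=""):
    """Yield kept paths, never descending into an excluded directory."""
    for name, child in tree.items():
        is_dir = isinstance(child, dict)
        if decide(rules, name, is_dir) == "-":
            continue  # excluded here, so nothing below it is ever examined
        path = prefix + name
        yield path + "/" if is_dir else path
        if is_dir:
            yield from walk(child, rules, path + "/")


# Hypothetical slice of the hierarchy; dicts are directories, None is a file.
tree = {"core": {"6": {"i386": {"a.rpm": None}}, "updates": {"6": {}}}}

print(list(walk(tree, ["+ 6/", "- *"])))
# []
print(list(walk(tree, ["+ */", "- *"])))
# ['core/', 'core/6/', 'core/6/i386/', 'core/updates/', 'core/updates/6/']
```

In the first call, "core" matches "- *" at the top level, so the walker never gets far enough to test "6" against "+ 6/". In the second call, "+ */" keeps every directory open for descent, and only the files fall through to "- *".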
It would be better to ask questions about how to understand includes on the mailing list.
> It would be better to ask questions about how to
> understand includes on the mailing list.

If the description of how to understand includes is so misleading that I have to ask on the mailing list how to understand them, wouldn't you agree that the description should be overhauled to make it clearer?
OK:

    + /some/path/this-file-will-not-be-found
    + /file-is-included
    - *

This fails because the parent directory "some" is excluded by the '*' rule, so rsync never visits any of the files in the "some" or "some/path" directories.''

If I interpret correctly what this states, it means '/**/6/' is useless combined with '- *', since no directory will ever be matched, since you are not expanding first to find a match for '/**/6/'? It is not clear.

The same goes for

    + 6/
    - *

Are you traversing all directories, or just reading the contents of the one we are in? The manpage states that '+ 6/' finds any directory named '6'. Thus I assume you are traversing all available directories first, looking for one matching '6'? Or, if this is not true, are you only testing the entries of the directory we are in, discarding them if none matches (none is named '6'), and never traversing down the tree?

This is not clearly stated. I am missing a clear description of what is matched by all those rules spanning more than just one directory:

    + **/
    + /**/
    + 6/ (implicitly something like **/6/)

If you are assuming

    /fedora/core/6

you are reading only

    /fedora

comparing against '/**/6/'. Since this does not match, the '- *' is applied. This matches. The directory is removed?!

I'd assume with

    + /**/6/
    - *

you'd try to find a match, expanding until there is none. Meaning:

/fedora has subdirectory /core has subdirectory /6, this matches /**/6, since /**/ matches any path, thus /fedora/core/6 is matched.

If this isn't done, state it clearly. '/**/6' this way will never match anything if followed by a rule like '- *'.

If you are in need of including only certain deeply nested subdirectories, the includes are of no use in most cases, and the only way of transferring deeply nested directories would be to analyze a whole directory traversal by rsync, feeding the output into grep, sed, or any tool capable of handling strings efficiently, and creating an include file yourself.
Please leave this closed.
Also, check out the description of short-circuiting the descent in the man page:

    http://rsync.samba.org/ftp/rsync/rsync.html

This appears near the start of the "INCLUDE/EXCLUDE PATTERN RULES" section (right after the list of pattern types).
(In reply to comment #7)
> OK:
> + /some/path/this-file-will-not-be-found
> + /file-is-included
> - *
>
> This fails because the parent directory "some" is excluded by the '*' rule, so
> rsync never visits any of the files in the "some" or "some/path" directories.''
>
> If I interpret correctly what this states, it means '/**/6/' is useless
> combined with '- *', since no directory will ever be matched, since you are not
> expanding first to find a match for '/**/6/'? It is not clear.

Did you read the previous paragraph? It says, "If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync did not descend through that excluded section of the hierarchy." To me, this makes it clear that an exclude pattern cuts rsync off altogether from scanning a subtree of the source so that files inside that subtree are never even considered for transmission, whether or not they would be included according to your rules. There is nothing in the man page to justify the interpretation that rsync "expand[s] first to find a match for '/**/6/'".

> The same goes for
>
> + 6/
> - *
>
> Are you traversing all directories, or just reading the contents of the one we
> are in? The manpage states that '+ 6/' finds any directory named '6'.

No, nowhere does the manpage state that include patterns "find" anything. '+ 6/' matches and includes any directory named '6' *if and when* such a directory is encountered during the traversal.

> Thus I
> assume you are traversing all available directories first, looking for one
> matching '6'? Or, if this is not true, are you only testing the entries of the
> directory we are in, discarding them if none matches (none is named '6'), and
> never traversing down the tree?
>
> This is not clearly stated.

I'm ignoring this because it is based on a statement that isn't in the manpage.

> I am missing a clear description of what is matched by
> all those rules spanning more than just one directory:
>
> + **/
> + /**/
> + 6/ (implicitly something like **/6/)
>
>
> If you are assuming
>
> /fedora/core/6
>
> you are reading only
>
> /fedora
>
> comparing against '/**/6/'. Since this does not match, the '- *' is applied.
> This matches. The directory is removed?!

Correct.

> I'd assume with
>
> + /**/6/
> - *
>
> you'd try to find a match, expanding until there is none. Meaning:
>
> /fedora has subdirectory /core has subdirectory /6, this matches /**/6, since
> /**/ matches any path, thus /fedora/core/6 is matched.

There is nothing in the manpage to justify this assumption. If you are misled by your own faulty assumption, that's your fault, not rsync's.

> If this isn't done,
> state it clearly. '/**/6' this way will never match anything if followed by a
> rule like '- *'.

The example I quoted from the manpage is intended to explain just that:

``For instance, this won't work:

    + /some/path/this-file-will-not-be-found
    + /file-is-included
    - *

This fails because the parent directory "some" is excluded by the '*' rule, so rsync never visits any of the files in the "some" or "some/path" directories.''

Granted, that example isn't as explicit as it possibly could be that the first rule has no effect, but it ought to be enough to catch the attention of people using a similar filter file (like you) and get them to read the rest of the description so they understand why it won't work.

> If you are in need of including only certain deeply nested subdirectories, the
> includes are of no use in most cases, and the only way of transferring deeply
> nested directories would be to analyze a whole directory traversal by rsync,
> feeding the output into grep, sed, or any tool capable of handling strings
> efficiently, and creating an include file yourself.
No, the man page goes on to explain another alternative: ``One solution is to ask for all directories in the hierarchy to be included by using a single rule: "+ */" (put it somewhere before the "- *" rule), and perhaps use the --prune-empty-dirs option.'' If you wanted rsync to provide an easier way to include only certain subdirectories, that would be a legitimate feature request.
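As a hedged sketch of that alternative, the Python snippet below assembles a filter file with "+ */" placed before "- *" and prints a dry-run command line that adds --prune-empty-dirs. The helper name build_filter is hypothetical, and the pattern "i386/**" is only an assumption about what the reporter wants to keep; by my reading of the pattern rules quoted earlier it should match files below any directory named i386, but that has not been tested here, so check the result with -n before transferring anything. The source URL, destination, and file name exclude.lst are taken from the test command in the original report.

```python
import shlex


def build_filter(include_patterns):
    """Return filter-file text: all directories, the wanted files, then '- *'."""
    rules = ["+ */"]                               # let rsync descend everywhere
    rules += [f"+ {p}" for p in include_patterns]  # what should actually be kept
    rules.append("- *")                            # everything else is excluded
    return "\n".join(rules) + "\n"


# "i386/**" is an illustrative assumption, not a rule taken from the thread.
filter_text = build_filter(["i386/**"])

argv = [
    "rsync", "-av", "-n",        # -n: dry run, nothing is transferred
    "--prune-empty-dirs",        # drop directories that end up with no files
    "--exclude-from=exclude.lst",
    "rsync://mirrors.kernel.org/fedora/", "/tmp/test/",
]

print(filter_text, end="")       # save this text as exclude.lst
print(shlex.join(argv))
```

Because "+ */" alone would create the whole directory skeleton, --prune-empty-dirs is what keeps directories without any included files out of the destination.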