Bug 4899 - Change sender exclusions to allow files/dirs to be excluded before a stat
Summary: Change sender exclusions to allow files/dirs to be excluded before a stat
Status: ASSIGNED
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 2.6.9
Hardware: x86 Linux
: P3 enhancement (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-20 02:31 UTC by David Pei
Modified: 2008-01-12 11:50 UTC (History)
1 user (show)

See Also:


Attachments
Suggested solution of the bug (47.63 KB, text/plain)
2007-08-20 02:51 UTC, David Pei
no flags Details
Alternative solution: check exclusion both ways (2.20 KB, patch)
2007-08-21 06:00 UTC, Matt McCutchen
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Pei 2007-08-20 02:31:28 UTC
Problem: When a mounted dir(for example, "mnt") cannot be visited, rsync will halt there and the shell is halted, even "ctrl -c" can't quit it. If the mountd dir is filtered out, rsync still fail to work.

dirA_________dir1
          |__ mnt
 
dirB_________dir1
$rsync -r -vvvvvvv -L --include-from=filter --exclude="*" dirA/ dirB/

Root cause:
related brief process of rsync:
1 client_run(), called once the connection has been negotiated.
2 read the filter.
3 send_file_list(). 
4 send_files().
 
In step 3, rsync will examine the file one by one, to decide whether it is a symblic, whether it is filtered, and so on. It revokes  do_stat to check the file status.
int do_stat(const char *fname, STRUCT_STAT *st)
{
#if HAVE_OFF64_T
 return stat64(fname, st);
#else
 return stat(fname, st);
#endif
}
 
It revokes system function "stat", which will halt when the mounted dir is crashed. 
The "stat" issue is not due to rsync. But if the mountd dir is filtered out, rsync should work well.

Solution:
Modified "flist.c", 
struct file_struct *make_file(char *fname, struct file_list *flist,
                              STRUCT_STAT *stp, unsigned short flags,
                              int filter_level)
{
...
        memset(sum, 0, SUM_LENGTH);

        /* In case a mounted dir crashed, than stat will hang there even
           the dir has been filtered */
        if (filter_level != NO_FILTERS
            && is_excluded(thisname, 1, filter_level)) {
                rprintf(FINFO, "Excluding %s\n", thisname);
                return NULL;
        } else if (stp && S_ISDIR(stp->st_mode)) {
                st = *stp; /* Needed for "symlink/." with --relative. */
                *linkname = '\0'; /* make IBM code checker happy */
        } else if (readlink_stat(thisname, &st, linkname) != 0) {
...
}
 

}
Comment 1 David Pei 2007-08-20 02:51:03 UTC
Created attachment 2873 [details]
Suggested solution of the bug

In fact, the root cause is "stat", I have reproduced it both on Linux and Solaris. A suggested solution is: when a mounted dir is filtered out, rsync should work well.
Comment 2 Wayne Davison 2007-08-21 00:16:31 UTC
We need to stat() the file to know if it is a directory or not.  Otherwise the directory-specific exclude rules won't work right.

So, the only possible solution that would allow the stat() to be skipped would be to allow the exclude code to deal with an "unknown" file/dir status, and have it stat() a name that matched a dir-only exclusion rule to be sure that it was indeed a dir (at which point the file's dir-ness would be remembered).
Comment 3 David Pei 2007-08-21 01:07:07 UTC
Yes, I agree with you. The core is to skip stat() if the dir is filtered.
In fact in my solution, I assumed the "thisname" is a dir to check whether it is filtered.
1) if "thisname" is a dir, is_excluded(thisname, 1, filter_level) return 1 and no problem occurs.
2) if "thisname" is a file, because it match the filter_list, here is_excluded(thisname, 1, filter_level) also return 1 and no problem occurs.
Thus I prefer change the make_file() a little, rather than change the exclude code.


(In reply to comment #2)
> We need to stat() the file to know if it is a directory or not.  Otherwise the
> directory-specific exclude rules won't work right.
> So, the only possible solution that would allow the stat() to be skipped would
> be to allow the exclude code to deal with an "unknown" file/dir status, and
> have it stat() a name that matched a dir-only exclusion rule to be sure that it
> was indeed a dir (at which point the file's dir-ness would be remembered).

Comment 4 Matt McCutchen 2007-08-21 05:54:26 UTC
(In reply to comment #3)
> In fact in my solution, I assumed the "thisname" is a dir to check whether it
> is filtered.

As Wayne tried to point out, while your solution does fix the hanging on the mounted dir, it is unacceptable because it breaks directory-specific exclude rules.  If the source contains a regular file X and a rule --exclude='X/', rsync will assume X is a directory and exclude it, which is wrong.
Comment 5 Matt McCutchen 2007-08-21 06:00:38 UTC
Created attachment 2876 [details]
Alternative solution: check exclusion both ways

The attached patch gives an alternative way to fix the hanging without breaking directory-specific exclude rules.  make_file calls is_exclude twice to determine whether the file would be excluded if it is a directory and whether it would be excluded if it isn't one.  If the file would be excluded either way, make_file skips calling stat.

This solution has two disadvantages: the filter list is scanned twice (hurting performance), and at -vv level, hypothetical "showing/hiding file/directory X because of pattern Y" messages are printed.