Bug 10233 - rsync is spending a lot of time lstat64()'ing --exclude'd files
Status: ASSIGNED
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: All
OS: All
Importance: P5 enhancement
Target Milestone: ---
Assigned To: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2013-10-28 22:01 UTC by Darxus
Modified: 2013-11-28 17:29 UTC
CC List: 0 users

Description Darxus 2013-10-28 22:01:22 UTC
I ran:  rsync -Pva --exclude '*.gz' / <destination>

For hours, strace has been scrolling:

lstat64("<file>.gz", {st_mode=S_IFREG|0644, st_size=74, ...}) = 0
lstat64("<file>.gz", {st_mode=S_IFREG|0644, st_size=419, ...}) = 0
lstat64("<file>.gz", {st_mode=S_IFREG|0644, st_size=449, ...}) = 0
lstat64("<file>.gz", {st_mode=S_IFREG|0644, st_size=408, ...}) = 0
lstat64("<file>.gz", {st_mode=S_IFREG|0644, st_size=75, ...}) = 0
lstat64("<file>.gz", {st_mode=S_IFREG|0644, st_size=579, ...}) = 0
lstat64("<file>.gz", {st_mode=S_IFREG|0644, st_size=339, ...}) = 0

Couldn't the lstat() of files that match the --exclude pattern be skipped? That would save a lot of time in some cases.

I'm using rsync version 3.0.7. BasketCase on IRC reproduced this with v3.1.0.
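The suggestion above amounts to deciding exclusion from the name alone, before any lstat() call. Below is a minimal Python sketch of that idea. It is not rsync's code (rsync is written in C), and `filter_entries` and its injectable `lstat` parameter are hypothetical names chosen for illustration. It only holds for patterns like `*.gz` whose outcome does not depend on the file's type.

```python
import fnmatch
import os
from typing import Callable, Iterable, List, Tuple

def filter_entries(
    dirpath: str,
    names: Iterable[str],
    patterns: List[str],
    lstat: Callable[[str], os.stat_result] = os.lstat,
) -> List[Tuple[str, os.stat_result]]:
    """Return (name, stat) pairs for the names that no pattern excludes."""
    kept = []
    for name in names:
        # Name-only patterns such as "*.gz" can be decided without knowing
        # the file type, so excluded names never reach lstat().
        if any(fnmatch.fnmatchcase(name, p) for p in patterns):
            continue
        kept.append((name, lstat(os.path.join(dirpath, name))))
    return kept
```

Passing a fake `lstat` that records its arguments is an easy way to confirm that `*.gz` names never reach the stat call.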
Comment 1 Darxus 2013-10-29 15:57:28 UTC
A useful workaround would be something like:

find / | grep -v '\.gz$' > filelist.txt
rsync -Pva --files-from=filelist.txt / <destination>

Also, don't rsync /proc/kcore :/
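The same list can also be built without the find/grep pipeline. This is a minimal Python sketch under stated assumptions: `write_file_list` is a hypothetical helper, pruning a top-level `proc` directory follows the /proc/kcore remark, and directories whose names end in `.gz` are pruned entirely, mirroring how `--exclude '*.gz'` would also skip such a directory and its contents. Paths are written relative to the root, one per line, since `--files-from` reads names relative to the source directory.

```python
import os

def write_file_list(root, out_path, skip_suffix=".gz", prune_top=("proc",)):
    """Write paths under root (relative to root, one per line) to out_path,
    skipping names ending in skip_suffix. Returns the number of lines written."""
    count = 0
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            rel_dir = os.path.relpath(dirpath, root)
            at_top = rel_dir == "."
            # Prune in place so os.walk never descends into these directories.
            dirnames[:] = [
                d for d in dirnames
                if not d.endswith(skip_suffix) and not (at_top and d in prune_top)
            ]
            kept_files = [f for f in filenames if not f.endswith(skip_suffix)]
            for name in dirnames + kept_files:
                out.write((name if at_top else os.path.join(rel_dir, name)) + "\n")
                count += 1
    return count
```

Called as `write_file_list("/", "filelist.txt")`, the output can be fed to the same `rsync -Pva --files-from=filelist.txt / <destination>` command. As with the find version, a file name containing a newline would break the list.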
Comment 2 Wayne Davison 2013-11-28 17:29:51 UTC
Rsync needs to know whether an entry is a file or a directory before applying the exclude rules; otherwise it wouldn't have enough information for dir-only exclude rules (patterns ending in a slash, such as 'foo/'). It might be possible to make the stat call lazy, so that the first dir-only rule to match the name checks whether the file type is known and does a stat only if it is not. However, that might overcomplicate things. I'm marking this as an enhancement request.
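A minimal Python sketch of the lazy stat idea described above, assuming a simplified rule model: only exclude rules, first match wins, and patterns matched against the basename with `fnmatch`. `Rule`, `Entry`, and `is_excluded` are hypothetical names; rsync's real filter code is in C and handles much more (include rules, anchored patterns, per-directory merge files). The entry's type is looked up only once a dir-only rule's pattern already matches the name, and the result is cached so a later stat for the file list could reuse it.

```python
import fnmatch
import os
import stat
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Rule:
    pattern: str    # shell-style pattern matched against the basename
    dir_only: bool  # True for a rule written with a trailing slash, e.g. "cache/"

class Entry:
    """A path whose lstat() result is fetched on first use and then cached."""

    def __init__(self, path: str,
                 lstat: Callable[[str], os.stat_result] = os.lstat) -> None:
        self.path = path
        self._lstat = lstat
        self._st: Optional[os.stat_result] = None

    def st(self) -> os.stat_result:
        if self._st is None:
            self._st = self._lstat(self.path)  # at most one lstat() per entry
        return self._st

    def is_dir(self) -> bool:
        return stat.S_ISDIR(self.st().st_mode)

def is_excluded(entry: Entry, rules: List[Rule]) -> bool:
    """First matching exclude rule wins; the type is checked only when needed."""
    name = os.path.basename(entry.path)
    for rule in rules:
        if not fnmatch.fnmatchcase(name, rule.pattern):
            continue  # no match: the file type is irrelevant to this rule
        if rule.dir_only and not entry.is_dir():
            continue  # dir-only rule matched the name, but this is not a directory
        return True
    return False
```

With the rules `*.gz` and `cache/`, a name like `log.gz` is rejected without any lstat(), and only entries actually named `cache` pay for one.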