Bug 10575 - Long Delay for Large Folders Even with Incremental File-List
Summary: Long Delay for Large Folders Even with Incremental File-List
Status: RESOLVED WONTFIX
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: All
OS: All
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-29 13:12 UTC by Haravikk
Modified: 2014-05-01 19:25 UTC
CC List: 0 users

See Also:


Attachments

Description Haravikk 2014-04-29 13:12:53 UTC
Okay, so I have a folder I need to back up that unfortunately contains a massive 32,000+ files (it's part of a disk image several hundred gigabytes in size).

However, using rsync to synchronise this folder with a copy on another machine on my local network takes an incredibly long time, largely because of a huge delay before any files start being sent. It seems that even with the incremental file-list enabled, rsync still waits to scan the entire folder before any files start being transferred, even though files high up in the list should be ready to send right away.

Perhaps I'm misunderstanding how the incremental file-list generation works, but surely the number of files in a folder shouldn't matter for incremental sending, as it could just send the file paths in several batches so that the transfer can start right away? If this is not how it works then please let me know, and I can file an enhancement request instead.

Currently I'm having to use a timestamp-based find to pipe files into rsync, but that's not really how I'd like to do things, as it won't detect deletions. Fortunately those only happen if I manually request that the disk image be compacted, so I can avoid the issue by simply never compacting the image (I don't need to anyway), but it seems like strange behaviour if incremental sending can't handle folders with tons of files properly.
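
For reference, the workaround currently looks roughly like this (the paths, the marker file and the destination are placeholders rather than my real setup):

  cd /path/to/source && \
    find . -type f -newer /path/to/last-sync-marker -print0 | \
    rsync -a --from0 --files-from=- . user@nas:/path/to/dest/ && \
    touch /path/to/last-sync-marker

find only picks out files modified since the marker was last touched, and rsync only considers the paths fed to it via --files-from, which is why deletions never get propagated.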
Comment 1 Kevin Korb 2014-04-29 13:18:35 UTC
What is your command line?
Comment 2 Wayne Davison 2014-04-29 17:19:54 UTC
That's the way rsync works.  It has to have the full directory listing so that it can sort it and work on it in order (and potentially run any --delete-during processing first), so it has to wait for the full list.  Sadly, this isn't going to change.  Sorry.
Comment 3 Haravikk 2014-05-01 19:25:02 UTC
Actually, it seems the specific issue I was seeing was related to --fuzzy and the fact that the destination machine is a NAS (so it's much slower than my main machine); even though there was no actual need for fuzzy matching to take place, it presumably forces rsync to wait for the full folder listing on the receiving side before it can be sure there are no matches for new files.

With --fuzzy removed there is no noticeable delay before changes start getting sent; it still takes a while, but that's just down to the 32,000+ items and the copying, rather than the additional delay I was seeing.

Anyway, I suppose I'll resolve the issue by running my normal --fuzzy command with an exclude rule for the disk image, and then running a separate sync without --fuzzy for just the disk image.
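
Concretely, that means something along these lines (the paths, hostname and the image's folder name are placeholders, and I've left out my other usual options):

  # everything except the disk image, with fuzzy matching as before
  rsync -a --fuzzy --exclude='disk-image/' /path/to/source/ user@nas:/path/to/dest/

  # the disk image on its own, without --fuzzy
  rsync -a /path/to/source/disk-image/ user@nas:/path/to/dest/disk-image/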

I've posted issue #10581 with some proposals for tuning fuzzy performance to avoid this in a general-purpose command.