The Samba-Bugzilla – Bug 10575
Long Delay for Large Folders Even with Incremental File-List
Last modified: 2014-05-01 19:25:02 UTC
Okay, so I have a folder I need to back up that unfortunately contains a massive number of files, 32,000+ (it's part of a disk image several hundred gigabytes in size).
However, using rsync to synchronise this folder with a copy on another machine on my local network takes an incredibly long time, largely thanks to a huge delay before any files start being sent. It seems that even with the incremental file-list enabled, rsync still waits to scan the entire folder before any files are transferred, even though files high up in the list should be ready to send right away.
Perhaps I'm misunderstanding how the incremental file-list generation works, but surely the number of files in a folder shouldn't matter for incremental sending, as it can just send the file-paths in several batches so that the transfer can start right away? If this is not how it works then please let me know, and I can file an enhancement instead.
Currently I'm having to use a timestamp-based find to pipe files into rsync, but it's not really how I'd like to do things, as it won't detect deletions. Fortunately those only happen if I manually request that the disk image be compacted, so I can avoid the issue by simply never compacting the image (I don't need to anyway). Still, it seems like strange behaviour if incremental sending can't handle folders with tons of files properly.
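For anyone wanting to try the same workaround, here is a rough Python sketch of the timestamp-based approach. It is not the exact command used above: the stamp file, the function names and the `-a` option are assumptions for illustration. It collects files modified after the stamp file and hands them to rsync through `--files-from=-` with `--from0`. As noted, files deleted on the source are never noticed this way.

```python
import os
import subprocess


def changed_since(root, stamp_path):
    """Return paths (relative to root) of regular files modified after the stamp file."""
    cutoff = os.stat(stamp_path).st_mtime
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and os.stat(full).st_mtime > cutoff:
                changed.append(os.path.relpath(full, root))
    return sorted(changed)


def rsync_command(root, dest):
    # --files-from=- reads the file list from stdin; --from0 expects NUL separators.
    # "-a" is an assumed option set, adjust to taste.
    return ["rsync", "-a", "--from0", "--files-from=-", root, dest]


def sync_changed(root, dest, stamp_path):
    """Send only the files changed since the stamp; deletions are NOT propagated."""
    paths = changed_since(root, stamp_path)
    if not paths:
        return 0
    payload = b"\0".join(os.fsencode(p) for p in paths) + b"\0"
    return subprocess.run(rsync_command(root, dest), input=payload, check=True).returncode
```

After a successful run the stamp file would need its modification time updated (for example with `os.utime`) so that the next run starts from the right point.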
What is your command line?
That's the way rsync works. It has to have the full directory listing so that it can sort it and work on it in order (and potentially run any --delete-during processing first), so it has to wait for the full list. Sadly, this isn't going to change. Sorry.
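To illustrate the point, here is a toy Python model of a per-directory incremental listing. This is not rsync's actual code, only a sketch of the behaviour described above, and the function name is made up. Each directory is read in full and sorted before any of its entries are handed on, so a single folder with 32,000+ entries gains nothing from batching; only separate directories arrive as separate batches.

```python
import os


def incremental_file_list(root):
    """Yield (directory, sorted_names) one directory at a time."""
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as it:
            entries = list(it)  # the whole directory must be read before it can be sorted
        entries.sort(key=lambda e: e.name)
        yield directory, [e.name for e in entries]
        # Queue subdirectories so that they are visited in sorted order.
        pending.extend(
            e.path for e in reversed(entries) if e.is_dir(follow_symlinks=False)
        )
```

Calling `next()` on the generator returns only once the first directory has been completely listed, however many entries it holds.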
Actually it seems the specific issue I was seeing was related to --fuzzy and the fact that the destination machine is a NAS (so much slower than my main machine). Even though there was no actual need for fuzzy matching to take place, the option presumably forces rsync to wait for the full folder list on the receiving side before it can be sure there aren't any matches for new files.
With --fuzzy removed there is no noticeable delay before changes start getting sent; it still takes a while, but that's just the 32,000+ items and the copying, rather than the additional delay I was seeing.
Anyway, I suppose I'll have to resolve the issue by running my normal --fuzzy command with an exclude rule for the disk image, and then running a separate sync without --fuzzy for just the disk image.
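A minimal sketch of that two-pass arrangement, in Python so the command lines are easy to inspect before running them. The paths, the `disk.image` folder name and the `-a` base option are placeholders, not my real setup. The first command keeps --fuzzy but excludes the disk image with an anchored, directory-only pattern; the second syncs only the disk image without --fuzzy.

```python
import subprocess


def two_pass_commands(src, dest, image_dir, base_opts=("-a",)):
    """Build the two rsync command lines: general pass with --fuzzy, image pass without."""
    src = src.rstrip("/")
    dest = dest.rstrip("/")
    image = image_dir.strip("/")
    # Leading "/" anchors the pattern at the transfer root; trailing "/" matches directories only.
    general = ["rsync", *base_opts, "--fuzzy", f"--exclude=/{image}/", f"{src}/", f"{dest}/"]
    image_only = ["rsync", *base_opts, f"{src}/{image}/", f"{dest}/{image}/"]
    return general, image_only


def run_two_pass(src, dest, image_dir):
    """Run both passes in order, stopping if either one fails."""
    for command in two_pass_commands(src, dest, image_dir):
        subprocess.run(command, check=True)
```

Printing the result of `two_pass_commands("/data", "nas:/backup", "disk.image")` shows both command lines without touching any files.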
I've posted issue #10581 with some proposals for tuning fuzzy performance to avoid this for a general purpose command.