The Samba-Bugzilla – Bug 3491
throttle disk IO during filelist/directory parsing
Last modified: 2010-08-19 07:26:26 UTC
rsync was bringing our webserver to a crawl during file list generation. Since the job wasn't time critical, I made it less aggressive during this step: a rather trivial change to microsleep between each readdir().
--slow-down=100 will usleep() for 1000usec (microseconds) before each readdir.
If I'm not mistaken, with 10k directories that'd be ~10 seconds of sleep.
I've seen people try to do this using --bwlimit and/or a loop that checks loadavg and sends SIGSTOP/SIGCONT.
Not the best fix for disk cache poisoning, but it still might give the server some time to breathe.
Sorry, the patches are not for the latest version.
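For readers without the patch to hand, the per-readdir() pause described above can be sketched roughly as follows. This is not the attached patch: `scan_dir_throttled`, `sleep_usec` and `delay_usec` are hypothetical names, it only counts entries in a single directory, and nanosleep() stands in for usleep() because usleep() was dropped from POSIX.1-2008.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <time.h>

/* Sleep for usec microseconds; nanosleep() stands in for usleep() here. */
void sleep_usec(long usec)
{
    struct timespec ts;

    ts.tv_sec = usec / 1000000L;
    ts.tv_nsec = (usec % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/*
 * Count the entries in one directory, pausing delay_usec microseconds
 * before each readdir() call (hypothetical stand-in for --slow-down).
 * Returns the number of entries seen, or -1 if the directory cannot
 * be opened.
 */
long scan_dir_throttled(const char *path, long delay_usec)
{
    DIR *dir = opendir(path);
    long entries = 0;

    if (dir == NULL)
        return -1;
    for (;;) {
        if (delay_usec > 0)
            sleep_usec(delay_usec);
        if (readdir(dir) == NULL)
            break;
        entries++;
    }
    closedir(dir);
    return entries;
}
```

The added run time is simply the number of readdir() calls times the delay, so 10,000 calls at 1000 microseconds each come to about 10 seconds of sleep.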
I like this idea. I, too, have seen degradation in response times when rsync starts on a large hierarchy. Using nice didn't help (much).
Wouldn't it be better to count up readdir() calls and then sleep for a longer time when a threshold is reached? Fewer system calls that way. For example, add the slow-down value to a counter and when the counter passes 100,000 then sleep for 100,000 microseconds (0.1 second) and subtract 100,000 from the counter.
BTW, in your example, did you mean --slow-down=1000 (not 100) for 1000 microsecs?
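The counter idea above could look something like this. It is only a sketch under assumptions: `struct throttle`, `throttle_tick`, `throttle_readdir` and `SLEEP_THRESHOLD_USEC` are made-up names, the sleep fires when the counter reaches the threshold (>=) rather than strictly passes it, and nanosleep() is used in place of usleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define SLEEP_THRESHOLD_USEC 100000L /* 0.1 second, as in the example */

struct throttle {
    long slow_down_usec;   /* hypothetical --slow-down value */
    long accumulated_usec; /* sleep owed but not yet taken */
};

/*
 * Account for one readdir() call. Returns the number of microseconds
 * to sleep now, or 0 if the threshold has not been reached yet.
 */
long throttle_tick(struct throttle *t)
{
    t->accumulated_usec += t->slow_down_usec;
    if (t->accumulated_usec < SLEEP_THRESHOLD_USEC)
        return 0;
    t->accumulated_usec -= SLEEP_THRESHOLD_USEC;
    return SLEEP_THRESHOLD_USEC;
}

/* Call once per readdir(); sleeps only when a full threshold is owed. */
void throttle_readdir(struct throttle *t)
{
    long usec = throttle_tick(t);
    struct timespec ts;

    if (usec <= 0)
        return;
    ts.tv_sec = usec / 1000000L;
    ts.tv_nsec = (usec % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}
```

With a slow-down value of 1000, this makes one sleep call of 0.1 second per 100 readdir() calls instead of 100 calls of 1 millisecond each, so the total sleep time is the same but with far fewer system calls.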
*** Bug 4030 has been marked as a duplicate of this bug. ***
I'd like to have this feature for the reasons others already stated, too, and, looking at the patch, it doesn't even seem to be hard to implement. What's holding it up?
Is rsync still being maintained?
I support this feature too, and the ability to request that the sender does not cache files.
My webserver is a social networking site, and has a deep hierarchy of image files, e.g. ~/999/999/999/file.jpg, with a parallel tree for server-generated thumbnails. This is located on a separate device from the main system (reiserfs, noatime of course!).
Rsync is amazing - I use it to back up the server over cable, but the problem is that scanning 50GB of files in 300,000+ directories hits the server hard, takes a long time, and saturates Linux's cache with files that will rarely be used.
The option to sleep between dir reads would be useful, although this would dramatically increase run time!
The most useful thing for me would be an option to tell rsync to tell the kernel to leave the server's present cache untouched. Is this possible?
If not, then I think I'll have to write something to query the database and calculate deltas from the local copy and sync on a per file basis.
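On the cache question, one mechanism that might get part of the way is posix_fadvise() with POSIX_FADV_DONTNEED, called after a file has been read. The sketch below is an assumption about how a sender could use it, not something rsync is known to do here; `read_then_drop_cache` is a hypothetical name. Two caveats: the call is advisory, and it drops the file's cached pages whether or not they were already cached before the read, so it limits new cache pollution rather than truly leaving the existing cache untouched. It also does nothing for the directory and inode metadata pulled in while the file list is built.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Read a whole file, then advise the kernel that its cached pages are
 * no longer needed. Returns the number of bytes read, or -1 if the
 * file cannot be opened or read.
 */
long long read_then_drop_cache(const char *path)
{
    char buf[65536];
    long long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n; /* a real sender would transmit buf here */

    /* offset 0, len 0 means the whole file; advice only, result ignored */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return n < 0 ? -1 : total;
}
```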