Bug 3491 - throttle disk IO during filelist/directory parsing
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.4
Hardware: All Linux
Importance: P3 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL: http://vilius.multiply.com/video/item/10
Keywords:
Duplicates: 4030
Depends on:
Blocks:
 
Reported: 2006-02-07 22:13 UTC by Vilius Puidokas
Modified: 2010-08-19 07:26 UTC

Description Vilius Puidokas 2006-02-07 22:13:32 UTC
rsync was bringing our webserver to a crawl during file list generation. since the job wasn't time critical, i made it less aggressive during this step - a rather trivial change to sleep briefly before each readdir().

--slow-down=100 will usleep() for 1000 usec (microseconds) before each readdir().
if i'm not mistaken, with 10k directories that'd be ~10 seconds of sleep.
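
The idea can be sketched outside rsync as follows. This is a minimal Python illustration, not the attached C patch: `throttled_walk`, `delay_usec` and the injectable `sleep` parameter are hypothetical names. It pauses once per directory read, which is what the ~10 second estimate for 10k directories implies; pausing once per entry would cost proportionally more.

```python
import os
import time


def throttled_walk(root, delay_usec=1000, sleep=time.sleep):
    """Yield file paths under root, pausing before each directory is read.

    delay_usec is the pause in microseconds (hypothetical parameter).
    sleep is injectable so the pause can be replaced in tests.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        # Give other processes a chance at the disk before the next read.
        sleep(delay_usec / 1_000_000)
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                else:
                    yield entry.path
```

With the default of 1000 usec, a tree of 10,000 directories adds about 10 seconds of sleep in total.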


I've seen people try to do this using --bwlimit and/or a loop that checks loadavg and sends SIGSTOP/SIGCONT.
re: http://lists.samba.org/archive/rsync/2004-February/008651.html
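
For comparison, the loadavg workaround mentioned above might look roughly like this. It is a hedged sketch only, not taken from the linked mailing-list post: `regulate_once`, `max_load` and the injectable `getloadavg`/`kill` parameters are hypothetical, and a real script would call it periodically with the PID of the running rsync.

```python
import os
import signal


def regulate_once(pid, max_load, paused, getloadavg=os.getloadavg, kill=os.kill):
    """Pause or resume pid based on the 1-minute load average.

    Returns the new paused state so the caller can pass it back in
    on the next check.
    """
    load_1min = getloadavg()[0]
    if load_1min > max_load and not paused:
        kill(pid, signal.SIGSTOP)   # suspend the process while load is high
        return True
    if load_1min <= max_load and paused:
        kill(pid, signal.SIGCONT)   # let it continue once load has dropped
        return False
    return paused
```

This reacts to overall system load rather than to rsync's own disk activity, which is one reason a built-in throttle is more predictable.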

not the best fix for disk cache poisoning, but it still might give the server some time to breathe.

sorry, the patches are not for the latest version.
Comment 1 John Van Essen 2006-02-08 02:22:38 UTC
I like this idea.  I, too, have seen degradation in response times when rsync starts on a large hierarchy.  Using nice didn't help (much).

Wouldn't it be better to count up readdir() calls and then sleep for a longer time when a threshold is reached?  Fewer system calls that way.  For example, add the slow-down value to a counter and when the counter passes 100,000 then sleep for 100,000 microseconds (0.1 second) and subtract 100,000 from the counter.
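
The counter scheme described above could be sketched like this. It is an illustration in Python rather than rsync's C; `BatchedThrottle` and its parameter names are hypothetical, and "passes" is read here as "reaches or exceeds". A `while` loop is used instead of a single `if` so the counter cannot grow without bound when the slow-down value is larger than the threshold.

```python
import time


class BatchedThrottle:
    """Accumulate small delays and sleep only when a threshold is reached."""

    def __init__(self, slow_down_usec, threshold_usec=100_000, sleep=time.sleep):
        self.slow_down_usec = slow_down_usec
        self.threshold_usec = threshold_usec
        self.sleep = sleep          # injectable so tests need not really sleep
        self.counter = 0

    def tick(self):
        """Call once per readdir(); returns the microseconds slept this call."""
        self.counter += self.slow_down_usec
        slept = 0
        while self.counter >= self.threshold_usec:
            self.sleep(self.threshold_usec / 1_000_000)
            self.counter -= self.threshold_usec
            slept += self.threshold_usec
        return slept
```

With a slow-down value of 1000, this sleeps for 0.1 second once every 100 calls, giving the same average delay as a 1000 usec sleep per call with 1% of the sleep system calls.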

BTW, in your example, did you mean --slow-down=1000 (not 100) for 1000 microsecs?
Comment 2 Matt McCutchen 2008-06-23 19:14:09 UTC
*** Bug 4030 has been marked as a duplicate of this bug. ***
Comment 3 Toni Müller 2008-10-16 14:09:48 UTC
I'd like to have this feature too, for the reasons others have already stated, and, looking at the patch, it doesn't even seem hard to implement. What's holding it up?
Comment 4 Andy Haveland-Robinson 2010-08-19 07:26:26 UTC
Is rsync still being maintained?

I support this feature too, and the ability to request that the sender does not cache files.

My webserver is a social networking site, and has a deep hierarchy of image files, e.g. ~/999/999/999/file.jpg, with a parallel tree for server-generated thumbnails. This is located on a separate device from the main system (reiserfs,noatime of course!).

Rsync is amazing - I use it to back up the server over cable, but the problem is that scanning 50 GB of files in 300,000+ directories hits the server hard, takes a long time, and saturates Linux's cache with files that will rarely be used.

The option to sleep between dir reads would be useful, although this would dramatically increase run time!

The most useful thing for me would be an option to tell rsync to tell the kernel to leave the server's present cache untouched. Is this possible?

If not, then I think I'll have to write something to query the database and calculate deltas from the local copy and sync on a per file basis.
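
On the cache question in Comment 4: one mechanism that exists on Linux is posix_fadvise() with POSIX_FADV_DONTNEED, which advises the kernel that a file's cached pages are no longer needed. The sketch below is a Python illustration under stated assumptions, not something rsync 2.6.4 is known to do; `read_without_caching` is a hypothetical helper. The call is only advice, it cannot guarantee that previously cached data stays resident, and it also drops pages of that file that were cached before the read.

```python
import os


def read_without_caching(path, chunk_size=1 << 20):
    """Read a whole file, then advise the kernel to drop its cached pages.

    Returns the number of bytes read. The advice is skipped on platforms
    where os.posix_fadvise is not available.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        if hasattr(os, "posix_fadvise"):
            # offset 0 with length 0 means "from the start to the end of the file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

This addresses file data only; the directory scan itself still pulls inode and directory entries into kernel caches that this call does not control.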