Bug 7876 - please implement o_direct
Status: ASSIGNED
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: Other Linux
Importance: P3 enhancement
Assigned To: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2010-12-20 10:03 UTC by costin gusa
Modified: 2011-06-15 14:48 UTC (History)

Attachments
Proposed patch (206 bytes, patch)
2011-01-25 13:35 UTC, Daniel Hahler

Description costin gusa 2010-12-20 10:03:25 UTC
Running rsync nightly leaves a lot of unneeded data in the filesystem cache.
An "--o_direct" argument would solve this.
Thank you.
Comment 1 Daniel Hahler 2011-01-25 05:33:10 UTC
From a quick look at the source, it might be as easy as adding the O_DIRECT flag in do_open in syscall.c:

int do_open(const char *pathname, int flags, mode_t mode)
{
        [...]
        return open(pathname, flags | O_BINARY, mode);
}

Looking at http://kerneltrap.org/node/7563 it appears that interfaces like posix_fadvise are preferred by Linus though.
This would use the POSIX_FADV_NOREUSE or POSIX_FADV_DONTNEED flag (man page: http://linux.die.net/man/2/posix_fadvise).
Comment 2 Daniel Hahler 2011-01-25 13:35:48 UTC
Created attachment 6227 [details]
Proposed patch

This is a noobish attempt at getting this fixed.

I forgot to diff it using "-p", but the context is "struct map_struct *map_file", through which every file appears to get mapped.

Please note that this is a very amateurish attempt, but I would really like to get feedback to improve on this. Of course the best would be to have an official patch for this.. :)
Comment 3 Wayne Davison 2011-01-25 14:30:19 UTC
(In reply to comment #2)
> Please note that this is a very amateurish attempt, but I would really like to
> get feedback to improve on this.

See the patch drop-cache.diff in older releases.  There is also a newer version of the patch that someone else wrote.

This won't be in the official release until OS support is better.  At present the POSIX_FADV_DONTNEED option in Linux does all kinds of weird things (or did at the time the patch was written by Tobi Oetiker), such as uncaching files that someone else put into the cache.  So, if you copy a file that someone else is using heavily, it totally vanishes from the disk cache.

I still say that a better solution than modifying various copy programs would be to have a helper app (à la nice, etc.) that would use a pre-loaded library to do whatever the current OS needs to do to keep the file reading from flooding the cache.  Then it could be used with cp, tar, etc.
Comment 4 Daniel Hahler 2011-01-25 15:10:21 UTC
Thanks for your feedback!

I had just found the patch by Tobi myself, and it is using mincore (now) to check if a file has been cached before rsync touched it (and then skip removing it from the cache).

Post: http://insights.oetiker.ch/linux/fadvise.html
Patch for 3.0.7: http://tobi.oetiker.ch/patches/rsync-3.0.7-fadvise.patch
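
As a rough illustration of the mincore check described above, the following sketch counts how many pages of a file are currently resident in the page cache. It is not the code from Tobi's patch: count_resident_pages() is a hypothetical helper, and the sketch assumes Linux with glibc and a regular file. A tool could take such a snapshot before reading a file and then skip dropping pages that were already cached.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes mincore() in glibc headers */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of a file are resident in
 * the page cache. Returns the resident page count, or -1 on error.
 * An empty file yields 0. If total_pages is non-NULL, it receives the
 * file size in pages.
 */
long count_resident_pages(const char *path, long *total_pages)
{
    struct stat st;
    long resident = -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    long page = sysconf(_SC_PAGESIZE);
    size_t npages = ((size_t)st.st_size + page - 1) / page;
    if (total_pages)
        *total_pages = (long)npages;
    if (npages == 0) {
        close(fd);
        return 0;
    }

    /* Mapping the file does not read it in; pages are only faulted in
     * when touched, so the residency snapshot is not disturbed. */
    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    unsigned char *vec = malloc(npages);
    if (vec && mincore(map, st.st_size, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set = page is resident */
    }

    free(vec);
    munmap(map, st.st_size);
    close(fd);
    return resident;
}
```

The per-page vector filled in by mincore() is what would let a tool restrict POSIX_FADV_DONTNEED to the ranges that were not cached beforehand. This sketch only returns the total for brevity.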

I agree that something like "nice" and "ionice" would be the best approach, and I find it rather unusual to not have something like that available already.
I have found the following wrapper, which is using O_DIRECT however: http://arighi.blogspot.com/2007/04/how-to-bypass-buffer-cache-in-linux.html

For what it's worth, here's a link to the POSIX_FADV_DONTNEED implementation in the current kernel: http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=blob;f=mm/fadvise.c;hb=refs/heads/master#l118
Comment 5 Daniel Hahler 2011-01-25 17:15:30 UTC
Just for information: rsync builds using the patch are available from my PPA ("Lucid", works on Squeeze, too):
https://launchpad.net/~blueyed/+archive/ppa
Comment 6 costin gusa 2011-04-30 12:55:16 UTC
(In reply to comment #4)

To me as an end user, it is irrelevant which method is chosen to avoid filling up the cache.

The LD_PRELOAD library wrapper solution looks like the more logical approach; however, besides the blogspot article mentioned above, are there any reports of it being used successfully?

Second, if the answer to the previous question is "yes", how can I enable it remotely? My backup solution is a "pull" type: to prevent a bunch of servers from potentially abusing the backup machine, only the backup machine connects to the servers. So even if I were able to run rsync with the LD_PRELOAD wrapper on the backup machine, how would I run the remote '-e ssh' rsync with the LD_PRELOAD wrapper?

Thank you.
Comment 7 Tomasz Chmielewski 2011-06-15 14:47:25 UTC
There is also this tool by Andrew Morton, which can be used with any arbitrary application:

http://lwn.net/Articles/224653/

http://code.google.com/p/pagecache-mangagement/



Other than that, it would of course be useful to have such a "--direct" switch in rsync, as using LD_PRELOAD with remote servers may not be feasible.