Bug 8918 - Use OS support to detect zero ranges of source file
Status: ASSIGNED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: All All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2012-05-08 04:17 UTC by Matt McCutchen
Modified: 2012-06-17 00:51 UTC
Description Matt McCutchen 2012-05-08 04:17:02 UTC
Rsync could use the fiemap ioctl (https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=Documentation/filesystems/fiemap.txt;hb=HEAD) on the source file to detect any ranges that do not have data on the filesystem and thus are guaranteed to read as zero, without actually reading them.  This would be most useful in combination with writing the destination file sparsely (--sparse), but it would be safe to use in any case as a sender-side optimization.

Originally proposed at https://bugzilla.redhat.com/show_bug.cgi?id=525545.
Comment 1 Arne Jansen 2012-05-08 05:59:34 UTC
You could also use the SEEK_HOLE/SEEK_DATA interface, which Linux added recently.
This would have the advantage of being Solaris-compatible.
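
For illustration only, here is a minimal Python sketch of how a sender might walk a file with SEEK_DATA and SEEK_HOLE. The helper name `data_ranges` is hypothetical and this is not rsync code; it assumes a platform where `os.SEEK_DATA` and `os.SEEK_HOLE` exist and a regular, seekable file. Note that it moves the file offset of `fd`.

```python
import errno
import os


def data_ranges(fd, size):
    """Yield (start, end) byte ranges of fd that may hold non-zero data.

    Bytes below `size` that fall outside the yielded ranges are holes and
    read as zeros. Hypothetical helper for illustration only.
    """
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                # No data at or after offset: the rest of the file is a hole.
                return
            raise
        if start >= size:
            return
        # The end of the file counts as a hole, so SEEK_HOLE finds an offset.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:
            # Defensive: the file changed underneath us; stop rather than loop.
            return
        yield start, min(end, size)
        offset = end
```

On a filesystem that does not track holes, the whole file is typically reported as one data range, so the result is still safe to use, just not helpful.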
Comment 2 Theodore Ts'o 2012-05-08 10:01:13 UTC
Using FIEMAP has some real potential problems if the file was just recently written and has blocks whose final location on disk has not been determined yet. You can work around this by explicitly or implicitly forcing an fsync when this case is found, but supporting SEEK_HOLE/SEEK_DATA avoids the problem. The tradeoff is that only the very latest kernels support SEEK_HOLE/SEEK_DATA.
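
As a rough sketch of the FIEMAP route with the sync workaround, the Python below asks the kernel to flush the file before mapping by setting FIEMAP_FLAG_SYNC. The function names (`parse_extents`, `fiemap_extents`) are hypothetical, and the constants are copied by hand from `<linux/fiemap.h>` and `<linux/fs.h>`. The ioctl number shown is the usual encoding on x86 and ARM, and other architectures may encode it differently, so treat it as an assumption.

```python
import fcntl
import struct

# Constants copied from <linux/fs.h> and <linux/fiemap.h> (assumed values).
FS_IOC_FIEMAP = 0xC020660B       # _IOWR('f', 11, struct fiemap) on x86/ARM
FIEMAP_FLAG_SYNC = 0x0001        # sync the file before mapping extents
FIEMAP_EXTENT_LAST = 0x0001      # this is the last extent in the file
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

HEADER = struct.Struct("=QQIIII")      # struct fiemap minus the extent array
EXTENT = struct.Struct("=QQQQQIIII")   # struct fiemap_extent


def parse_extents(buf):
    """Decode (logical_offset, length, flags) tuples from a filled buffer."""
    mapped = HEADER.unpack_from(buf, 0)[3]   # fm_mapped_extents
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((fields[0], fields[2], fields[5]))
    return extents


def fiemap_extents(fd, max_extents=128):
    """Return every extent the filesystem reports for fd.

    Raises OSError if the filesystem or kernel does not support FIEMAP.
    """
    start = 0
    result = []
    while True:
        buf = bytearray(HEADER.size + max_extents * EXTENT.size)
        HEADER.pack_into(buf, 0, start, FIEMAP_MAX_OFFSET - start,
                         FIEMAP_FLAG_SYNC, 0, max_extents, 0)
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)   # the kernel fills buf in place
        batch = parse_extents(buf)
        if not batch:
            break
        result.extend(batch)
        logical, length, flags = batch[-1]
        if flags & FIEMAP_EXTENT_LAST:
            break
        start = logical + length   # continue after the last extent seen
    return result
```

Gaps between the returned extents are the ranges with no data on the filesystem. A cautious caller would still inspect each extent's flags before trusting the map.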
Comment 3 Wayne Davison 2012-06-16 18:01:35 UTC
I would like to include support for SEEK_HOLE and SEEK_DATA in rsync's --sparse code, but the rsync protocol doesn't yet support indicating holes in the data between the sender and the receiver (the receiver just scans for zeros and leaves holes when they are found).
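
The receiver-side behavior described above (scan for zeros, leave holes) can be sketched roughly as follows. This is not rsync's actual code: `write_sparse` here is a hypothetical helper, and the 4096-byte block size is an arbitrary assumption.

```python
import os


def write_sparse(fd, data, block=4096):
    """Write data at the current offset of fd, seeking over all-zero blocks.

    Hypothetical helper for illustration only.
    """
    zero = bytes(block)
    for i in range(0, len(data), block):
        chunk = data[i:i + block]
        if chunk == zero[:len(chunk)]:
            # Skip instead of writing: the filesystem may leave a hole here.
            os.lseek(fd, len(chunk), os.SEEK_CUR)
        else:
            view = memoryview(chunk)
            while view:                      # handle short writes
                written = os.write(fd, view)
                view = view[written:]
    # A trailing hole does not extend the file by itself, so grow it if needed.
    end = os.lseek(fd, 0, os.SEEK_CUR)
    if end > os.fstat(fd).st_size:
        os.ftruncate(fd, end)
```

Because the receiver only sees the byte stream, the sender still has to read and transmit every zero. Detecting holes on the source side removes the reading cost, but not the transfer cost.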

We'd also need configure support (to check whether it can be compiled) and run-time support (to check whether the current kernel supports the feature, and to detect whether the source filesystem supports it too).
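
A loose Python analogue of those two checks might look like the sketch below. The `hasattr` test stands in for the configure-time check, and the probe `lseek` stands in for the run-time check. The function name `seek_data_supported` is hypothetical. A successful probe only shows that the call is accepted, because a filesystem that does not track holes may simply report the whole file as data.

```python
import errno
import os


def seek_data_supported(fd):
    """Best-effort probe for SEEK_DATA/SEEK_HOLE support on fd.

    Hypothetical helper for illustration only.
    """
    # Build-time analogue: are the constants available on this platform?
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return False
    try:
        saved = os.lseek(fd, 0, os.SEEK_CUR)
    except OSError:
        return False            # not seekable (pipe, socket, ...)
    try:
        os.lseek(fd, 0, os.SEEK_DATA)
    except OSError as exc:
        # ENXIO means the call is understood but the file holds no data;
        # anything else (such as EINVAL) is treated as "not supported".
        supported = exc.errno == errno.ENXIO
    else:
        supported = True
    os.lseek(fd, saved, os.SEEK_SET)   # restore the caller's file offset
    return supported
```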

I'll look into this at some point.  Patches welcomed.
Comment 4 Matt McCutchen 2012-06-17 00:51:29 UTC
(In reply to comment #3)
> the rsync protocol doesn't yet support indicating holes in the data
> between the sender and the receiver (the receiver just scans for zeros and
> leaves holes when they are found).

Strictly speaking, the protocol issue is orthogonal and is covered by bug 5801.