Rsync could use the fiemap ioctl (https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=Documentation/filesystems/fiemap.txt;hb=HEAD) on the source file to detect any ranges that do not have data on the filesystem and thus are guaranteed to read as zero, without actually reading them. This would be most useful in combination with writing the destination file sparsely (--sparse), but it would be safe to use in any case as a sender-side optimization.
Originally proposed at https://bugzilla.redhat.com/show_bug.cgi?id=525545.
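For illustration, the FIEMAP query proposed above might look roughly like the sketch below. It is written in Python for brevity and is not rsync code; the same ioctl would be issued from C. The helper name `fiemap_extents` and the `batch` parameter are made up for this example, and the hard-coded ioctl number is an assumption that holds on x86 and ARM Linux but should be taken from `<linux/fs.h>` elsewhere.

```python
import fcntl
import struct

# Constants from <linux/fs.h> and <linux/fiemap.h>.
# ASSUMPTION: this is _IOWR('f', 11, struct fiemap) as encoded on x86/ARM.
FS_IOC_FIEMAP = 0xC020660B
FIEMAP_FLAG_SYNC = 0x1      # ask the kernel to sync the file before mapping
FIEMAP_EXTENT_LAST = 0x1    # set on the final extent of the file

HEADER = struct.Struct("=QQIIII")    # struct fiemap without fm_extents[]
EXTENT = struct.Struct("=QQQ2QI3I")  # struct fiemap_extent (56 bytes)


def fiemap_extents(fd, batch=128):
    """Return [(logical_offset, length), ...] for the mapped extents of fd,
    or None if the ioctl fails (for example, FIEMAP is not supported)."""
    extents = []
    start = 0
    while True:
        buf = bytearray(HEADER.size + batch * EXTENT.size)
        # fm_start, fm_length (whole file), fm_flags, fm_mapped_extents,
        # fm_extent_count, fm_reserved
        HEADER.pack_into(buf, 0, start, 0xFFFFFFFFFFFFFFFF,
                         FIEMAP_FLAG_SYNC, 0, batch, 0)
        try:
            fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)
        except OSError:
            return None
        mapped = HEADER.unpack_from(buf, 0)[3]
        if mapped == 0:
            return extents
        for i in range(mapped):
            fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
            logical, length, flags = fields[0], fields[2], fields[5]
            extents.append((logical, length))
            if flags & FIEMAP_EXTENT_LAST:
                return extents
        # More extents than fit in one batch: continue after the last one.
        start = logical + length
```

Any byte range not covered by a returned extent has no backing storage and reads as zero. The sketch is conservative in that it reports every mapped extent as data, including preallocated (unwritten) ones that would also read as zero.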
You could also use the SEEK_HOLE/SEEK_DATA interface, which Linux added recently.
This would have the advantage of being Solaris-compatible.
Using FIEMAP has some real potential problems if the file was written very recently and still has blocks whose final location on disk has not been determined yet. You can work around this by explicitly or implicitly forcing an fsync if this case is found, but supporting SEEK_HOLE/SEEK_DATA avoids this problem. The tradeoff is that only the very latest kernels support SEEK_HOLE/SEEK_DATA.
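A minimal sketch of how a sender could walk a file with SEEK_DATA/SEEK_HOLE is shown here, in Python for brevity (the calls map directly onto lseek(2) in C). The function name `data_segments` is hypothetical and this is not how rsync is implemented; it only illustrates the interface.

```python
import errno
import os


def data_segments(fd):
    """Yield (start, end) byte ranges of fd that may contain data.
    Everything outside these ranges is a hole and reads as zero."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                return  # no data at or after offset: the rest is a hole
            raise
        # Every file has an implicit hole at EOF, so this finds an end.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:
            return  # file changed underneath us; stop rather than loop
        yield start, end
        offset = end
```

Note that lseek moves the file offset, so a caller that also reads through the same descriptor would need to seek back or use pread. On a filesystem that does not track holes, the whole file is simply reported as one data segment, which is still correct.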
I would like to include support for SEEK_HOLE and SEEK_DATA in rsync's --sparse code, but the rsync protocol doesn't yet support indicating holes in the data between the sender and the receiver (the receiver just scans for zeros and leaves holes when they are found).
We'd also need configure support (to check whether it can be compiled) and run-time support (to detect whether the running kernel supports the feature and whether the source filesystem supports it too).
I'll look into this at some point. Patches welcomed.
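The two checks described above could be sketched as follows. This is Python rather than rsync's C, and `seek_hole_supported` is a hypothetical helper name: the `hasattr` test stands in for the configure-time check, and the trial lseek stands in for the run-time check.

```python
import errno
import os


def seek_hole_supported(fd):
    """Best-effort probe: True if SEEK_HOLE can be used on fd."""
    # Build-time analogue: are the constants available at all?
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return False
    saved = os.lseek(fd, 0, os.SEEK_CUR)  # remember the current offset
    try:
        os.lseek(fd, 0, os.SEEK_HOLE)
        return True
    except OSError as e:
        if e.errno == errno.ENXIO:
            return True   # empty file: the call is understood
        if e.errno == errno.EINVAL:
            return False  # kernel does not know this whence value
        raise
    finally:
        os.lseek(fd, saved, os.SEEK_SET)  # restore the offset
```

A filesystem is allowed to implement the interface trivially by treating the whole file as data, so a True result only means the calls are safe to use, not that holes will actually be reported.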
(In reply to comment #3)
> the rsync protocol doesn't yet support indicating holes in the data
> between the sender and the receiver (the receiver just scans for zeros and
> leaves holes when they are found).
Strictly speaking, the protocol issue is orthogonal and is covered by bug 5801.
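For context, the receiver-side behavior quoted above (scan for zeros and leave holes) amounts to something like the following simplified sketch. It is in Python, the name `write_sparse` is made up, and it only skips chunks that are entirely zero, which is an assumption made for brevity and not a description of rsync's actual code.

```python
import os


def write_sparse(fd, chunks):
    """Write an iterable of bytes chunks to fd, leaving holes in place of
    chunks that are entirely zero. Returns the final file size."""
    pos = 0
    for chunk in chunks:
        if chunk.count(0) != len(chunk):
            # Chunk has real data: write all of it at its position.
            view = memoryview(chunk)
            done = 0
            while done < len(view):
                done += os.pwrite(fd, view[done:], pos + done)
        # All-zero chunks are skipped, which leaves a hole.
        pos += len(chunk)
    # Extend the file so that a trailing hole is represented too.
    os.ftruncate(fd, pos)
    return pos
```

This works without any protocol change because the zeros are still transferred; a sender-side hole detector only saves the cost of reading them from disk.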