Bug 11588 - better handling for --preallocate with --sparse
Status: ASSIGNED
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.2
Hardware: x64 Linux
Importance: P5 enhancement
Target Milestone: ---
Assigned To: Wayne Davison
QA Contact: Rsync QA Contact
Duplicates: 12305
Depends on:
Blocks:

Reported: 2015-11-03 18:47 UTC by Andrey Gursky
Modified: 2016-10-19 01:12 UTC
CC List: 2 users

See Also:


Attachments
Preliminary patch to support punching holes (5.81 KB, patch)
2016-10-08 18:18 UTC, Wayne Davison
Test program to show that fallocate followed by punch hole works just fine.... (999 bytes, text/x-csrc)
2016-10-08 23:31 UTC, Theodore Ts'o

Description Andrey Gursky 2015-11-03 18:47:47 UTC
It's good practice to transfer big files with --preallocate, but doing so is fatal for any sparse files among them: their sparseness is lost. Could this be improved?

Thanks,
Andrey
Comment 1 Wayne Davison 2016-10-01 22:07:29 UTC
*** Bug 12305 has been marked as a duplicate of this bug. ***
Comment 2 Коренберг Марк 2016-10-02 07:33:16 UTC
Bug 12305 has much more detailed description
Comment 3 Andrey Gursky 2016-10-02 09:45:58 UTC
(In reply to Коренберг Марк from comment #2)
Марк, nevertheless it is the same issue. You're welcome to add the details here. Please be sure to fix the typo: it should be --preallocate, not --fallocate.

Thanks,
Andrey
Comment 4 Carson Gaspar 2016-10-08 16:44:44 UTC
rsync currently just has the receiver turn "long" sequences of zeroes into sparse regions when --sparse is specified. If --preallocate is also specified, what would you like rsync to do?

No wire protocol change required (please pick the behaviour you prefer even if you'd rather have a protocol bump, as we may negotiate an older wire protocol):
1) Emit an error (as with --inplace and --sparse)
2) Disable one of the options (which one?)
3) Pre-allocate the file, but when zero regions are detected then ftruncate() it and create the sparse region. Reallocate the rest of the file space after creating the sparse region or not? (IFF receiver is on Linux and on a supported filesystem, fallocate(,FALLOC_FL_PUNCH_HOLE,...) could be used to create sparse regions without truncating the file)
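A minimal sketch of the Linux variant of option 3, assuming a receiver that has already preallocated the file and then stores it chunk by chunk. write_or_punch() and all_zero() are hypothetical helper names, not rsync functions. All-zero chunks are deallocated with fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE) (the kernel requires PUNCH_HOLE to be combined with KEEP_SIZE), and the code falls back to writing the zeroes when the filesystem reports that hole punching is unsupported. A punched range never extends the file, so the caller still has to set the final size itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return nonzero if buf[0..len) contains only zero bytes. */
static int all_zero(const char *buf, size_t len)
{
    /* If the first byte is 0 and every byte equals its successor, all are 0. */
    return len == 0 || (buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0);
}

/*
 * Hypothetical receiver helper (option 3 on Linux): store one chunk at
 * offset `off` of an already-preallocated file. All-zero chunks are
 * hole-punched (partial blocks at the edges are only zeroed by the kernel);
 * everything else is written normally.
 * Returns 0 on success, -1 on error with errno set.
 */
static int write_or_punch(int fd, off_t off, const char *buf, size_t len)
{
    assert(len > 0);
    if (all_zero(buf, len)) {
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      off, (off_t)len) == 0)
            return 0;
        if (errno != EOPNOTSUPP && errno != ENOSYS)
            return -1;
        /* No hole-punch support here: fall back to writing the zeroes. */
    }
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The memcmp() against the buffer shifted by one byte is just a compact all-zero test; a real receiver would more likely reuse the zero-run detection rsync already does for --sparse.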

Wire protocol change required: have the sender determine the sparse regions, using SEEK_HOLE if available and otherwise scanning for all-zero regions:
A) Preallocate 1st data region, create 1st sparse region, preallocate 2nd data region, ...
Comment 5 Carson Gaspar 2016-10-08 16:53:06 UTC
(In reply to Carson Gaspar from comment #4)
Actually, you never want the sender to scan for zero regions if SEEK_HOLE isn't supported, as performance would then be terrible. And a given filesystem may not support SEEK_HOLE, even if lseek() does. So we're really back to picking (1), (2), or (3), as I don't see unreliable sender sparse mapping as sensible (although using SEEK_HOLE to save source file read time and provide a hint to the receiver may be nice).
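Where SEEK_HOLE is available, the sender-side mapping could look roughly like this minimal sketch; map_data_regions() and region_cb are hypothetical names. Per the Linux lseek(2) man page, a filesystem without real support may simply report the whole file as data (SEEK_DATA returns the given offset and SEEK_HOLE returns the end of file), so the result is only a hint, which is the concern raised above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type: called once per data region [start, end). */
typedef void (*region_cb)(off_t start, off_t end, void *ctx);

/*
 * Walk the data regions of an open file with SEEK_DATA/SEEK_HOLE.
 * cb may be NULL if only the count is wanted.
 * Returns the number of data regions found, or -1 on error.
 */
static long map_data_regions(int fd, region_cb cb, void *ctx)
{
    struct stat st;
    long count = 0;
    off_t pos = 0;

    if (fstat(fd, &st) < 0)
        return -1;
    while (pos < st.st_size) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO) /* no data at or after pos: trailing hole */
                break;
            return -1;          /* e.g. EINVAL if SEEK_DATA is unsupported */
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        assert(hole > data);
        if (cb)
            cb(data, hole, ctx);
        count++;
        pos = hole;
    }
    return count;
}
```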
Comment 6 Carson Gaspar 2016-10-08 17:09:20 UTC
For punching holes, Solaris and UnixWare support F_FREESP(64) in fcntl().

Windows supports both reporting and punching holes, but I don't know if cygwin (or any other rsync on windows platform) implements it.
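A hedged sketch of a portable hole-punch wrapper covering the two primitives named in this thread; punch_hole() is a hypothetical name. Only the Linux branch is exercised here. The Solaris branch assumes F_FREESP frees the byte range described by a struct flock; the Solaris documentation warns that many filesystems only allow space to be freed at the end of a file, so a mid-file range may not become a hole. Windows and Cygwin are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: deallocate [off, off + len) without changing the
 * file size. Returns 0 on success, -1 with errno set (ENOTSUP when no
 * hole-punching primitive was available at build time).
 */
static int punch_hole(int fd, off_t off, off_t len)
{
    /* len must be positive; as with record locks, l_len == 0 would mean
     * "through end of file" for F_FREESP. */
    assert(len > 0);
#if defined(FALLOC_FL_PUNCH_HOLE)
    /* Linux: PUNCH_HOLE must be ORed with KEEP_SIZE. */
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len);
#elif defined(F_FREESP)
    /* Solaris/UnixWare: free the section described by a struct flock. */
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = len;
    return fcntl(fd, F_FREESP, &fl);
#else
    (void)fd;
    (void)off;
    errno = ENOTSUP;
    return -1;
#endif
}
```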
Comment 7 Wayne Davison 2016-10-08 18:18:13 UTC
Created attachment 12556 [details]
Preliminary patch to support punching holes

In my testing, making a preallocate call on a file followed by a hole-punch call has no effect on the block allocation (though it does zero the blocks).

I tested --sparse and --inplace with this, and it worked fine on one system (with a new enough Linux kernel). There are cases where the sparseness will be lost, though, depending on OS & filesystem. I'm thinking we just update the docs to mention that if you combine --sparse with --inplace (and/or --preallocate), the sparseness might not be preserved.
Comment 8 Andrey Gursky 2016-10-08 20:10:52 UTC
(In reply to Wayne Davison from comment #7)
Wayne,

since this bug made rsync unusable for me, I fixed it and implemented the additional checks needed for ext4 a month or two after reporting it, having seen no reaction at all. Now that a couple of people, and you too, are interested, I can share my work. It's not tiny.

> In my testing, using both a pre-allocate call on a file followed by a
> hole-punch call has no effect on the allocation of the blocks (though it does
> zero them).
Yes, this is tricky. Hole punching works only on whole filesystem blocks (e.g., 4K by default). Issuing a few partial hole-punch requests won't work, even if together they cover the whole block.
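The Linux fallocate(2) man page states that within a punched range, partial filesystem blocks are zeroed and only whole blocks are removed. A minimal sketch of shrinking a zero run inward to whole blocks before punching; align_to_blocks() is a hypothetical helper, offsets are assumed non-negative, and taking blk from st_blksize is an assumption (st_blksize is the preferred I/O size, which often but not always equals the filesystem block size).

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Shrink [*off, *off + *len) inward to whole blocks of size blk.
 * Returns 1 and updates *off/*len if at least one whole block remains,
 * 0 (leaving *off/*len untouched) if nothing is worth punching.
 */
static int align_to_blocks(off_t *off, off_t *len, off_t blk)
{
    assert(blk > 0 && *off >= 0 && *len >= 0);
    off_t start = (*off + blk - 1) / blk * blk; /* round start up   */
    off_t end = (*off + *len) / blk * blk;      /* round end down   */
    if (end <= start)
        return 0;
    *off = start;
    *len = end - start;
    return 1;
}
```

For example, with blk = 4096 a zero run at offset 1 of length 8190 touches two blocks but contains no whole block, so nothing gets punched; merging adjacent zero runs first and aligning once avoids the partial-request problem.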
Comment 9 Wayne Davison 2016-10-08 21:03:22 UTC
> Hole-punch works only for full filesystem blocks

That has nothing to do with it. If you fallocate() the full file length and then (on the same file handle) try to punch out parts of the allocated file, no blocks stop being allocated. Looks like a bug in Linux to me.
Comment 10 Wayne Davison 2016-10-08 21:05:32 UTC
> ... I can share my work.

Sounds interesting! Looking forward to seeing what you've come up with.
Comment 11 Theodore Ts'o 2016-10-08 21:28:49 UTC
Re: #9.  I'm not able to reproduce the described behavior.   If you want to follow up on what you think is a kernel bug, please send a simple repro program or script and what version of the kernel you are using to the linux-ext4 mailing list.   Thanks!!

Cheers,

<tytso@callcc> {/usr/projects/docker/dropbox}   (master)
1009% fallocate -o 0 -l 128M test.file
<tytso@callcc> {/usr/projects/docker/dropbox}   (master)
1010% filefrag  -v test.file
Filesystem type is: ef53
File size of test.file is 134217728 (32768 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    4095:   55990272..  55994367:   4096:             unwritten
   1:     4096..    8191:   56025088..  56029183:   4096:   55994368: unwritten
   2:     8192..   10239:   56170496..  56172543:   2048:   56029184: unwritten
   3:    10240..   14335:   56180736..  56184831:   4096:   56172544: unwritten
   4:    14336..   16383:   56252416..  56254463:   2048:   56184832: unwritten
   5:    16384..   20479:   56229888..  56233983:   4096:   56254464: unwritten
   6:    20480..   28671:   56305664..  56313855:   8192:   56233984: unwritten
   7:    28672..   32767:   56352768..  56356863:   4096:   56313856: last,unwritten,eof
test.file: 8 extents found
<tytso@callcc> {/usr/projects/docker/dropbox}   (master)
1011% xfs_io -c "fpunch 65536 65536" test.file
<tytso@callcc> {/usr/projects/docker/dropbox}   (master)
1012% filefrag  -v test.file
Filesystem type is: ef53
File size of test.file is 134217728 (32768 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      15:   55990272..  55990287:     16:             unwritten
   1:       32..    4095:   55990304..  55994367:   4064:             unwritten
   2:     4096..    8191:   56025088..  56029183:   4096:   55994368: unwritten
   3:     8192..   10239:   56170496..  56172543:   2048:   56029184: unwritten
   4:    10240..   14335:   56180736..  56184831:   4096:   56172544: unwritten
   5:    14336..   16383:   56252416..  56254463:   2048:   56184832: unwritten
   6:    16384..   20479:   56229888..  56233983:   4096:   56254464: unwritten
   7:    20480..   28671:   56305664..  56313855:   8192:   56233984: unwritten
   8:    28672..   32767:   56352768..  56356863:   4096:   56313856: last,unwritten,eof
test.file: 8 extents found
<tytso@callcc> {/usr/projects/docker/dropbox}   (master)
1013% uname -a
Linux callcc 4.8.0-00041-gecd2f69 #3 SMP Mon Oct 3 02:56:05 EDT 2016 x86_64 GNU/Linux
Comment 12 Andrey Gursky 2016-10-08 22:32:09 UTC
(In reply to Wayne Davison from comment #9)
>> Hole-punch works only for full filesystem blocks
> That has nothing to do with it.

Wayne,

OK, that might be. I haven't tested it exactly the way you're doing it now, since I did it another way. But this had to be taken care of in order to fix the bugs related to --sparse with --inplace [1], [2]. Due to the missing response, though, I've held the fix (not yet thoroughly tested) back for my own use.



[1] About data/token send/receive protocol part and more
    https://lists.samba.org/archive/rsync/2015-December/030471.html

[2] Status of --inplace and --sparse in rsync or alternative?
    https://lists.samba.org/archive/rsync/2015-December/030472.html
Comment 13 Andrey Gursky 2016-10-08 22:36:56 UTC
(In reply to Theodore Ts'o from comment #11)

Theo,

I believe "on the same file handle" is the unusual prerequisite for triggering the behavior described by Wayne. Or is such a test already contained in e2fsprogs (as a C test program rather than a shell script)?

Regards,
Andrey
Comment 14 Theodore Ts'o 2016-10-08 23:30:23 UTC
> I believe "on the same file handle" is the unusual prerequisite to trigger the behavior described by Wayne.

I was fairly sure that was a red herring, so I was trying to save myself some time, but no, it doesn't replicate even if you use the same file descriptor....

<tytso@callcc> {/usr/projects/linux/ext4}   (origin)
1009% strace /tmp/test-fallocate 
execve("/tmp/test-fallocate", ["/tmp/test-fallocate"], [/* 64 vars */]) = 0
brk(NULL)                               = 0x187e000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f20e7d2a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=101609, ...}) = 0
mmap(NULL, 101609, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f20e7d11000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\3\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1685264, ...}) = 0
mmap(NULL, 3791264, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f20e776d000
mprotect(0x7f20e7902000, 2093056, PROT_NONE) = 0
mmap(0x7f20e7b01000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x194000) = 0x7f20e7b01000
mmap(0x7f20e7b07000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f20e7b07000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f20e7d0f000
arch_prctl(ARCH_SET_FS, 0x7f20e7d0f700) = 0
mprotect(0x7f20e7b01000, 16384, PROT_READ) = 0
mprotect(0x7f20e7d2d000, 4096, PROT_READ) = 0
munmap(0x7f20e7d11000, 101609)          = 0
open("test-file", O_WRONLY|O_CREAT|O_TRUNC, 0700) = 3
fallocate(3, 0, 0, 1048576)             = 0
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 32768, 32768) = 0
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
<tytso@callcc> {/usr/projects/linux/ext4}   (origin)
1010% filefrag  -v test-file
Filesystem type is: ef53
File size of test-file is 1048576 (256 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       7:   62130432..  62130439:      8:             unwritten
   1:       16..     255:   62130448..  62130687:    240:             last,unwritten,eof
test-file: 1 extent found
Comment 15 Theodore Ts'o 2016-10-08 23:31:03 UTC
Created attachment 12557 [details]
Test program to show that fallocate followed by punch hole works just fine....
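The attachment itself is not reproduced here. Judging only from the strace output above, it does roughly what this hedged reconstruction does; fallocate_then_punch() is a hypothetical name, and it uses glibc's fallocate() wrapper, whereas comment 17 suggests the original calls syscall(SYS_fallocate, ...). Passing FALLOC_FL_KEEP_SIZE as prealloc_mode gives the variant discussed in comment 17.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define TEST_SIZE (1024 * 1024) /* 1 MiB, as in the strace output */
#define HOLE_OFF  32768
#define HOLE_LEN  32768

/*
 * Create `path`, preallocate TEST_SIZE bytes with prealloc_mode (0 as in the
 * strace, or FALLOC_FL_KEEP_SIZE), then punch HOLE_LEN bytes at HOLE_OFF on
 * the same descriptor. Returns 0 on success, -1 on error.
 */
static int fallocate_then_punch(const char *path, int prealloc_mode)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0700);
    if (fd < 0)
        return -1;
    if (fallocate(fd, prealloc_mode, 0, TEST_SIZE) < 0 ||
        fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
                  HOLE_OFF, HOLE_LEN) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

As in comment 17, remove the file between runs (the sketch's own O_TRUNC only resets the size) and inspect the result with `filefrag -v`.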
Comment 16 Andrey Gursky 2016-10-08 23:47:03 UTC
(In reply to Theodore Ts'o from comment #15)

Theo, thanks for taking the time to test it! This works for me too (Debian Testing 4.7.4-2, ext4):

$ /usr/sbin/filefrag -v test-file 
Filesystem type is: ef53
File size of test-file is 1048576 (256 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       7:   14482176..  14482183:      8:             unwritten
   1:       16..     255:   14482192..  14482431:    240:             last,unwritten,eof
test-file: 1 extent found
$
Comment 17 Wayne Davison 2016-10-09 04:15:11 UTC
Take the test program and change the SYS_fallocate to use the FALLOC_FL_KEEP_SIZE flag (don't forget to "rm test-file") and it will fail. Rsync always pre-allocates with FALLOC_FL_KEEP_SIZE when the flag is available.
Comment 18 Wayne Davison 2016-10-09 04:16:40 UTC
FYI, I tested on Linux 4.2.0 and 3.10.0 (I don't have a newer kernel running here to try).
Comment 19 Wayne Davison 2016-10-09 04:27:27 UTC
Also, to behave more like rsync, you can follow the hole punch with a seek and a write so that the file ends up with a non-zero size. Apparently, if I change the order to do the seek and the write first and THEN punch the hole, it works.
Comment 20 Andrey Gursky 2016-10-09 13:15:38 UTC
(In reply to Wayne Davison from comment #17)

From what I see, it doesn't fail, since the file is not preallocated at all with FALLOC_FL_KEEP_SIZE; instead, just a fully sparse file is created (consisting of only one big hole):

$ ls -ls test-file
1024 -rwx------ 1 andrey andrey 0 Oct  9 14:47 test-file

$ /usr/sbin/filefrag -v test-file 
Filesystem type is: ef53
File size of test-file is 0 (0 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..     255:   14828800..  14829055:    256:             last,unwritten,eof
test-file: 1 extent found
Comment 21 Andrey Gursky 2016-10-09 13:53:57 UTC
(In reply to Andrey Gursky from comment #20)

Sorry, it is indeed preallocated. The rest still holds: the hole punch doesn't fail because the file already consists of nothing but a hole. I would call such a file "preallocated data-sparse", as opposed to the usual non-preallocated "space-sparse" file created with truncate:

$ truncate -s 1048576 test-sparse

$ ls -ls test-sparse
0 -rw-r--r-- 1 andrey andrey 1048576 Oct  9 15:34 test-sparse

$ /usr/sbin/filefrag -v test-sparse
Filesystem type is: ef53
File size of test-sparse is 1048576 (256 blocks of 4096 bytes)
test-sparse: 0 extents found
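The two kinds of file described above can be told apart by comparing the allocated size (st_blocks, counted in 512-byte units) with st_size. A minimal sketch with hypothetical helper names: make_space_sparse() does what `truncate -s` does, and make_preallocated() reserves blocks with FALLOC_FL_KEEP_SIZE without changing the size, as rsync's preallocation does when the flag is available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Bytes actually allocated for the file (st_blocks is in 512-byte units). */
static long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long long)st.st_blocks * 512;
}

/* "Space-sparse": size set with ftruncate(), no blocks allocated. */
static int make_space_sparse(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* "Preallocated data-sparse": blocks reserved, file size stays 0. */
static int make_preallocated(const char *path, off_t size)
{
    assert(size > 0); /* fallocate() rejects a zero length */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, size) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```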
Comment 22 Andrey Gursky 2016-10-09 14:04:07 UTC
(In reply to Andrey Gursky from comment #21)

Continuing to think aloud. It's not really a hole, it's already reserved space.
Comment 23 Wayne Davison 2016-10-09 14:54:48 UTC
> Continuing to think aloud. It's not really a hole, it's already reserved space.

Exactly, and it's impossible to punch holes in that allocation. I'm changing my patch to give the file a size to deal with this anomaly.
Comment 24 Theodore Ts'o 2016-10-10 14:38:24 UTC
So a simple workaround would be to use fallocate with KEEP_SIZE at first, then use punch hole, write the blocks, etc., and then use either truncate to set i_size, or seek to the desired size minus one and write a single byte. Seeking to the desired size minus one is more portable, but if you want to avoid allocating an extra 4k block, you could try truncate first; if that doesn't set i_size (it's not guaranteed by POSIX, but I believe all Linux file systems will set i_size), seeking to size-1 and writing a single zero byte is guaranteed to work.

That being said, I agree that ext4 should allow punch hole to work beyond i_size, if there are blocks allocated using fallocate(2).   We'll fix that for the future, but for now, the workaround suggested above is probably the simplest way to work around the issue in a way that's compatible with both the current and future behavior.
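A hedged sketch of the workaround's final-size step, plus one possible ordering of the whole sequence; set_final_size() and prealloc_punch_write() are hypothetical names, and the sizes in the usage are illustrative. Note the ordering: here i_size is set before the hole is punched, following comments 19 and 23, which report that punching a KEEP_SIZE preallocation beyond i_size has no effect on the kernels tested. The comment above lists the size step last, so treat the order as something to verify on your kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Give the file its final i_size. ftruncate() avoids allocating an extra
 * block; if it fails or does not yield the requested size, fall back to
 * writing a single zero byte at size - 1.
 */
static int set_final_size(int fd, off_t size)
{
    struct stat st;

    assert(size > 0);
    if (ftruncate(fd, size) == 0 && fstat(fd, &st) == 0 && st.st_size == size)
        return 0;
    return pwrite(fd, "", 1, size - 1) == 1 ? 0 : -1; /* "" is one NUL byte */
}

/*
 * Hypothetical rsync-like sequence: preallocate with KEEP_SIZE, set i_size,
 * punch the zero region, then write the data. Returns 0 or -1.
 */
static int prealloc_punch_write(int fd, off_t size, off_t hole_off,
                                off_t hole_len, const char *data,
                                size_t data_len, off_t data_off)
{
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, size) < 0)
        return -1;
    if (set_final_size(fd, size) < 0)
        return -1;
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  hole_off, hole_len) < 0)
        return -1;
    if (pwrite(fd, data, data_len, data_off) != (ssize_t)data_len)
        return -1;
    return 0;
}
```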
Comment 25 Коренберг Марк 2016-10-10 15:02:33 UTC
What about fallocate()d areas beyond the file size? Will they be synchronized? Just curious.