It is good practice to transfer big files with the --preallocate option, but this is fatal for any sparse files among them. Could this be improved? Thanks, Andrey
*** Bug 12305 has been marked as a duplicate of this bug. ***
Bug 12305 has a much more detailed description
(In reply to Коренберг Марк from comment #2) Марк, nevertheless it is still the same issue. You're welcome to add the details here. Please be sure to fix the typo: instead of --fallocate it should be --preallocate. Thanks, Andrey
rsync currently just has the receiver turn "long" sequences of zeroes into sparse regions when --sparse is specified. If --preallocate is also specified, what would you like rsync to do?

No wire protocol change required (please pick which behaviour you prefer even if you'd rather a protocol bump, as we may negotiate an older wire protocol):

1) Emit an error (as with --inplace and --sparse)
2) Disable one of the options (which one?)
3) Pre-allocate the file, but when zero regions are detected then ftruncate() it and create the sparse region. Reallocate the rest of the file space after creating the sparse region or not? (IFF receiver is on Linux and on a supported filesystem, fallocate(,FALLOC_FL_PUNCH_HOLE,...) could be used to create sparse regions without truncating the file)

Wire protocol change required, have the sender determine the sparse regions, using SEEK_HOLE if available, otherwise scanning for all-zero regions:

A) Preallocate 1st data region, create 1st sparse region, preallocate 2nd data region, ...
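For readers following along, the receiver-side behaviour described at the top of this comment (turning long zero runs into sparse regions) could look roughly like the C sketch below. This is not rsync's actual code: the names sparse_write and sparse_finish and the SPARSE_MIN_RUN threshold are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical threshold: zero runs shorter than this are written as data. */
#define SPARSE_MIN_RUN 1024

/* Length of the run of zero bytes starting at buf[0]. */
static size_t zero_run(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    while (n < len && buf[n] == 0)
        n++;
    return n;
}

/* Length of the span starting at buf[0] that should be written as data.
 * The span ends where a zero run of at least SPARSE_MIN_RUN bytes begins. */
static size_t data_run(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        if (buf[i] == 0) {
            size_t z = zero_run(buf + i, len - i);
            if (z >= SPARSE_MIN_RUN)
                break;
            i += z;              /* short zero run: keep it as data */
        } else {
            i++;
        }
    }
    return i;
}

/* Write buf at the current offset of fd, seeking over long zero runs
 * instead of writing them. Returns 0 on success, -1 on error. */
int sparse_write(int fd, const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    while (pos < len) {
        size_t z = zero_run(buf + pos, len - pos);
        if (z >= SPARSE_MIN_RUN) {
            if (lseek(fd, (off_t)z, SEEK_CUR) < 0)
                return -1;
            pos += z;
            continue;
        }
        size_t d = data_run(buf + pos, len - pos);
        size_t done = 0;
        while (done < d) {       /* handle short writes */
            ssize_t w = write(fd, buf + pos + done, d - done);
            if (w < 0)
                return -1;
            done += (size_t)w;
        }
        pos += d;
    }
    return 0;
}

/* A trailing hole only becomes part of the file once the size is set,
 * so extend (or cut) the file to the current offset. */
int sparse_finish(int fd)
{
    off_t end = lseek(fd, 0, SEEK_CUR);
    if (end < 0)
        return -1;
    return ftruncate(fd, end);
}
```

Whether the skipped ranges really end up unallocated depends on the filesystem; the sketch only guarantees that the file reads back with the same contents.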
(In reply to Carson Gaspar from comment #4) Actually, you never want the sender to scan for zero regions if SEEK_HOLE isn't supported, as performance would then be terrible. And a given filesystem may not support SEEK_HOLE, even if lseek() does. So we're really back to picking (1), (2), or (3), as I don't see unreliable sender sparse mapping as sensible (although using SEEK_HOLE to save source file read time and provide a hint to the receiver may be nice).
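To illustrate the SEEK_HOLE idea mentioned above, here is a minimal C sketch of walking a file's data regions with lseek(). It assumes Linux, where SEEK_DATA and SEEK_HOLE exist; the function name count_data_bytes is hypothetical, and the result is only as reliable as what the filesystem reports (a filesystem without real support may simply report the whole file as data).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA   /* Linux values, in case the headers hide them */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Sum the bytes that lie in data regions of fd, as reported by the
 * filesystem. Returns -1 if the mapping is unavailable (for example
 * EINVAL when SEEK_DATA is not supported). Moves the file offset. */
off_t count_data_bytes(int fd)
{
    off_t total = 0;
    off_t pos = 0;

    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                return total;    /* no data after pos: EOF or trailing hole */
            return -1;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        total += hole - data;    /* [data, hole) is one data region */
        pos = hole;
    }
}
```

A caller that gets -1 back would fall back to treating the whole file as data, which matches the point above that the mapping is at best a hint.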
For punching holes, Solaris and UnixWare support F_FREESP(64) in fcntl(). Windows supports both reporting and punching holes, but I don't know if Cygwin (or any other rsync on the Windows platform) implements it.
Created attachment 12556 [details]
Preliminary patch to support punching holes

In my testing, using both a pre-allocate call on a file followed by a hole-punch call has no effect on the allocation of the blocks (though it does zero them). I tested --sparse and --inplace with this, and it worked fine on one system (with a new enough Linux kernel). There are cases where the sparseness will be lost, though, depending on OS & filesystem. I'm thinking we just update the docs to mention that if you combine --sparse with --inplace (and/or --preallocate) that you might not get the sparseness preserved.
(In reply to Wayne Davison from comment #7)
Wayne, since this bug made rsync unusable for me, I fixed it and implemented the additional checks needed for ext4 a month or two after I reported this bug and saw no reaction at all. Now that a couple of people, including you, have become interested, I can share my work. It's not tiny.

> In my testing, using both a pre-allocate call on a file followed by a
> hole-punch call has no effect on the allocation of the blocks (though it does
> zero them).

Yes, this is tricky. Hole-punch works only for full filesystem blocks (e.g., default 4K). Issuing a few partial hole-punch requests wouldn't work, even if together they cover the whole block.
> Hole-punch works only for full filesystem blocks

That has nothing to do with it. If you fallocate() the full file length and then (on the same file handle) try to punch out parts of the allocated file, none of the blocks stop being allocated. Looks like a bug in Linux to me.
> ... I can share my work.

Sounds interesting! Looking forward to seeing what you've come up with.
Re: #9. I'm not able to reproduce the described behavior. If you want to follow up on what you think is a kernel bug, please send a simple repro program or script and what version of the kernel you are using to the linux-ext4 mailing list. Thanks!! Cheers,

<tytso@callcc> {/usr/projects/docker/dropbox} (master) 1009% fallocate -o 0 -l 128M test.file
<tytso@callcc> {/usr/projects/docker/dropbox} (master) 1010% filefrag -v test.file
Filesystem type is: ef53
File size of test.file is 134217728 (32768 blocks of 4096 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0: 0.. 4095: 55990272.. 55994367: 4096: unwritten
   1: 4096.. 8191: 56025088.. 56029183: 4096: 55994368: unwritten
   2: 8192.. 10239: 56170496.. 56172543: 2048: 56029184: unwritten
   3: 10240.. 14335: 56180736.. 56184831: 4096: 56172544: unwritten
   4: 14336.. 16383: 56252416.. 56254463: 2048: 56184832: unwritten
   5: 16384.. 20479: 56229888.. 56233983: 4096: 56254464: unwritten
   6: 20480.. 28671: 56305664.. 56313855: 8192: 56233984: unwritten
   7: 28672.. 32767: 56352768.. 56356863: 4096: 56313856: last,unwritten,eof
test.file: 8 extents found
<tytso@callcc> {/usr/projects/docker/dropbox} (master) 1011% xfs_io -c "fpunch 65536 65536" test.file
<tytso@callcc> {/usr/projects/docker/dropbox} (master) 1012% filefrag -v test.file
Filesystem type is: ef53
File size of test.file is 134217728 (32768 blocks of 4096 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0: 0.. 15: 55990272.. 55990287: 16: unwritten
   1: 32.. 4095: 55990304.. 55994367: 4064: unwritten
   2: 4096.. 8191: 56025088.. 56029183: 4096: 55994368: unwritten
   3: 8192.. 10239: 56170496.. 56172543: 2048: 56029184: unwritten
   4: 10240.. 14335: 56180736.. 56184831: 4096: 56172544: unwritten
   5: 14336.. 16383: 56252416.. 56254463: 2048: 56184832: unwritten
   6: 16384.. 20479: 56229888.. 56233983: 4096: 56254464: unwritten
   7: 20480.. 28671: 56305664.. 56313855: 8192: 56233984: unwritten
   8: 28672.. 32767: 56352768.. 56356863: 4096: 56313856: last,unwritten,eof
test.file: 8 extents found
<tytso@callcc> {/usr/projects/docker/dropbox} (master) 1013% uname -a
Linux callcc 4.8.0-00041-gecd2f69 #3 SMP Mon Oct 3 02:56:05 EDT 2016 x86_64 GNU/Linux
(In reply to Wayne Davison from comment #9)
>> Hole-punch works only for full filesystem blocks
> That has nothing to do with it.

Wayne, OK, that might be. I haven't tested it exactly the way you're doing it now, since I did it another way. But this was important to take care of in order to fix the bugs related to sparse with inplace [1], [2]. Because there was no response, I have held back the fix (not yet thoroughly tested) for myself.

[1] About data/token send/receive protocol part and more https://lists.samba.org/archive/rsync/2015-December/030471.html
[2] Status of --inplace and --sparse in rsync or alternative? https://lists.samba.org/archive/rsync/2015-December/030472.html
(In reply to Theodore Ts'o from comment #11) Theo, I believe "on the same file handle" is the unusual prerequisite to trigger the behavior described by Wayne. Or is such a test already contained in e2fsprogs (in a C test program, not a shell script)? Regards, Andrey
>I believe "on the same file handle" is the unusual prerequisite to trigger the
>behavior described by Wayne.

I was fairly sure that was a red herring, so I was trying to save myself some time, but no, it doesn't replicate even if you use the same file descriptor....

<tytso@callcc> {/usr/projects/linux/ext4} (origin) 1009% strace /tmp/test-fallocate
execve("/tmp/test-fallocate", ["/tmp/test-fallocate"], [/* 64 vars */]) = 0
brk(NULL) = 0x187e000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f20e7d2a000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=101609, ...}) = 0
mmap(NULL, 101609, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f20e7d11000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\3\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1685264, ...}) = 0
mmap(NULL, 3791264, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f20e776d000
mprotect(0x7f20e7902000, 2093056, PROT_NONE) = 0
mmap(0x7f20e7b01000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x194000) = 0x7f20e7b01000
mmap(0x7f20e7b07000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f20e7b07000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f20e7d0f000
arch_prctl(ARCH_SET_FS, 0x7f20e7d0f700) = 0
mprotect(0x7f20e7b01000, 16384, PROT_READ) = 0
mprotect(0x7f20e7d2d000, 4096, PROT_READ) = 0
munmap(0x7f20e7d11000, 101609) = 0
open("test-file", O_WRONLY|O_CREAT|O_TRUNC, 0700) = 3
fallocate(3, 0, 0, 1048576) = 0
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 32768, 32768) = 0
close(3) = 0
exit_group(0) = ?
+++ exited with 0 +++
<tytso@callcc> {/usr/projects/linux/ext4} (origin) 1010% filefrag -v test-file
Filesystem type is: ef53
File size of test-file is 1048576 (256 blocks of 4096 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0: 0.. 7: 62130432.. 62130439: 8: unwritten
   1: 16.. 255: 62130448.. 62130687: 240: last,unwritten,eof
test-file: 1 extent found
Created attachment 12557 [details] Test program to show that fallocate followed by punch hole works just fine....
(In reply to Theodore Ts'o from comment #15)
Theo, thanks for taking time to test it! This works for me too (Debian Testing 4.7.4-2, ext4):

$ /usr/sbin/filefrag -v test-file
Filesystem type is: ef53
File size of test-file is 1048576 (256 blocks of 4096 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0: 0.. 7: 14482176.. 14482183: 8: unwritten
   1: 16.. 255: 14482192.. 14482431: 240: last,unwritten,eof
test-file: 1 extent found
$
Take the test program and change the SYS_fallocate to use the FALLOC_FL_KEEP_SIZE flag (don't forget to "rm test-file") and it will fail. Rsync always pre-allocates with FALLOC_FL_KEEP_SIZE when the flag is available.
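The attached test program is not reproduced in this thread, so the following is only a hypothetical C reconstruction of the variant described above: preallocate with FALLOC_FL_KEEP_SIZE, punch a hole through the same descriptor, and compare st_blocks before and after. It calls fallocate through syscall(SYS_fallocate, ...) as the comment suggests the test program does, which assumes 64-bit Linux; the name keep_size_punch_probe is made up for this sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (assumes a 64-bit Linux ABI). */
static int probe_fallocate(int fd, int mode, off_t off, off_t len)
{
    return (int)syscall(SYS_fallocate, fd, mode, off, len);
}

/* Create path, preallocate len bytes with FALLOC_FL_KEEP_SIZE, then punch
 * out the middle half on the same descriptor. Stores st_blocks (512-byte
 * units) before and after the punch. Returns 0 if both calls succeeded,
 * -1 otherwise with errno set (e.g. EOPNOTSUPP on some filesystems). */
int keep_size_punch_probe(const char *path, off_t len,
                          long long *blocks_before, long long *blocks_after)
{
    struct stat st;
    int rc = -1;
    int saved;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (probe_fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) != 0)
        goto out;
    if (fstat(fd, &st) != 0)
        goto out;
    *blocks_before = (long long)st.st_blocks;

    if (probe_fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
                        len / 4, len / 2) != 0)
        goto out;
    if (fstat(fd, &st) != 0)
        goto out;
    *blocks_after = (long long)st.st_blocks;
    rc = 0;
out:
    saved = errno;               /* keep the interesting errno across close() */
    close(fd);
    errno = saved;
    return rc;
}
```

If the two block counts come back equal, the punch released nothing, which is the symptom under discussion; what a given kernel and filesystem actually report is not something this sketch can promise.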
FYI, I tested on Linux 4.2.0 and 3.10.0 (I don't have a newer kernel running here to try).
Also, to be more like what rsync would do, you can follow the hole-punch with a seek and a write so that the file ends up with a non-zero size. Apparently, if I change the order to do the seek & the write first and THEN punch the hole, it works.
(In reply to Wayne Davison from comment #17)
From what I see, it doesn't fail: with FALLOC_FL_KEEP_SIZE the file is not preallocated at all; instead, a fully sparse file is created (consisting of only one big hole):

$ ls -ls test-file
1024 -rwx------ 1 andrey andrey 0 Oct 9 14:47 test-file
$ /usr/sbin/filefrag -v test-file
Filesystem type is: ef53
File size of test-file is 0 (0 blocks of 4096 bytes)
 ext: logical_offset: physical_offset: length: expected: flags:
   0: 0.. 255: 14828800.. 14829055: 256: last,unwritten,eof
test-file: 1 extent found
(In reply to Andrey Gursky from comment #20)
Sorry, it is indeed preallocated. The rest still holds: hole-punch doesn't fail because the file already consists of only a hole. I would call such a file preallocated data-sparse, as opposed to the usual, not preallocated, space-sparse file created with truncate:

$ truncate -s 1048576 test-sparse
$ ls -ls test-sparse
0 -rw-r--r-- 1 andrey andrey 1048576 Oct 9 15:34 test-sparse
$ /usr/sbin/filefrag -v test-sparse
Filesystem type is: ef53
File size of test-sparse is 1048576 (256 blocks of 4096 bytes)
test-sparse: 0 extents found
(In reply to Andrey Gursky from comment #21) Continuing to think aloud. It's not really a hole, it's already reserved space.
> Continuing to think aloud. It's not really a hole, it's already reserved space. Exactly, and it's impossible to punch holes in that allocation. I'm changing my patch to give the file a size to deal with this anomaly.
So a simple workaround would be to use fallocate with KEEP_SIZE at first, then use punch hole, write the blocks, etc., and then use either truncate to set i_size, or seek to the desired size minus one and write a single byte. Seeking to the desired size minus one is more portable, but if you want to avoid allocating an extra 4k block, you could try using truncate, and if that doesn't set i_size (it's not guaranteed by POSIX, but I believe all Linux file systems will set i_size), seeking to size-1 and writing a single zero byte is guaranteed to work. That being said, I agree that ext4 should allow punch hole to work beyond i_size, if there are blocks allocated using fallocate(2). We'll fix that for the future, but for now, the workaround suggested above is probably the simplest way to work around the issue in a way that's compatible with both the current and future behavior.
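The size-setting step described above (try truncate, otherwise write one zero byte at size-1) can be sketched in C as follows. The function names are hypothetical, and the ordering in prealloc_size_then_punch is an assumption based on the earlier observation in this thread that punching worked once the file had been given a size, so the sketch sets the size before punching. Preallocation and hole punching are treated as best-effort hints, and fallocate is called through syscall(SYS_fallocate, ...), which assumes 64-bit Linux.

```c
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (assumes a 64-bit Linux ABI). */
static int ws_fallocate(int fd, int mode, off_t off, off_t len)
{
    return (int)syscall(SYS_fallocate, fd, mode, off, len);
}

/* Make st_size equal to size. Tries ftruncate() first; if that does not
 * produce the requested size, falls back to writing a single zero byte at
 * size - 1. The fallback assumes the file is not already longer than size.
 * Returns 0 on success, -1 on error. */
int set_file_size(int fd, off_t size)
{
    struct stat st;
    const char zero = 0;

    if (ftruncate(fd, size) == 0 && fstat(fd, &st) == 0 && st.st_size == size)
        return 0;
    if (size == 0)
        return -1;
    return pwrite(fd, &zero, 1, size - 1) == 1 ? 0 : -1;
}

/* Preallocate size bytes, give the file its final size, then punch one
 * hole. Failures of the two fallocate calls are ignored on purpose: the
 * file is still correct, it just may not be preallocated or sparse. */
int prealloc_size_then_punch(int fd, off_t size, off_t hole_off, off_t hole_len)
{
    (void)ws_fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, size);
    if (set_file_size(fd, size) != 0)
        return -1;
    (void)ws_fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
                       hole_off, hole_len);
    return 0;
}
```

A real receiver would interleave the data writes with the punches; the sketch leaves that out to keep the size handling visible.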
What about fallocate()d areas beyond the file size? Will they be synchronized? Just curious.
There is a fix for this feature, here: https://bugzilla.samba.org/show_bug.cgi?id=13320 It worksforme.
This feature was added a while ago.