Overview
========

rsync hangs during the transfer of directories containing many small files. The blocked process must be interrupted or killed and the transfer restarted, sometimes many times in a row.

Unlike in my previous bug report, _this_ time no hard links are required to make rsync hang; only regular files are involved.

How to reproduce
================

[ Using Linux on e.g. ext4, about 3 GiB disk space required ]

############################
mkdir rstest
cd rstest
wget 'http://www.hlipp.de/rs/mktest'
chmod u+x mktest
mkdir files
cd files
wget 'http://www.hlipp.de/rs/1518_0219.jpg_original'
wget 'http://www.hlipp.de/rs/1518_0219.jpg'
cd ..
./mktest 5000
############################

Background: This is based on a larger backup script (which explains the somewhat odd directory names etc.) that often hangs. The error often occurs when a user has many image files in a directory that have been geotagged (possibly using exiftool), which causes the original files to be renamed to *.jpg_original and a new file with updated EXIF information to be created in place of each. As I don't know whether it matters that only a part of the file changes, I include an actual example image file in the test case.

The script creates a directory "dst" containing 5000 image files (which represents an old backup before geotagging) and a directory "src" containing the same files renamed to *.jpg_original plus 5000 image files with altered EXIF information. Finally,

rsync -avvHAXSkK --backup --backup-dir="$PWD/X/bak" "$PWD/X/src/." "$PWD/X/dst/."

is executed.

Actual Results
==============

The transfer starts normally (files are backed up and transferred to the destination directory, and the log looks as expected) but then stops unexpectedly, without any message or other hint as to what is going on. The number of files transferred before the hang varies from system to system and from run to run. Recent tests stopped after 1193, 1105, 972, and 1266 files.

Expected Results
================

rsync should not block but complete the transfer.

Further information
===================

This problem exists at least in rsync versions 3.1.0 and 3.1.2 on several Linux distributions (at least some OpenSUSE versions and Debian jessie) on x86_64, using various file systems (at least ext4 and xfs).
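The directory layout described under "Background" can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual mktest script: the file names (img_NNNNN.jpg), the placeholder file contents, the default count of 10 and the scratch directory from mktemp are all assumptions made for illustration, and the sketch has not been verified to trigger the hang.

```shell
#!/bin/sh
# Hypothetical reconstruction of the layout described in the report.
# NOT the real mktest script; file contents are placeholders, not JPEGs.
set -e

n=${1:-10}            # number of image files; the report uses 5000
work=$(mktemp -d)     # scratch directory standing in for "$PWD/X"
mkdir -p "$work/src" "$work/dst"

i=1
while [ "$i" -le "$n" ]; do
    name=$(printf 'img_%05d.jpg' "$i")
    # Old backup: the image as it was before geotagging.
    printf 'original %s\n' "$i" > "$work/dst/$name"
    # Source: the untouched original, renamed to *.jpg_original ...
    cp "$work/dst/$name" "$work/src/${name}_original"
    # ... plus a new file under the old name with altered content.
    printf 'geotagged %s\n' "$i" > "$work/src/$name"
    i=$((i + 1))
done

echo "src: $(ls "$work/src" | wc -l) files, dst: $(ls "$work/dst" | wc -l) files"

# The report then runs rsync on this layout (left commented out here):
# rsync -avvHAXSkK --backup --backup-dir="$work/bak" "$work/src/." "$work/dst/."
```

With this layout, every file name in dst also exists in src with different content, so each one is moved to the backup directory and rewritten, while every *.jpg_original file is new to the destination.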
Created attachment 13756
zip archive of the test case

Attached a zip archive of the test case to make reproducing the problem easier:

unzip rstest.zip
cd rstest
./mktest 5000
Created attachment 13760
Simplified test case

I was able to simplify the test case even further: the attached test script no longer needs any example files. Simply execute

./mktest 5000

on a Linux system. This creates files consisting of a few zero bytes in the manner described above and executes rsync as described.

I can't really help debug this, as I'm not familiar with the code and the communication between the processes appears quite complex. I only got this far (current git):

Commenting out the line

send_msg((enum msgcode)code, buf, len, 0)

in rwrite() in log.c makes the error go away.

When printing the values iobuf.msg.len, iobuf.msg.len + needed, and iobuf.msg.size in send_msg() in io.c, it can be seen that the hang occurs as soon as (iobuf.msg.len + needed) exceeds iobuf.msg.size (32768), i.e. when perform_io(needed, PIO_NEED_MSGROOM) has to be called.
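When trying out code changes against the test case, it may help to detect the hang mechanically rather than by watching the log. A minimal sketch, assuming GNU coreutils timeout is available; run_with_limit is a hypothetical helper name, and the 600-second limit in the usage comment is an arbitrary assumption, not a value from this report:

```shell
#!/bin/sh
# Sketch: run a command under a time limit and report whether it timed out.
# run_with_limit is a hypothetical helper; timeout is from GNU coreutils and
# exits with status 124 when the time limit is reached.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example (assumes ./mktest from the attachment is in the current directory):
# run_with_limit 600 ./mktest 5000
```

A time limit cannot distinguish a genuine hang from a slow transfer, so the limit should be well above the duration of a normal run.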
There are a lot of bug reports about rsync hanging mysteriously, some of which may be duplicates of each other:

https://bugzilla.samba.org/show_bug.cgi?id=1442
https://bugzilla.samba.org/show_bug.cgi?id=2957
https://bugzilla.samba.org/show_bug.cgi?id=9164
https://bugzilla.samba.org/show_bug.cgi?id=10035
https://bugzilla.samba.org/show_bug.cgi?id=10092
https://bugzilla.samba.org/show_bug.cgi?id=10518
https://bugzilla.samba.org/show_bug.cgi?id=10950
https://bugzilla.samba.org/show_bug.cgi?id=11166
https://bugzilla.samba.org/show_bug.cgi?id=12732
https://bugzilla.samba.org/show_bug.cgi?id=13109
This is fixed in the latest git version.