rsync hangs during the transfer of directories containing many small files. The blocked process must be interrupted or killed and the transfer restarted, sometimes many times in a row.
Unlike in my previous bug report, _this_ time no hard links are required to make rsync hang; only regular files are involved.
How to reproduce
[ Using Linux on e.g. ext4, about 3 GiB disk space required ]
chmod u+x mktest
Background: This is based on a larger backup script (which explains the somewhat odd directory names etc.) that often hangs. The hang often occurs when a user has a directory with many image files that get geotagged (e.g. using exiftool), which renames the original files to *.jpg_original and creates new files with updated EXIF information. As I don't know whether it matters that only part of each file changes, I included an actual example image file in the test case.
The script creates a directory "dst" containing 5000 image files (which represent an old backup before geotagging) and a directory "src" containing the same files renamed to *.jpg_original plus 5000 image files with altered EXIF information. Finally, it runs
rsync -avvHAXSkK --backup --backup-dir="$PWD/X/bak" "$PWD/X/src/." "$PWD/X/dst/."
The transfer starts normally (files are backed up and transferred to the destination directory, and the log looks as expected) but then stops unexpectedly, without any message or other hint as to what is going on. The number of files transferred before the hang varies from system to system and from run to run. Recent tests stopped after 1193, 1105, 972, and 1266 files.
rsync should not block but complete the transfer.
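Since a hung transfer otherwise has to be noticed and killed by hand, it can help to wrap the run in a time limit. The following is only a minimal sketch, assuming GNU coreutils `timeout` is installed; the function name `hang_check` and any time limit passed to it are made up for illustration and are not part of the test case.

```shell
# Sketch only: hang_check is a hypothetical helper, not part of rsync or of
# the attached test case. It assumes GNU coreutils `timeout` is available.
hang_check() {
    secs=$1
    shift
    status=0
    # timeout exits with status 124 if the command was still running
    # when the limit expired and had to be terminated.
    timeout "$secs" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG? command still running after ${secs}s: $*" >&2
    fi
    return "$status"
}
```

It would be used by prefixing the rsync command from above, e.g. `hang_check 600 rsync -avvHAXSkK --backup ...`, where 600 seconds is an arbitrary limit that should be far longer than a normal run takes.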
This problem exists in at least rsync versions 3.1.0 and 3.1.2 on several Linux distributions (at least some openSUSE versions and Debian jessie) on x86_64, with various file systems (at least ext4 and xfs).
Created attachment 13756
zip archive of the test case
Attached a zip archive of the test case to make reproducing the problem easier:
Created attachment 13760
Simplified test case
I was able to simplify the test case even further: the attached test script no longer needs any example files. Simply execute
on a Linux system. This creates files consisting of a few zero bytes in the manner described above and executes rsync as described.
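For readers without access to the attachment, the layout described above could be built roughly as follows. This is not the attached script, only a sketch under assumptions: the file names, the payload sizes and the `ROOT` and `N` variables are invented for illustration, and whether this exact layout is enough to trigger the hang has not been verified. It deliberately stops before running rsync, so the command given earlier still has to be started by hand.

```shell
#!/bin/sh
# Sketch only, NOT the attached test script. File names, payload sizes and
# the ROOT/N variables are assumptions made for illustration.
set -eu

root="${ROOT:-$PWD/X}"   # same top-level directory as in the rsync command
n="${N:-5000}"           # number of "image" files per set

# Start from a clean tree so that repeated runs are comparable.
rm -rf "$root"
mkdir -p "$root/src" "$root/dst"

i=1
while [ "$i" -le "$n" ]; do
    name="img_$i.jpg"
    # Old backup: the file as it was before geotagging (4 zero bytes).
    printf '\0\0\0\0' > "$root/dst/$name"
    # Source: the unchanged original under its new *.jpg_original name ...
    printf '\0\0\0\0' > "$root/src/${name}_original"
    # ... plus a rewritten file under the old name (8 zero bytes, so that
    # it differs from the copy in dst).
    printf '\0\0\0\0\0\0\0\0' > "$root/src/$name"
    i=$((i + 1))
done

echo "created $n files in dst and $((2 * n)) files in src under $root"
```

Using a different size for the rewritten file is just the simplest way to make rsync treat it as changed; the real case changes the EXIF data inside an image instead.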
I can't really help debug this, as I'm not familiar with the code and the communication between the processes appears quite complex. I only got this far (current git):
Commenting out the line
send_msg((enum msgcode)code, buf, len, 0)
in rwrite() in log.c makes the error go away.
When printing the values iobuf.msg.len, iobuf.msg.len + needed, and iobuf.msg.size in send_msg() in io.c, it can be seen that the hang occurs as soon as (iobuf.msg.len + needed) exceeds iobuf.msg.size (32768), i.e. when perform_io(needed, PIO_NEED_MSGROOM) has to be called.
There are a lot of bug reports related to rsync hanging mysteriously, some of which may be duplicates of each other:
This is fixed in the latest git version.