I tried to rsync a folder containing 323,187 files, from OS X to Linux (a Synology NAS). Both sides run rsync v3.1.2, compiled by myself; the stock rsync on the NAS is 3.0.9. Rsync 3.1.2 hangs at around 295,000 files.

- If I use exclusions to get under that number, it works.
- If I add --protocol=30, it still hangs.
- If I add --delay-updates, it works.
- If I use the stock Synology rsync (i.e. without --rsync-path), it works, even without --delay-updates.
- If I raise the kernel file limit on Linux (sudo sysctl -w fs.file-max=25000), it still hangs.

NOTE: when I say it hangs after 295,000 files, I mean: I ran rsync with -v -v -v and got approximately 295,000 lines like:

[sender] make_file(bands/8725,*,2)
[sender] make_file(bands/8726,*,2)
[sender] make_file(bands/8727,*,2)
[sender] make_file(bands/8728,*,2)

If I exclude files starting with 8, it hangs after the same number of files; it just stops at:

[sender] make_file(bands/9729,*,2)
[sender] make_file(bands/972a,*,2)
[sender] make_file(bands/972b,*,2)
[sender] make_file(bands/972c,*,2)

What really puzzles me is that it works with Synology's rsync 3.0.9. Did I miss something when I compiled rsync on my Linux box?
I found it. I had the idea to try an rsync server instead of ssh. For some reason, I had access to the destination folder when logged in through ssh, but processes launched under my uid did not. I guess it's a Synology trick with extended permissions, and because there was no error message, I didn't think of that. So in short: there is no functional bug, just an annoying silent failure. Sorry about that.
(In reply to jief from comment #1) My mistake, the previous comment is wrong: I hadn't noticed that the --delay-updates option was still present in my final test. The bug remains, and it is not a permission problem. The bug does not happen when using an rsync server (same binary, version 3.1.2) instead of ssh.
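For anyone trying to reproduce the difference, the two transports compared above look roughly like this; the hostname, module name, binary path, and directories are placeholders, not details from my setup:

```shell
# Over ssh, forcing the self-compiled 3.1.2 binary on the NAS: this hangs.
# (/usr/local/bin/rsync, user@nas and the paths are placeholders.)
rsync -a --rsync-path=/usr/local/bin/rsync src/ user@nas:/volume1/dest/

# Over an rsync daemon running the same 3.1.2 binary: this works.
# ("module" must match a module defined in the daemon's rsyncd.conf.)
rsync -a src/ rsync://user@nas/module/dest/
```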
What filesystems are used at each end? Also, when you say "hangs", how long have you left it in the "hung" state?

In preparation for trying to reproduce this issue, I created a new directory on an ext4 filesystem and started to populate it, using the following bash command:

$ for i in {01..323187} ; do echo $i > $i ; done

which sequentially creates 323,187 distinct files, each with different contents. I expected this to take a little while, but what I observed was that the first 214,301 files appeared in the directory very quickly (<30 seconds, maybe only 10 or 15), after which there was a long pause during which no new files appeared; the file-creation process seemed to have stalled at this point. However, after a long pause (>2 minutes) the remaining 108,886 files suddenly appeared in the directory, in an interval of probably no more than 10 seconds. Hence the average rate of file creation over a series of ten-second intervals varied from ~10,000/sec down to almost zero.

So I suspect that the ext4 filesystem at least may have some limit on how many blocks/inodes/directory entries can be queued for writeback to disk at a time, or that while the updated directory is being written back it may be locked against further updates, possibly for quite a long time. Thus, the "hang" may be in the underlying filesystem on the receiver rather than in rsync itself. There may however be differences in the behaviour of rsync (e.g. the order and timing of file operations) across versions that either trigger or avoid triggering this particular case in the filesystem code.

Can you run the receiving program in the hanging case under truss/strace or similar, and thus collect information on the timing of the file-create calls over the duration of the run?
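The population step above can be wrapped in a small script that also verifies the resulting file count; the directory is a temporary one and the count is scaled down from 323,187 purely for illustration:

```shell
#!/bin/sh
# Create many small files in a fresh directory, then verify the count.
# The count of 500 is scaled down from the 323187 used in the report.
dir=$(mktemp -d)
count=500
for i in $(seq -w 1 "$count"); do
    echo "$i" > "$dir/$i"
done
# tr strips the padding some wc implementations emit.
created=$(ls "$dir" | wc -l | tr -d ' ')
echo "created $created of $count files"
rm -rf "$dir"
```

Timing a run of this at full scale (e.g. under `time`, or watching the directory with repeated `ls | wc -l` from another terminal) should show whether the stall-then-burst writeback behaviour reproduces on a given filesystem.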
APFS on one side, ext4 on the Synology as destination. I've just made another test: I left it for 2 hours. No CPU usage, it just hangs. The destination doesn't matter: I tried APFS to APFS (selecting an empty directory as destination) and got the same result. I then tried it reversed, ext4 to APFS, and it works! It seems to hang at 'cnt = select(max_fd + 1, &r_fds, &w_fds, &e_fds, &tv);', line 742 of io.c (rsync v3.1.2).
I see a similar problem using rsync 3.1.3 (both client and server on Arch Linux). This happened after a user duplicated a local copy of their Thunderbird profile, a directory of roughly 42 GB of maildir+ files, so lots of files. The nightly root backup, an rsync from client to server, fails by hanging in select() according to strace. After the hang occurs, rsync as root on even a small file hangs in select() as well; adding -vvv produces no additional output and nothing happens on the server. The hang is local to the client rsync. rsync as an ordinary user at this point, however, works normally.

Then, as the user who created the large maildir, I rsync'ed to the server. This runs for a while and transfers a few GB, then hangs. At this point the network is fine: the same user can ssh to the server and all appears normal aside from rsync hanging. Restarting the network kicks the rsync process into continuing, which again hangs after a few more GB of data transferred. I repeated this until all 42 GB were on the backup server. After that, running the normal rsync as root works fine again.
I have hit the same on Gentoo Linux:

# rsync --version
rsync version 3.1.3 protocol version 31

I'm rsyncing a maildir (as root) with ~36,000 files to an sshfs-mounted directory. strace shows rsync hanging in select(), with timeouts in all three rsync processes.
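For anyone else diagnosing this, attaching strace to the running processes is enough to see the select() timeouts directly; note the pid below is a placeholder, not one from my machine:

```shell
# List the rsync processes with their command lines.
pgrep -a rsync

# Attach to one of them (12345 is a placeholder pid) and watch
# descriptor waits and I/O, with timestamps on each call.
strace -f -tt -e trace=select,read,write -p 12345
```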
Rsyncing to an empty directory went fine; it is rsyncing into an already-existing directory that causes the hang.
Using -vvv is almost always a bad idea and should be avoided. That said, the latest version should avoid a hang.