Bug 13913 - sync a folder with large amount of files
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.3
Hardware: All
OS: All
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
 
Reported: 2019-04-25 12:43 UTC by jief
Modified: 2020-07-27 21:26 UTC
CC List: 2 users

Description jief 2019-04-25 12:43:56 UTC
I tried to rsync a folder with 323,187 files in it.
I sync from macOS to Linux (a Synology NAS). Both sides run rsync v3.1.2, compiled by myself; the rsync that ships on the NAS is 3.0.9.

Rsync 3.1.2 hangs at around 295,000 files.
If I use exclusions to get under that number, it works.
If I add --protocol=30, it still doesn't work.
If I add --delay-updates, it works.
If I use the original Synology rsync (i.e. without --rsync-path), it works (even without --delay-updates).
If I raise the kernel file limit on Linux (sudo sysctl -w fs.file-max=25000), it still doesn't work.

NOTE: when I say it hangs after ~295,000 files, I mean I ran rsync with -v -v -v and got approximately 295,000 lines like:
[sender] make_file(bands/8725,*,2)
[sender] make_file(bands/8726,*,2)
[sender] make_file(bands/8727,*,2)
[sender] make_file(bands/8728,*,2)
If I exclude files starting with 8, I get the same number of lines; it just stops at:
[sender] make_file(bands/9729,*,2)
[sender] make_file(bands/972a,*,2)
[sender] make_file(bands/972b,*,2)
[sender] make_file(bands/972c,*,2)

What really puzzles me is that it works with Synology rsync 3.0.9.
Did I miss something when I compiled rsync on my Linux box?
Comment 1 jief 2019-04-25 15:49:08 UTC
I found it.
I had the idea to try an rsync daemon instead of ssh.
For some reason, I have access to the destination folder when logged in through ssh, but a process launched under my uid does not. I guess it's a Synology trick with extended permissions.

And because there was no error message, I didn't think of that.

So in short, there is no functional bug, just an annoying silent failure.

Sorry about that.
Comment 2 jief 2019-04-25 15:56:22 UTC
(In reply to jief from comment #1)

My mistake: the previous comment is wrong. I hadn't noticed that --delay-updates was still set in my final test.

The bug remains, and it's not a permission problem.

The bug doesn't happen when using an rsync daemon (same binary, version 3.1.2) instead of ssh.
Comment 3 Dave Gordon 2019-04-27 22:51:29 UTC
What filesystems are used at each end?
Also, when you say "hangs", how long have you left it in the "hung" state?

In preparation for trying to reproduce this issue, I created a new directory on an ext4 filesystem and started to populate it, using the following bash command:
$ for i in {01..323187} ; do echo $i > $i ; done
which would result in sequentially creating 323187 distinct files, each with different contents.

I expected this to take a little while, but what I observed was that the first 214301 files appeared in the directory very quickly (<30 seconds, maybe only 10 or 15), but there was then a long pause during which no new files appeared, as if the file-creation process had stalled. After that long pause (>2 minutes), the remaining 108886 files suddenly appeared, in an interval of probably no more than 10 seconds. Hence the average rate of file creation over a series of ten-second intervals appeared to vary from ~10000/sec down to almost zero.
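A scaled-down sketch of the same experiment, for anyone wanting to repeat it (N reduced from 323187 so it finishes quickly; whether the bursty pattern appears will depend on the filesystem and N):

```shell
# Create N small files in a fresh directory, then report the
# elapsed time and the final count. With a large N on ext4 you
# may see the behaviour described above: fast start, long stall,
# fast finish.
dir=$(mktemp -d)
N=2000
start=$SECONDS
for i in $(seq -w 1 "$N"); do echo "$i" > "$dir/$i"; done
echo "created $(ls "$dir" | wc -l) files in $((SECONDS - start))s"
rm -rf "$dir"
```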

So I suspect that the ext4 filesystem at least may have some limit on how many blocks/inodes/directory entries can be queued for writeback to disk at a time, or that while the updated directory is being written back it may be locked against further updates, possibly for quite a long time.

Thus, the "hang" may be in the underlying filesystem on the receiver, rather than in rsync itself. There may however be differences in the behaviour of rsync (e.g. order and timing of file operations) across different versions that either trigger or avoid triggering the particular case in the filesystem code.

Can you run the receiving program in the hanging case under truss/strace or similar, and thus collect information on the timing of the file create calls over the duration of the program run?
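A concrete invocation for collecting that timing information might look like the following (a template, not a ready-to-run command: <pid> stands for the receiving rsync's process id, and the syscall filter is just a reasonable starting set):

```
# -f follows forked children, -tt adds microsecond timestamps, and
# -e trace=... limits the trace to file creation/IO plus the
# suspect select() call; output goes to a file for later analysis.
strace -tt -f -e trace=select,openat,creat,write,rename \
       -p <pid> -o /tmp/rsync-recv.trace
```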
Comment 4 jief 2019-05-07 12:15:37 UTC
APFS on one side, ext4 (Synology) as destination.

I've just made another test. I left it 2 hours. No CPU usage. It just hangs.

Destination doesn't matter. I tried APFS to APFS (selecting an empty directory as destination). Same.

I tried it reversed: ext4 to APFS. It works!

It seems to hang at 'cnt = select(max_fd + 1, &r_fds, &w_fds, &e_fds, &tv);', line 742 of io.c (rsync v3.1.2).
Comment 5 Gene 2019-05-26 13:17:18 UTC
I see a similar problem using rsync 3.1.3 (both client and server on Arch Linux).

This happened after a user duplicated a local copy of their Thunderbird profile, a directory of roughly 42 GB of maildir+ files, so lots of files.
The nightly root backup, an rsync from client to server, fails, hanging in select() according to strace.

After the hang occurs, running rsync as root on even a small file hangs in select() as well; adding -vvv produces no additional output, and nothing is happening on the server. The hang is local to the client rsync.

rsync as an ordinary user at this point, however, works normally.

Then, as the user who created the large maildir, I rsync'ed to the server; this runs for a while and transfers a few GB, then hangs. At this point the network is fine: the same user can ssh to the server, and all appears normal aside from rsync hanging.

Restarting the network kicks the rsync process into continuing, after which it again hangs once a few more GB have been transferred. This was repeated until all 42 GB were on the backup server.

After this, running the normal rsync as root works fine again.
Comment 6 Adam Purkrt 2019-12-12 09:51:20 UTC
I have hit the same issue, on Gentoo Linux:
# rsync --version
rsync version 3.1.3 protocol version 31

Rsyncing a maildir (as root) with ~36,000 files to an sshfs-mounted directory. strace shows rsync hanging on select(), timing out in all three rsync processes.
Comment 7 Adam Purkrt 2019-12-12 10:12:29 UTC
Rsyncing to an empty directory went fine; it is rsyncing onto an already existing directory that causes the halt.
Comment 8 Wayne Davison 2020-07-27 21:26:26 UTC
Using -vvv is almost always a bad idea and should be avoided. That said, the latest version should avoid a hang.