Bug 12769 - error allocating core memory buffers (code 22) depending on source file system
Status: NEW
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: All Linux
Importance: P5 normal
Target Milestone: ---
Assigned To: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2017-05-05 13:54 UTC by Roland Haberkorn
Modified: 2017-07-24 15:23 UTC (History)
Description Roland Haberkorn 2017-05-05 13:54:02 UTC
We run openSuSE Leap 42.2 and Ubuntu 14.04.5 on two servers. Copying a large number of files (in this case about 28 million) leads to different results depending on the source file system.
We copy with rsync -rlptgoDAxHnP --info=progress2 --delete --link-dest=$LINK_DEST root@$SERVER:/$FOLDER /backup/rsynctest/ . As expected, replacing --delete with --delete-delay doesn't change the behaviour. The error occurs with and without the option -n; it is included here only for testing.
When the source is located on an ext4 file system, we run into the following error message after about 26 million files:
ERROR: out of memory in hashtable_node [sender]
rsync error: error allocating core memory buffers (code 22) at util2.c(102) [sender=3.1.0]
When the source is located on an XFS file system, the above command copies all files without error.
Both file systems hold the same data, as one is the backup copy of the other. The behaviour also appears when we use rsync via an rsync server instead of SSH, and when we copy locally on one of the two machines. It appears regardless of the operating system (openSuSE 42.2 or Ubuntu 14.04.5).
I have not tried recently, but restoring the backup from an ext4 file system showed this error at least a year ago as well. After the source file system was changed to XFS, the error suddenly disappeared.
As the error appears even with --dry-run, it seems to be related to the way rsync handles metadata. The data size seems to be unimportant.
Comment 1 Roland Haberkorn 2017-05-05 15:14:57 UTC
If you want me to run further tests with other file systems, I am happy to produce fake data and run them. I just haven't done so yet because I lack knowledge of the underlying mechanisms and because I am not sure whether this is an rsync problem or a kernel issue.
Two more things to add: we also saw this issue with the data mounted via NFSv3 or NFSv4. The target file system does not matter; we've had this issue with btrfs, ext4 and XFS.
Comment 2 Roland Haberkorn 2017-05-19 11:48:07 UTC
I did some further investigation... 
First thing to add: the ext4 file systems are hard-linked differential rsync backups of the real data on XFS.
I changed the test case by removing the --link-dest option.
When rsyncing from XFS, the rsync process on the client uses about 3% of RAM (8 GB total). When rsyncing from ext4, it uses up to about 50% of RAM.
This picture changes completely when I remove the option -H. In that case, copying from ext4 also uses less than 2% of RAM.
My guess is that -H breaks the incremental recursion when copying from ext4.
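One way such RAM figures could be gathered is sketched below. It is only an illustration, not how the numbers above were measured: it assumes a Linux /proc file system, and peak_rss_kb is a hypothetical helper name.

```shell
#!/bin/bash
# Print the peak resident set size (VmHWM, in kB) of a process,
# read from /proc on Linux. peak_rss_kb is a hypothetical helper name.
peak_rss_kb() {
    awk '/^VmHWM:/ { print $2 }' "/proc/$1/status"
}

# Example (assumes an rsync process is currently running):
#   pid=$(pgrep -o -x rsync)
#   total=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   echo "$(( 100 * $(peak_rss_kb "$pid") / total ))% of RAM"
```

Running the same check with and without -H against the same source would show whether the difference in memory use follows the option rather than the file system.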
Comment 3 Roland Haberkorn 2017-07-24 15:20:16 UTC
OK, I dug somewhat deeper. I've found a second difference between my two sources: one is the original data, the other is a differential rsync backup with hard links.
I then built a test case with about 50 million dummy files with something like this:

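#!/bin/bash
# Create 50 x 1000 x 1000 empty dummy files below the current directory.
for i in {1..50}
do
    for j in {1..1000}
    do
        # -p also creates the top-level directory $i on first use
        mkdir -p "$i/$j"
        for k in {1..1000}
        do
            touch "$i/$j/$k"
        done
    done
done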

Rsyncing this test folder works fine from and to any of the tested file systems (ext4 64-bit, XFS, btrfs). This holds both with and without the option -H, as long as the source file system contains no hard-linked copy of the source folder.
As soon as there is at least one hard-linked copy, the option -H breaks the run:

roland@msspc25:~$ stat /mnt/rsynctestsource/1/1/1
  File: /mnt/rsynctestsource/1/1/1
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 811h/2065d      Inode: 153991349   Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 2001/  roland)   Gid: (  100/   users)
Access: 2017-05-09 10:54:10.341300841 +0200
Modify: 2017-05-08 09:33:51.535967423 +0200
Change: 2017-05-26 16:21:57.610628573 +0200
 Birth: -
roland@msspc25:~$ rsync -rlptgoDxAn --info=name,progress2  --delete --link-dest=/mnt2/link3/ /mnt/rsynctestsource/ /mnt2/link1/.
              0 100%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/49049050)   
roland@msspc25:~$ rsync -rlptgoDxHAn --info=name,progress2  --delete --link-dest=/mnt2/link3/ /mnt/rsynctestsource/ /mnt2/link1/.
              0 100%    0.00kB/s    0:00:00 (xfr#0, ir-chk=1000/25191050)
ERROR: out of memory in hashtable_node [sender]
rsync error: error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2]

As you can see, the first run without -H works; the second one with -H doesn't.
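For anyone trying to reproduce this, a hard-linked copy like the one behind "Links: 2" in the stat output can be created as sketched below. This is an assumption about the setup, since the report does not say how the copy was made: it uses GNU cp, both paths must be on the same file system, and make_hardlinked_copy and the example paths are hypothetical.

```shell
#!/bin/bash
# Create a hard-linked copy of a directory tree with GNU cp:
# directories are recreated, regular files are linked instead of copied.
# make_hardlinked_copy is a hypothetical helper name.
make_hardlinked_copy() {
    cp -al "$1" "$2"
}

# Example (placeholder paths on the same file system):
#   make_hardlinked_copy /mnt/rsynctestsource /mnt/rsynctestsource.copy
#   stat -c '%h' /mnt/rsynctestsource/1/1/1   # prints the link count
```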

So I should probably rename the bug report to "-H breaks the incremental recursion on hard linked sources". This also holds for all three file systems tested.
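To judge whether a given source tree is likely to be affected, it may help to count how many regular files have more than one link, since those are presumably the files -H has to keep track of. A minimal sketch, where count_multilinked is a hypothetical helper name and file names containing newlines would be miscounted:

```shell
#!/bin/bash
# Count regular files below a directory whose link count is greater than 1.
# -xdev keeps find on one file system, similar to rsync -x.
# count_multilinked is a hypothetical helper name.
count_multilinked() {
    find "$1" -xdev -type f -links +1 | wc -l
}

# Example (placeholder path):
#   count_multilinked /mnt/rsynctestsource
```

On a tree without any hard-linked copy the count should be zero, which matches the case where -H ran without error.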