Created attachment 13472
add '--bind-cpu' option to rsync
We use rsync for daily backups and log synchronization, but rsync
often triggers high CPU load. I tried to find a solution through
Google, but didn't find a satisfactory answer. Many people suggest
using the '--whole-file' option to reduce the CPU load, or using 'nice'
to lower rsync's scheduling priority. So I have another idea:
Maybe we can add a '--bind-cpu' option that tells rsync to run
on specified processors, like 'worker_cpu_affinity' in nginx.
I'm not sure this is a good idea, because the core issue is really
the improvement of the rsync protocol, but I have made a few
attempts anyway. I changed rsync 3.1.2 so that it supports
binding CPUs on GNU/Linux, AIX, FreeBSD and Solaris; please see the
attachment. For example, the following option tells rsync to run on
CPUs 0, 2-5 and 7:
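
A CPU list written as "0,2-5,7" could be parsed and applied roughly as
in the sketch below. This is not the code from the attached patch: it
is a small Python illustration, parse_cpu_list and bind_current_process
are hypothetical helper names, and os.sched_setaffinity is only
available on Linux (AIX, FreeBSD and Solaris need other system calls,
which are not shown).

```python
import os


def parse_cpu_list(spec):
    """Parse a CPU list such as "0,2-5,7" into a sorted list of CPU indices."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in CPU list: {spec!r}")
        if "-" in part:
            # A range like "2-5" covers both end points.
            lo_s, hi_s = part.split("-", 1)
            lo, hi = int(lo_s), int(hi_s)
            if lo > hi:
                raise ValueError(f"descending range: {part!r}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def bind_current_process(spec):
    """Pin the calling process to the CPUs in spec.

    Returns False when the platform has no os.sched_setaffinity
    (assumption: only the Linux interface is handled in this sketch).
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, parse_cpu_list(spec))  # pid 0 = this process
    return True
```

Here parse_cpu_list("0,2-5,7") yields CPUs 0, 2, 3, 4, 5 and 7, which
matches the list described above.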
I also used the maketree.py script to test the synchronization of a
large number of files (10000 files), and plotted some statistics (see
cpu_time.pdf in the attachment). I only tested on one machine (AIX
7.2 with 8 processors) and did not test remote synchronization, but
I think the conclusion is still a useful reference. By default, the
CPU load keeps increasing as the file size increases; when bound to a
single processor, the CPU load rises slowly until it reaches around
55%. The disadvantage of binding CPUs is that, as file sizes grow,
processing is slower than without binding (about two times slower in
my test). Of course, if I/O or network bandwidth is the bottleneck,
then binding CPUs makes little difference, because most of the time
is spent waiting.
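
For anyone who wants to repeat this kind of comparison, the sketch
below shows one way to record wall-clock time and CPU time for a
single run. It is not the script behind cpu_time.pdf; measure is a
hypothetical helper, and it assumes "CPU load" is read as CPU time
divided by wall time, which may differ from how the plotted figures
were computed.

```python
import os
import subprocess
import time


def measure(argv):
    """Run argv to completion; return (wall_seconds, cpu_seconds, cpu_share)."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(argv, check=True)
    wall = time.monotonic() - start
    after = os.times()
    # CPU time (user + system) charged to child processes that were waited for.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    share = cpu / wall if wall > 0 else 0.0
    return wall, cpu, share
```

Calling measure once with a plain rsync command and once with a
CPU-bound one gives the two numbers to compare. A share above 1.0
means that more than one CPU was busy on average during the run.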
Is it necessary to add the '--bind-cpu' option? Your comments are
welcome, and I am happy to answer any questions I can!
On Linux you can use taskset (in combination with nice and ionice)...
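
As a rough illustration of that combination, the sketch below builds
such a command line in Python. build_throttled_command is a
hypothetical helper, the /src/ and /dst/ paths are placeholders, and
the defaults (niceness 19, ionice class 3 meaning idle) are only
example values.

```python
import shlex


def build_throttled_command(rsync_args, cpu_list="0", niceness=19, io_class=3):
    """Wrap an rsync invocation with nice, ionice and taskset (Linux tools)."""
    return [
        "nice", "-n", str(niceness),    # lowest CPU scheduling priority
        "ionice", "-c", str(io_class),  # class 3 = idle I/O scheduling
        "taskset", "-c", cpu_list,      # restrict to the listed CPUs
        "rsync", *rsync_args,
    ]


cmd = build_throttled_command(["-a", "/src/", "/dst/"], cpu_list="0,2-5,7")
print(shlex.join(cmd))
# prints: nice -n 19 ionice -c 3 taskset -c 0,2-5,7 rsync -a /src/ /dst/
```

The resulting list can be passed to subprocess.run as is. Because the
CPU restriction is applied from outside, rsync itself needs no change.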