The Samba-Bugzilla – Bug 8512
rsync is slower than cp -- Reduce overhead to cp for transfer of large files
Last modified: 2014-09-09 07:49:21 UTC
1. du /foo/bigfile
2. echo 3 > /proc/sys/vm/drop_caches
3. time rsync -avp /foo/bigfile /bar/bigfile
4. echo 3 > /proc/sys/vm/drop_caches
5. time cp -a /foo/bigfile /bar/bigfile
Results (by step number):
1. ~1286 MB
3. 27.9 s, 45.9 MB/s by my calculation, 45.61 MB/s according to rsync
5. 14.6 s, 88.1 MB/s by my calculation
In other words, cp is *almost twice as fast* as rsync on a single big file where no comparison is necessary.
Expected result: when copying a single file, rsync should be as fast as cp.
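For anyone who wants to rerun the comparison, here are the steps above as a small script (a sketch; it assumes the same /foo/bigfile source and a /bar destination on another filesystem, and has to run as root so it can drop the page cache):

#!/bin/sh
# Cold-cache copy of one large file, once with rsync and once with cp.
SRC=/foo/bigfile    # large source file (~1.3 GB in the report above)
DST=/bar/bigfile    # destination on a different filesystem

du -sh "$SRC"

sync
echo 3 > /proc/sys/vm/drop_caches    # drop the page cache before each run
time rsync -avp "$SRC" "$DST"

rm -f "$DST"
sync
echo 3 > /proc/sys/vm/drop_caches
time cp -a "$SRC" "$DST"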
rsync is great, but this is a real turn-off, and I don't see a good reason for it: the two programs perform the same work in this case. It must be a design problem. Please find the bottleneck and eliminate it.
rsync advertises its speed, after all.
If it weren't just one file, using -v, -r, and -D while not using --inplace on rsync would make the comparison unfair. Only for a single file like this can you get away with it.
Also, it doesn't affect speed, but -a already includes -p.
Actually, lots of rsync options and default behaviors don't compare fairly to cp, but at the very least add -W when comparing rsync against cp. It will still be slower.
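For a single file that would look something like this (a sketch using the paths from the original report; -W/--whole-file disables the delta-transfer algorithm and --inplace writes straight into the destination instead of a temporary file):

rsync -a -W --inplace /foo/bigfile /bar/bigfile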
As you said, all these options are irrelevant once rsync is in the middle of copying a single big file. That copy loop is inefficient; that is what my test case shows. And it's very much a real problem: when I move 5 TB from one server to a new one over gigabit Ethernet and use rsync (out of habit, because rsync is so great), I end up some 10 hours slower than with cp.
I am seeing performance problems as well, but in my case the bottleneck is the CPU: even cp is CPU-bound, and dd with direct I/O gives me the best performance.
Nobody mentioned -W before, but it didn't seem to make a difference for me. Copying a file from one array to another, I see the following:
cp (no direct I/O): 94% CPU = 1m11s
rsync (no direct I/O): 210% CPU = 2m41s
rsync -W (no direct I/O): 205% CPU = 2m46s
dd (direct I/O): 13% CPU = 47s
dd (no direct I/O): 85% CPU = 1m24s
cp (libdirectio): 114% CPU = 50s
So my best bet is cp with libdirectio, but I am not fond of this method because it also uses a lot of CPU. I wish there were something that gave me the speed and CPU usage of dd with direct I/O. Pushing the data through the memory buffer on my machine eats up too much CPU and bottlenecks the transfer in my case.
I don't know whether rsync does checksumming and so on, and whether that is why it seems to be the slowest of everything I tested.
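For reference, a direct-I/O copy with dd along those lines could look like the following (paths and block size are placeholders; bs has to be a multiple of the device sector size for O_DIRECT to work):

dd if=/array1/bigfile of=/array2/bigfile bs=1M iflag=direct oflag=direct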
Let me add my voice to the mix here. I'm copying a 1GB VOB file from an Ubuntu ZFS server running Samba 4.1.1, to my Mac OS X 10.9 box.
iperf reports 112 MB/s (should be my theoretical maximum).
Copying with Path Finder over Samba: 99 MB/s.
Copying with rsync directly (using arcfour256): 92 MB/s.
Copying with dd over Samba: 67 MB/s.
Copying with cat over Samba (measured with pv): 69 MB/s.
Copying with rsync over Samba: 55 MB/s.
I'm using gigabit Ethernet, obviously, with the MTU set to 1500 and no TCP options other than the following in smb.conf:
socket options = TCP_NODELAY SO_RCVBUF=131072 SO_SNDBUF=131072
These numbers are very stable over several runs, so I'm pretty curious now about what's going on, especially with rsync.
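For context, the "rsync directly (using arcfour256)" number above presumably corresponds to rsync over an SSH transport with that cipher, roughly like this (host name and paths are placeholders):

rsync -a -e 'ssh -c arcfour256' server:/tank/video/movie.VOB /local/dest/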
We use rsync to copy data from one file server to another using NFSv3 mounts over a 10 Gb link. We found that increasing the buffer sizes (as a quick test) improves performance. With --sparse it improves throughput by a factor of fifty, from 2 MB/s to 100 MB/s.
% diff -u rsync.h-org rsync.h
--- rsync.h-org 2014-04-13 19:36:59.000000000 +0200
+++ rsync.h 2014-09-08 16:20:41.427973852 +0200
@@ -131,11 +131,11 @@
 #define RSYNC_PORT 873
-#define SPARSE_WRITE_SIZE (1024)
-#define WRITE_SIZE (32*1024)
-#define CHUNK_SIZE (32*1024)
+#define SPARSE_WRITE_SIZE (128*1024)
+#define WRITE_SIZE (128*1024)
+#define CHUNK_SIZE (128*1024)
 #define MAX_MAP_SIZE (256*1024)
-#define IO_BUFFER_SIZE (32*1024)
+#define IO_BUFFER_SIZE (128*1024)
 #define MAX_BLOCK_SIZE ((int32)1 << 17)
 /* For compatibility with older rsyncs */
It sure would be nice if these sizes were `officially' increased.
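For anyone who wants to try the larger buffers before any official change, the diff above can be applied to an rsync source tree and the binary rebuilt roughly like this (a sketch; bufsize.patch stands for the diff shown above, and the version directory is just an example):

cd rsync-3.1.1
patch -p0 < bufsize.patch    # applies the buffer-size changes to rsync.h
./configure
make
./rsync --version    # sanity-check the freshly built binary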