When using rsync to mirror large directories with a lot of files and paths, we can see that the process crashes with the following error: rsync error: some files could not be transferred (code 23) at main.c(1173) I am using rsync V2.6.6-1.2 on CentOS 4.2. Apparently, this happens not due to a configuration problem, because if I mirror only the directory that is causing the problem, then the mirroring is done properly, with all files copied successfully and a long execution time (normal). However, when the same directory under which the rsync process fails is copied with several other large directories, then when it comes to that same directory it will fail. Another thing which is strange: when I do only the mirroring of the /var directory, there are many many files that show to be copied then when I mirror /var with other directories. I can see directories being accessed like /var/log/httpd without any update before when mirroring with several directories, but when I mirror /var only I can see a lot of files being copied before reaching that same directory. This could mean that the mirroring is not done properly at all when directories contain too many files. Regards, Daniel
Regarding last report, here is some more information about that bug: I tried to increase the limits with ulimit before running Rsync to the following values, but the problem is still the same and the failure remains at the exact same line. Here are the limits I used: core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited file size (blocks, -f) unlimited pending signals (-i) 4096 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 10000 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 2000000 stack size (kbytes, -s) 70000 cpu time (seconds, -t) unlimited max user processes (-u) 16383 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited It seems that there is a buffer overflow inside the code somewhere. Unfortunately, the code line described (1173) is just the end of the main procedure and it doesn't indicate what happens. It would be great to have detailed information for each error reported, but that is another feature request. Here is the command I use and that generates the problem: rsync -e ssh -a -v -z -l --delete myhost.mydomain.com:/ /backup_disk/root/ \ --exclude /boot/ \ --exclude /dev/ \ --exclude /proc/ \ --exclude /backup_disk/ \ --exclude /var/log/sa/ \ --exclude /var/spool/ \ --exclude /var/run/ The program stops after the following line: var/log/cron var/log/maillog var/log/messages var/log/secure var/log/httpd/access_log var/log/httpd/error_log var/log/mail/statistics sent 151316 bytes received 14527573 bytes 65677.36 bytes/sec total size is 14261055275 speedup is 971.54 rsync error: some files could not be transferred (code 23) at main.c(1173) if I do only mirror /var then the directory is properly updated without any problem with the following command: rsync -e ssh -a -v -z -l --delete myhost.mydomain.com:/var /backup_disk/root/var I also print below the output of rsync-debug from the server, some last lines before the crash: read(3, ") AppleWebKit/312.5.1 (KHTML, li"..., 257712) = 257712 read(3, " "..., 257712) = 257712 read(3, " "..., 257712) = 257712 read(3, "n.ne.jp "..., 257712) = 257712 read(3, "patible; MSIE 6.0; Windows 98; W"..., 257712) = 257712 read(3, " "..., 55277) = 55277 munmap(0xb5a09000, 266240) = 0 close(3) = 0 munmap(0xb5a4a000, 221184) = 0 select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0}) write(1, "\376\6\0\7\216%\6\0\f\200\376\27\0\0\370\27\0\0\3\0\0\0"..., 1794) = 1794 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) read(0, "\246\336}\243\356\4\335\211\324\222\227\246D\333\204\207"..., 8184) = 8184 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) read(0, "\336d\255\322\314\\\t\2565\4\335\366V\31\1\10\342\221\264"..., 8184) = 1270 open("/var/lib/mysql/phphits/log.MYI", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0660, st_size=2628608, ...}) = 0 mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb5a3f000 read(3, "\376\376\7\1\0\2\1\224\0\260\0d\0\310\0\2\0\0\1\0\10\1"..., 263408) = 263408 read(3, "2.134, 1\0\0\30\341\3\303\f81.255.29.89\0\0\30\340\3"..., 263407) = 263407 read(3, ".90\0\0002\5\3\22\f66.249.71.39\0\0002\0049.29\0\0"..., 261792) = 261792 read(3, "\0J\302\3U\016192.168.0.103,\0\0J\301\3V\016192.1"..., 261792) = 261792 read(3, "\361\3\2\f81.37.198.48\0\0j\362\3\3\r83.203.15"..., 261792) = 261792 read(3, ".141.94.107\0\0\232\17\3 \01681.255.119.178"..., 261792) = 261792 read(3, "133.5\0\0\311[\3l\v10.3.21.227\0\0\311\\\3m\r81"..., 261792) = 261792 read(3, "215.105.67\0\0\370\376\3\17\f218.6.43.186\0\0\370"..., 261792) = 261792 read(3, ".72.179\0\1(^\3o\r211.75.161.94\0\1(_\3"..., 261792) = 261792 read(3, "\357\2M\0\f12.189.32.37\0\1W\360\3\1\016194.106."..., 261792) = 261792 read(3, "18.56\0\1\207y\3\212\016220.160.239.59\0\1\207z\3\213"..., 7457) = 7457 munmap(0xb5a3f000, 266240) = 0 close(3) = 0 open("/var/log/cron", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0600, st_size=5033, ...}) = 0 read(3, "Nov 6 16:10:01 anoigo crond[197"..., 5033) = 5033 close(3) = 0 open("/var/log/maillog", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0600, st_size=21617, ...}) = 0 read(3, "Nov 6 16:07:49 anoigo sendmail["..., 21617) = 21617 close(3) = 0 open("/var/log/messages", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0600, st_size=10781, ...}) = 0 read(3, "Nov 6 16:05:43 anoigo syslogd 1"..., 10781) = 10781 close(3) = 0 open("/var/log/secure", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0600, st_size=1532, ...}) = 0 read(3, "Nov 6 16:51:13 anoigo sshd[2866"..., 1532) = 1532 close(3) = 0 select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0}) write(1, "\221\t\0\7\217%\6\0\10\200[\6\0\0P\6\0\0\2\0\0\0\340\3"..., 2453) = 2453 select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0}) write(1, "\4\0\0\7\377\377\377\377", 8) = 8 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 468000}) read(0, "\377\377\377\377\377\377\377\377", 8184) = 8 select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0}) write(1, "\4\0\0\7\377\377\377\377", 8) = 8 select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0}) write(1, "\4\0\0\7\377\377\377\377", 8) = 8 time(NULL) = 1131299440 select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0}) write(1, "\34\0\0\7\316\304\2\0\253\236+\1\377\377\377\377\34\343"..., 32) = 32 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {58, 975000}) read(0, "\377\377\377\377", 8184) = 4 rt_sigaction(SIGUSR1, {SIG_IGN}, {0x8056e50, [USR1], SA_RESTORER|SA_RESTART, 0x5e28c8}, 8) = 0 rt_sigaction(SIGUSR2, {SIG_IGN}, {0x8056e80, [USR2], SA_RESTORER|SA_RESTART, 0x5e28c8}, 8) = 0 select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0}) write(1, "J\0\0\10rsync error: some files coul"..., 78) = 78 exit_group(23) = ? Process 6325 detached I hope that this will help. Regards, Daniel
rsync did not crash -- exit code 23 is how it indicates that some kind of a read error occurred during the reading of the files on the sending system (the error message will probably have scrolled off the screen in a large transfer, so be sure to capture all the output, including stderr). I appreciate the extra detail of the copy commands you used, because it confirmed my suspicion that the extra updates you saw were due to you creating a duplicate set of files in the destination. I.e.: this command: rsync -e ssh -avzl --delete myhost.mydomain.com:/var /backup_disk/root/var actually created a directory /backup_disk/root/var/var. You should either leave off the trailing "var" in the destination, or add a trailing slash in the source to avoid that.
Thanks Wayne for that. I have probably found the cause of the problem. I was copying the /sys directory and that seemed to cause problems. It seems to be a system dependent directory that we do not need to make a full backup. So do you believe that the mirroring process did successfully go until the end, but that this error Code 23 is printed out only at the end of the transfer or does the transfer stop immediately once this error happens? And if only at the end, then would only those read error files not have been copied and all the rest is ok? Also, as you suggested, I forwarded the output of rsync to a file and I saw the errors coming up to the console. The errors were the followings: rsync: send_files failed to open "/sys/bus/pci/drivers/AEC62xx_IDE/new_id": Permission denied (13) rsync: send_files failed to open "/sys/bus/pci/drivers/ALI15x3_IDE/new_id": Permission denied (13) rsync: send_files failed to open "/sys/bus/pci/drivers/AMD_IDE/new_id": Permission denied (13) And about 40 others like this, so this confirms that the problem was coming from a problem reading those files. Anyway, the main problem is that I used the verbose mode and that this has put the errors hidden in the large output so I didn't see them. This made me believe that the code 23 error was happening on the last line of the output and that rsync crashed due to that, and I've been mislead by this. So all is my fault.. However now I much better understand what was happening. Thank you also for that trailing slash explanation, it helps much. Do you know why this is implemented differently than the cp program, where the source trailing slash is not taken into account to change the behaviour? Regards, Daniel