Bug 3244 - exit status with large directories
Summary: exit status with large directories
Status: CLOSED INVALID
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 2.6.6
Hardware: x86 Linux
: P3 major (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-06 09:18 UTC by Daniel
Modified: 2006-03-12 02:56 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel 2005-11-06 09:18:49 UTC
When using rsync to mirror large directories with a lot of files and paths, we 
can see that the process crashes with the following error:

rsync error: some files could not be transferred (code 23) at main.c(1173)

I am using rsync V2.6.6-1.2 on CentOS 4.2.

Apparently, this happens not due to a configuration problem, because if I 
mirror only the directory that is causing the problem, then the mirroring is 
done properly, with all files copied successfully and a long execution time 
(normal). However, when the same directory under which the rsync process fails 
is copied with several other large directories, then when it comes to that 
same directory it will fail.

Another thing which is strange: when I do only the mirroring of the /var 
directory, there are many many files that show to be copied then when I 
mirror /var with other directories.
I can see directories being accessed like /var/log/httpd without any update 
before when mirroring with several directories, but when I mirror /var only I 
can see a lot of files being copied before reaching that same directory. This 
could mean that the mirroring is not done properly at all when directories 
contain too many files.

Regards,
Daniel
Comment 1 Daniel 2005-11-06 11:33:56 UTC
Regarding last report, here is some more information about that bug:

I tried to increase the limits with ulimit before running Rsync to the 
following values, but the problem is still the same and the failure remains at 
the exact same line. Here are the limits I used:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 4096
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 2000000
stack size              (kbytes, -s) 70000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16383
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

It seems that there is a buffer overflow inside the code somewhere.
Unfortunately, the code line described (1173) is just the end of the main 
procedure and it doesn't indicate what happens.
It would be great to have detailed information for each error reported, but 
that is another feature request.

Here is the command I use and that generates the problem:

rsync -e ssh -a -v -z -l --delete myhost.mydomain.com:/ /backup_disk/root/ \
--exclude /boot/                        \
--exclude /dev/                         \
--exclude /proc/                        \
--exclude /backup_disk/                 \
--exclude /var/log/sa/                  \
--exclude /var/spool/                   \
--exclude /var/run/


The program stops after the following line:
var/log/cron
var/log/maillog
var/log/messages
var/log/secure
var/log/httpd/access_log
var/log/httpd/error_log
var/log/mail/statistics

sent 151316 bytes  received 14527573 bytes  65677.36 bytes/sec
total size is 14261055275  speedup is 971.54
rsync error: some files could not be transferred (code 23) at main.c(1173)

if I do only mirror /var then the directory is properly updated without any 
problem with the following command:
rsync -e ssh -a -v -z -l --delete 
myhost.mydomain.com:/var /backup_disk/root/var

I also print below the output of rsync-debug from the server, some last lines 
before the crash:


read(3, ") AppleWebKit/312.5.1 (KHTML, li"..., 257712) = 257712
read(3, "                                "..., 257712) = 257712
read(3, "                                "..., 257712) = 257712
read(3, "n.ne.jp                         "..., 257712) = 257712
read(3, "patible; MSIE 6.0; Windows 98; W"..., 257712) = 257712
read(3, "                                "..., 55277) = 55277
munmap(0xb5a09000, 266240)              = 0
close(3)                                = 0
munmap(0xb5a4a000, 221184)              = 0
select(2, NULL, [1], NULL, {60, 0})     = 1 (out [1], left {60, 0})
write(1, "\376\6\0\7\216%\6\0\f\200\376\27\0\0\370\27\0\0\3\0\0\0"..., 1794) = 
1794
select(1, [0], [], NULL, {60, 0})       = 1 (in [0], left {60, 0})
read(0, "\246\336}\243\356\4\335\211\324\222\227\246D\333\204\207"..., 8184) = 
8184
select(1, [0], [], NULL, {60, 0})       = 1 (in [0], left {60, 0})
read(0, "\336d\255\322\314\\\t\2565\4\335\366V\31\1\10\342\221\264"..., 8184) 
= 1270
open("/var/lib/mysql/phphits/log.MYI", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0660, st_size=2628608, ...}) = 0
mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0xb5a3f000
read(3, "\376\376\7\1\0\2\1\224\0\260\0d\0\310\0\2\0\0\1\0\10\1"..., 263408) = 
263408
read(3, "2.134, 1\0\0\30\341\3\303\f81.255.29.89\0\0\30\340\3"..., 263407) = 
263407
read(3, ".90\0\0002\5\3\22\f66.249.71.39\0\0002\0049.29\0\0"..., 261792) = 
261792
read(3, "\0J\302\3U\016192.168.0.103,\0\0J\301\3V\016192.1"..., 261792) = 
261792
read(3, "\361\3\2\f81.37.198.48\0\0j\362\3\3\r83.203.15"..., 261792) = 261792
read(3, ".141.94.107\0\0\232\17\3 \01681.255.119.178"..., 261792) = 261792
read(3, "133.5\0\0\311[\3l\v10.3.21.227\0\0\311\\\3m\r81"..., 261792) = 261792
read(3, "215.105.67\0\0\370\376\3\17\f218.6.43.186\0\0\370"..., 261792) = 
261792
read(3, ".72.179\0\1(^\3o\r211.75.161.94\0\1(_\3"..., 261792) = 261792
read(3, "\357\2M\0\f12.189.32.37\0\1W\360\3\1\016194.106."..., 261792) = 261792
read(3, "18.56\0\1\207y\3\212\016220.160.239.59\0\1\207z\3\213"..., 7457) = 
7457
munmap(0xb5a3f000, 266240)              = 0
close(3)                                = 0
open("/var/log/cron", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=5033, ...}) = 0
read(3, "Nov  6 16:10:01 anoigo crond[197"..., 5033) = 5033
close(3)                                = 0
open("/var/log/maillog", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=21617, ...}) = 0
read(3, "Nov  6 16:07:49 anoigo sendmail["..., 21617) = 21617
close(3)                                = 0
open("/var/log/messages", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=10781, ...}) = 0
read(3, "Nov  6 16:05:43 anoigo syslogd 1"..., 10781) = 10781
close(3)                                = 0
open("/var/log/secure", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=1532, ...}) = 0
read(3, "Nov  6 16:51:13 anoigo sshd[2866"..., 1532) = 1532
close(3)                                = 0
select(2, NULL, [1], NULL, {60, 0})     = 1 (out [1], left {60, 0})
write(1, "\221\t\0\7\217%\6\0\10\200[\6\0\0P\6\0\0\2\0\0\0\340\3"..., 2453) = 
2453
select(2, NULL, [1], NULL, {60, 0})     = 1 (out [1], left {60, 0})
write(1, "\4\0\0\7\377\377\377\377", 8) = 8
select(1, [0], [], NULL, {60, 0})       = 1 (in [0], left {59, 468000})
read(0, "\377\377\377\377\377\377\377\377", 8184) = 8
select(2, NULL, [1], NULL, {60, 0})     = 1 (out [1], left {60, 0})
write(1, "\4\0\0\7\377\377\377\377", 8) = 8
select(2, NULL, [1], NULL, {60, 0})     = 1 (out [1], left {60, 0})
write(1, "\4\0\0\7\377\377\377\377", 8) = 8
time(NULL)                              = 1131299440
select(2, NULL, [1], NULL, {60, 0})     = 1 (out [1], left {60, 0})
write(1, "\34\0\0\7\316\304\2\0\253\236+\1\377\377\377\377\34\343"..., 32) = 32
select(1, [0], [], NULL, {60, 0})       = 1 (in [0], left {58, 975000})
read(0, "\377\377\377\377", 8184)       = 4
rt_sigaction(SIGUSR1, {SIG_IGN}, {0x8056e50, [USR1], SA_RESTORER|SA_RESTART, 
0x5e28c8}, 8) = 0
rt_sigaction(SIGUSR2, {SIG_IGN}, {0x8056e80, [USR2], SA_RESTORER|SA_RESTART, 
0x5e28c8}, 8) = 0
select(2, NULL, [1], NULL, {60, 0})     = 1 (out [1], left {60, 0})
write(1, "J\0\0\10rsync error: some files coul"..., 78) = 78
exit_group(23)                          = ?
Process 6325 detached


I hope that this will help.

Regards,
Daniel
Comment 2 Wayne Davison 2005-11-06 14:18:17 UTC
rsync did not crash -- exit code 23 is how it indicates that some kind of a read
error occurred during the reading of the files on the sending system (the error
message will probably have scrolled off the screen in a large transfer, so be
sure to capture all the output, including stderr).

I appreciate the extra detail of the copy commands you used, because it
confirmed my suspicion that the extra updates you saw were due to you creating a
duplicate set of files in the destination.  I.e.:  this command:

rsync -e ssh -avzl --delete myhost.mydomain.com:/var /backup_disk/root/var

actually created a directory /backup_disk/root/var/var.

You should either leave off the trailing "var" in the destination, or add a
trailing slash in the source to avoid that.
Comment 3 Daniel 2005-11-07 08:40:54 UTC
Thanks Wayne for that.

I have probably found the cause of the problem. I was copying the /sys 
directory and that seemed to cause problems. It seems to be a system dependent 
directory that we do not need to make a full backup.
So do you believe that the mirroring process did successfully go until the 
end, but that this error Code 23 is printed out only at the end of the 
transfer or does the transfer stop immediately once this error happens? And if 
only at the end, then would only those read error files not have been copied 
and all the rest is ok?

Also, as you suggested, I forwarded the output of rsync to a file and I saw 
the errors coming up to the console. The errors were the followings:

rsync: send_files failed to open "/sys/bus/pci/drivers/AEC62xx_IDE/new_id": 
Permission denied (13)
rsync: send_files failed to open "/sys/bus/pci/drivers/ALI15x3_IDE/new_id": 
Permission denied (13)
rsync: send_files failed to open "/sys/bus/pci/drivers/AMD_IDE/new_id": 
Permission denied (13)
And about 40 others like this, so this confirms that the problem was coming 
from a problem reading those files.

Anyway, the main problem is that I used the verbose mode and that this has put 
the errors hidden in the large output so I didn't see them. This made me 
believe that the code 23 error was happening on the last line of the output 
and that rsync crashed due to that, and I've been mislead by this. So all is 
my fault.. However now I much better understand what was happening.

Thank you also for that trailing slash explanation, it helps much. Do you know 
why this is implemented differently than the cp program, where the source 
trailing slash is not taken into account to change the behaviour?

Regards,
Daniel