Bug 5727 - rsync crashes while copying large directory.
Summary: rsync crashes while copying large directory.
Status: RESOLVED WORKSFORME
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.9
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-31 08:24 UTC by Roger Wolff
Modified: 2008-09-01 02:29 UTC

Description Roger Wolff 2008-08-31 08:24:47 UTC
I apologise for running an old version: 2.6.9. Last time compiling rsync was a real hassle, and finally got too complicated. 

I have a very large directory that needs to move from one RAID to the new RAID.

Rsync crashes while reading the source directory. The process grows to over 1G (the machine has 2G RAM + 2G swap) and then crashes, probably on a memory allocation failure. This probably happens when the process hits about 2G, but the largest I have actually seen is 1.2G (it takes a long time, and I get bored watching it).
Comment 1 Matt McCutchen 2008-08-31 08:44:37 UTC
In this case, you really should try a 3.0.x version because memory usage is significantly reduced.  If rsync fails again, please provide the error message.
Comment 2 Roger Wolff 2008-08-31 09:03:03 UTC
Thanks for trying to help me solve my problem. Here we should focus on improving rsync. 

Just in: I have 28 million files. 

Just upgraded to 3.0.3... After scanning 2.8 million files, it's only using 133 MB of memory, so it will likely fit due to the smaller memory footprint. However this doesn't mean that the bug has been fixed. It means it's harder to trigger.
Comment 3 Roger Wolff 2008-08-31 10:28:25 UTC
rsync . --exclude current -e rsh driepoot:/backup/abra2_usr_src/ -avHS --progress --min-size 1 --max-size 93
building file list ... 
invalid message 101:7104843 [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(390) [sender=3.0.3]
rsync . --exclude current -e rsh driepoot:/backup/abra2_usr_src/ -avHS --progress --min-size 93 --max-size 158
building file list ... 
invalid message 101:7104843 [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(390) [sender=3.0.3]
rsync . --exclude current -e rsh driepoot:/backup/abra2_usr_src/ -avHS --progress --min-size 158 --max-size 273
building file list ... 
Comment 4 Wayne Davison 2008-08-31 13:16:59 UTC
This error:

"invalid message 101:7104843 [sender]"

indicates that the following byte sequence was received:

0x4b 0x69 0x6c 0x6c 

This is the ASCII string "Kill".  So it looks like your shell sends extra text on stdout when something is killed ("Killed 1234" or some such).
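
The mapping from the numbers in the error message to those bytes can be checked by hand. The sketch below assumes that rsync's multiplexed I/O header is a 4-byte little-endian word whose top byte is the message tag plus an offset (`MPLEX_BASE`, taken here to be 7) and whose low 24 bits are the payload length. The function name `decode_mplex_header` is made up for illustration and is not part of rsync.

```python
import struct

# Assumption: offset rsync adds to the message tag in the header's top byte.
MPLEX_BASE = 7


def decode_mplex_header(raw: bytes) -> tuple[int, int]:
    """Interpret the first 4 bytes as a header: (message code, payload length)."""
    (word,) = struct.unpack("<I", raw[:4])  # little-endian unsigned 32-bit
    code = (word >> 24) - MPLEX_BASE        # top byte minus the tag offset
    length = word & 0xFFFFFF                # low 24 bits
    return code, length


# Feed in the stray text instead of a real header.
code, length = decode_mplex_header(b"Kill")
print(f"invalid message {code}:{length}")
```

Under those assumptions the four bytes of "Kill" decode to code 101 and length 7104843, the same pair shown in the error, which is consistent with stray shell output arriving where a message header was expected.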

If you haven't done so already, you should investigate what is killing the remote process (e.g. an out-of-memory error).

"However this doesn't mean that the bug has been fixed. It means it's harder to trigger."

That's all that we can hope for with respect to memory issues.  With the way rsync works, it is pretty near the limit of what memory reduction is possible, and incremental recursion is already a huge gain for transferring larger sets of files than ever before.  Changing rsync to not cache all the files in a directory would be radical surgery, and is not going to happen in this codebase.

One thing that can make a difference with regard to memory issues is the malloc library that is used.  Some malloc libraries will gradually lose memory if they don't do a good job of re-using freed memory.  There was a recent report by a user that switching their malloc library made rsync's incremental recursion work for them.  (If you're using glibc, this is probably not your problem, though.)
Comment 5 Roger Wolff 2008-09-01 02:29:14 UTC
The remote process gets killed by the OOM killer. I've installed the newer version inside a chroot environment. I'm not sure yet how to do that with the "remote" side.