The Samba-Bugzilla – Bug 3186
Surprisingly large unshared memory usage
Last modified: 2006-10-15 13:29:35 UTC
I'm running a command like "rsync -vrltH --delete -pgo --stats -z -D
--numeric-ids -i --link-dest=foo blah:bar baz" (part of a dirvish run) with an
input fileset of about 2.4 million files (400K of those files are actually
hardlinked to each other on the sending machine, and remain that way on the
receiving machine---and in fact all but about 30 of them haven't changed, so
virtually all 2.4M of those files also wind up hardlinked to the --link-dest
directory; this is about 280G total).
It takes about 10 minutes to scan a filesystem of this size, and both the
sending & receiving machines' rsyncs slowly expand to about 200M during this
scan; that's understandable. But then, as soon as the scan is done, the second
rsync process on the receiving side inflates (over the course of about 5 seconds
or so) to -another- 200M. I don't think I'm being faked out by shared memory
being reported twice, since the free memory on the machine declines
precipitously at exactly the same time. This isn't quite screwing me yet (the
machine's got half a gig of RAM and very little else that must stay resident
during the run), but if the filesystem gets much bigger, I fear massive
thrashing due to swapping. (Really, what I'll have to do is buy more RAM.)
I was under the impression that this wasn't supposed to happen---that rsync
tried hard not to modify lots of pages after the fork, and that Linux (I'm
running Ubuntu Breezy, which has a 2.6 kernel) had copy-on-write fork semantics.
Is the essentially instantaneous inflation of the second rsync process
happening because of either the -H or the --link-dest, or is it a bug?
[This transfer also accumulates about an hour of CPU time on this Athlon 1200MHz
CPU; I assume this is due to the expense of -H, and works out to about 1.5
milliseconds of processing per file, assuming I haven't goofed on the math; this
is about a million instructions (or 21000 non-cached memory fetches) per file.
I'd love it if this could be brought down, but I'm probably being unrealistic
about an essentially O(n^2) algorithm...]
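The per-file estimate above can be re-derived in a few lines. This is only a sanity check of the arithmetic, using the figures quoted in the report (about an hour of CPU, about 2.4 million files, a 1200MHz clock, 21000 fetches per file); the variable names are made up for illustration, and converting clock cycles into instructions would need an assumed instructions-per-cycle rate that the report does not give.

```python
# Figures quoted in the report; all of them are approximate.
cpu_seconds = 60 * 60        # "about an hour of CPU time"
file_count = 2_400_000       # "about 2.4 million files"
clock_hz = 1200e6            # 1200MHz CPU
fetches_per_file = 21_000    # non-cached memory fetches quoted per file

seconds_per_file = cpu_seconds / file_count
cycles_per_file = seconds_per_file * clock_hz
# Latency each fetch would need for 21000 of them to fill the per-file time.
implied_fetch_ns = seconds_per_file / fetches_per_file * 1e9

print(f"{seconds_per_file * 1e3:.2f} ms per file")
print(f"{cycles_per_file / 1e6:.1f} million clock cycles per file")
print(f"{implied_fetch_ns:.0f} ns implied per non-cached fetch")
```

This gives 1.5 ms and roughly 1.8 million clock cycles per file, which is the same order of magnitude as the "about a million instructions" figure, and the 21000-fetch figure corresponds to about 71 ns per fetch.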
Any ideas on this? It's been open 5 weeks and probably got overlooked... Tnx!
What version are you using? You have 2.6.7 selected in the bug report, but that's still in development.
The copy-on-write optimization wasn't done until v2.6.1 (Apr 2004):
    - The generator is now better about not modifying the file list
      during the transfer in order to avoid a copy-on-write memory
      bifurcation (on systems where fork() uses shared memory).
      Previously, rsync's shared memory would slowly become unshared,
      resulting in real memory usage nearly doubling on the receiving
      side by the end of the transfer.  Now, as long as permissions
      are being preserved, the shared memory should remain that way
      for the entire transfer.
You use the -p option, so you meet the "permissions being preserved" condition.
The file_struct data and other chunks of data are allocated out of the free memory pool. If lots of those allocated chunks are returned to the pool, pool management involves memory being modified, so that would require new writes.
Wayne - does rsync free up anything substantial at any time after the fork that might trigger this?
I'm actually using 2.6.7. In fact, I'm using a version from CVS in which Wayne added --min-size and --max-size, and fixed (and then re-fixed) a bug in hlink.
This isn't currently the very latest CVS, but I believe corresponds to rsync-HEAD-20051014-2036GMT, plus the hlink stuff. I could fairly trivially update to the very latest CVS if it was useful in figuring out what's going on; my fundamental question was, "Is this a bug or am I misunderstanding something?" and it's sounding like you believe it's a bug.
I'm reasonably sure that I saw this same behavior in the rsync that ships in Ubuntu Breezy, which is their (somewhat Debianized and with a broken --min|max-size) version of 2.6.5. If it was really necessary, I could re-verify that I saw this behavior in that version, although it'd take some work.
Thanks for confirming that you are using CVS. No need to update to the very latest.
Hmm. You are using --delete, which is done before any file transfers. Ahhhhh. Looks like it builds an entire equivalent file list for files on the receiving side. So it may also be building a 200MB file list to do deletes.
Try running without --delete and see if that extra memory usage still happens. If it goes away, try using --delete-during, which now that I think about it, was intended to solve this very problem with large numbers of files.
I have been trying various copy commands to try to duplicate this, but haven't seen anything wrong. If -p is left off the memory for the shared file list will become unshared, but I haven't yet seen another scenario that would cause that.
> Looks like it builds an entire equivalent file list
> for files on the receiving side.
Older rsync versions did create a duplicate file list when deleting, but that was changed in recent releases to only need enough memory for a single directory at a time plus whatever memory is needed for the scanning function to recurse down to the deepest dir.
I tried adding --delete-during (so the full invocation now looks like "rsync -vrltH --delete -pgo --stats -z -D --numeric-ids -i --delete-during --exclude-from=/FS/dirvish/HOST/20051124/exclude --link-dest=/FS/dirvish/HOST/20051123/tree root@HOST:/ /FS/dirvish/HOST/20051124/tree"), and the behavior didn't change.
But now that I think about it, it's not clear if --delete could be a problem in the first place, because these are dirvish runs. That means that I'm using -H and --link-dest to populate a tree that originally starts out empty, and winds up containing only a very few files that consume actual disk space (whatever got created or modified since the dirvish run yesterday), and about 2 million hardlinks into the previous day's tree. If rsync is writing to an otherwise-empty tree, it seems to me that --delete has nothing to do---which makes me wonder why dirvish even bothers to supply it automatically, frankly, since dirvish -always- starts from an empty destination tree. Is there some reason why it makes sense to supply --delete at all? (Unfortunately, we can't ask dirvish's original author why he did this, alas.) Or does --delete cause process inflation if there -isn't- much to do instead of if there -is-?
Once this run completes in a couple hours (I'm debugging some other, unrelated things at the same time in this run), I may just blow its tree away and start over without --delete in any form (by editing the dirvish script) and see if that changes its behavior, but I'd be pretty mystified if it did unless my understanding of --delete, --link-dest, and empty destination trees is just wrong.
Just in case I'm being completely faked out here and the second process really is sharing most of its memory, here are the top few lines of "top" running on the destination host:
top - 00:34:30 up 8 days, 8:25, 2 users, load average: 3.05, 2.00, 0.96
Tasks: 65 total, 3 running, 62 sleeping, 0 stopped, 0 zombie
Cpu(s): 6.0% us, 39.9% sy, 0.0% ni, 0.0% id, 51.3% wa, 2.0% hi, 0.7% si
Mem: 516492k total, 508012k used, 8480k free, 76180k buffers
Swap: 1544184k total, 12476k used, 1531708k free, 68404k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10795 root      18   0  259m 253m  676 D 15.2 50.3   0:47.19 rsync
10865 root      16   0  251m 245m  688 S  0.0 48.7   0:00.08 rsync
What's actually kinda interesting there is that it claims to have 8m free, and 76m of buffers, -and- to have 253+245m of rsync resident, all on a machine with only 512m total memory (and not including ~30m of other processes). And yet I'm pretty sure I -saw- the free memory go from about 200m to about 0 when that second process started up, on previous runs. (On this one, I didn't quite catch it in the act and am not sure how much free memory there was before the inflation of the second process.)
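To put a number on that mismatch, the figures from the "top" output can simply be added up. This is a rough sketch: the ~30m for other processes is the estimate given above, the variable names are invented for illustration, and the result is only a lower bound on how much resident memory is being counted more than once, since RES includes any pages a process shares with another.

```python
# Values copied from the "top" output above.
mem_total_kb = 516492
mem_free_kb = 8480
buffers_kb = 76180
cached_kb = 68404
rsync_res_mb = [253, 245]    # RES column for PIDs 10795 and 10865
other_procs_mb = 30          # "~30m of other processes" (an estimate)


def kb_to_mb(kb):
    """Convert kB to MB the way top does (1 MB = 1024 kB)."""
    return kb / 1024


# Everything that claims to occupy physical memory at the same moment.
claimed_mb = (
    sum(rsync_res_mb)
    + other_procs_mb
    + kb_to_mb(buffers_kb)
    + kb_to_mb(cached_kb)
    + kb_to_mb(mem_free_kb)
)
total_mb = kb_to_mb(mem_total_kb)
overcount_mb = claimed_mb - total_mb

print(f"claimed {claimed_mb:.0f} MB vs. {total_mb:.0f} MB physical")
print(f"at least {overcount_mb:.0f} MB must be counted more than once")
```

With these numbers the claims exceed physical memory by roughly 173 MB, so at the moment of this snapshot a large part of the two RES figures has to be the same pages. That says nothing about whether those pages stay shared later in the run.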
You are right that dirvish does not need to use --delete when copying into a new directory, but it also doesn't hurt anything (since it won't actually do much of anything).
I read something that made it sound like this might be a recent change in the Linux kernel, so I added a sleep(1000) to both the generator and the receiver right after they fork and then ran a test on a system with a linux-2.4 kernel and a linux-2.6 kernel. The processes stayed shared on the 2.4 system, but became unshared right after the fork on the 2.6 kernel, so this appears to be something that needs to be investigated in Linux itself.
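While the two processes are paused like this, one way to compare shared and private memory more directly than with top's RES column is to total the per-mapping counters in /proc/<pid>/smaps. The sketch below assumes a kernel that provides that file (not every 2.6 kernel does) with its usual Shared_Clean, Shared_Dirty, Private_Clean and Private_Dirty lines in kB; the function names are hypothetical, and this is not what was used for the test described above.

```python
import re

# Matches lines such as "Private_Dirty:       16 kB" in /proc/<pid>/smaps.
_FIELD = re.compile(
    r"^(Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):\s+(\d+)\s+kB$"
)


def summarize_smaps(text):
    """Total the shared and private page counters (in kB) over all mappings."""
    totals = {"shared_kb": 0, "private_kb": 0}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            name, value = match.group(1), int(match.group(2))
            key = "shared_kb" if name.startswith("Shared") else "private_kb"
            totals[key] += value
    return totals


def summarize_pid(pid):
    """Read and summarize /proc/<pid>/smaps (Linux only, needs permission)."""
    with open(f"/proc/{pid}/smaps") as smaps:
        return summarize_smaps(smaps.read())
```

Calling summarize_pid() on the generator and the receiver right after the fork, and again a few seconds later, would show whether private_kb grows at the expense of shared_kb.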
Yikes. Well, I'm certainly running 2.6 on everything here.
You're certainly in a better position than I am to try to bug-report this to the kernel developers; is that your next step? (I'm assuming there wasn't some API change that makes this "not a bug", but I haven't investigated.)
Btw, I suspect that dirvish's use of --delete might have originated in debugging, or if someone tries to recreate a failed run by redoing it on the same day (and hence in the same tree) without blowing away the tree first, since by default the trees are named by dates. In those cases, --delete would make sense, and since leaving it in theoretically will do nothing in the usual case, I'll ignore it.
I don't think there's anything we can do about this in rsync.