4082 – RFE: prioritize work by size (or other criteria)

Bug 4082 - RFE: prioritize work by size (or other criteria)

Status:	ASSIGNED

Alias:	None

Product:	rsync
Classification:	Unclassified
Component:	core
Version:	2.6.9
Hardware:	All All

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Wayne Davison
QA Contact:	Rsync QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-09-05 14:20 UTC by Bill McGonigle (dead mail address)
Modified:	2006-09-11 23:34 UTC
CC List:	0 users

See Also:

Description Bill McGonigle (dead mail address) 2006-09-05 14:20:54 UTC

initial idea:
  It would be great if rsync could copy files in increasing order of size.

why: 
  I was using rsync to back up and reload my remote server over a slow connection and noticed that the really important stuff was being held up by some very large files (.ISOs, etc.) that did need to be copied back to the server but weren't urgent.  It occurred to me that the important stuff tended to be small (mail files, homedir stuff, HTML files) and the big stuff tended to be unimportant.  So I wound up writing some scripts with 'find' to copy the big stuff out to a parallel tree, rsync the small files, then go back and copy up the big stuff.  It would be swell if rsync could do the work for me.

something like:

  rsync --prioritize=[size,date,uid] --prioritize-order=[ascending,descending]

is a first approximation of a syntax.  I have no use for the other two criteria myself, but one might as well generalize the idea.  I noticed an RFE to use pregenerated transfer lists; I could accomplish the same task with a wrapper script with those, so maybe that means the two requests could share some plumbing.

Comment 1 Matt McCutchen 2006-09-11 17:20:42 UTC

I agree that it would be useful if rsync could process files in an order specified by the user.  However, I think individual prioritization algorithms are not rsync's job.  Thus, I support the idea of wrapper scripts.

In the meantime, you might be able to use a hack along the following lines to control the transfer order.  Create ten symlinks named "0" through "9" in the root directories of the source and the destination; they should all link to ".".  Run two passes of rsync.  The first deals with everything except transfers of regular files; to do this, specify contradictory --max-size and --min-size.  Then run a script that finds all regular files and assigns each a priority expressed as an integer with the same number of digits.  Insert slashes in the output so you get paths like "0/4/2/foo/bar" for a file at "foo/bar" of priority 042.  Feed these paths to the second pass of rsync using --files-from, and be sure to use --no-implied-dirs.  When rsync sorts the file list, it effectively sorts according to your priorities.
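The path-rewriting step of this hack could be sketched roughly as follows. This is only an illustration: the function names priority_path and files_from_lines are made up here, and the priorities themselves are assumed to come from whatever script ranks the files.

```python
def priority_path(rel_path, priority, digits=3):
    """Prefix rel_path with one path component per digit of its priority.

    Assumes symlinks named "0" through "9", each pointing at ".", exist in
    the root of both the source and the destination, so the prefixed path
    still resolves to the real file.
    """
    if not 0 <= priority < 10 ** digits:
        raise ValueError("priority does not fit in the requested digit count")
    # Zero-pad so every priority has the same number of digits, then put a
    # slash between the digits: 42 with digits=3 gives "0/4/2".
    prefix = "/".join(f"{priority:0{digits}d}")
    return f"{prefix}/{rel_path}"


def files_from_lines(priorities, digits=3):
    """Turn a {relative path: priority} mapping into --files-from lines."""
    return [priority_path(path, prio, digits)
            for path, prio in sorted(priorities.items())]
```

The resulting lines would be written one per line to a list file and handed to the second rsync pass with --files-from and --no-implied-dirs. The order of the lines in that file should not matter, since rsync sorts the file list itself.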

Comment 2 Wayne Davison 2006-09-11 19:07:33 UTC

One thing you can do now is to use --max-size for a small-file first pass, and then --min-size for a large-file second pass.
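A minimal sketch of that two-pass approach, written as a small Python helper that only builds the two command lines. The helper name two_pass_commands, the "-a" option, and the threshold value are assumptions for illustration; only --max-size and --min-size come from the suggestion itself.

```python
def two_pass_commands(src, dest, threshold, base=("rsync", "-a")):
    """Build the argv lists for a small-file pass and a large-file pass.

    threshold is a size in any form rsync accepts, e.g. "10m" or a plain
    number of bytes.
    """
    # Pass 1: skip anything larger than the threshold.
    small = [*base, f"--max-size={threshold}", src, dest]
    # Pass 2: skip anything smaller than the threshold.
    large = [*base, f"--min-size={threshold}", src, dest]
    return [small, large]
```

Each list could then be run in turn, for example with subprocess.run(cmd, check=True). A file whose size is exactly the threshold is eligible in both passes, but the second pass should find it already up to date and skip it.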

Comment 3 Bill McGonigle (dead mail address) 2006-09-11 23:34:18 UTC

>use --max-size for a small-file first pass, and
>then --min-size for a large-file second pass.

And it seems so obvious in retrospect!  Thanks.

I can envision a wrapper script now that would use 'find' to determine the biggest file size in the source tree and derive a reasonable step function, calling rsync iteratively with --min-size and --max-size to emulate something like this request.

Every pass has to build the file list again, so depending on the number of steps the traffic would be far from efficient; a 1-pass solution would still be great too.
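Such a wrapper might look roughly like the sketch below. It uses os.walk in place of 'find', and the starting band of 1 MiB, the growth factor of 4, the "-a" option, and all function names are arbitrary assumptions rather than anything rsync provides.

```python
import os


def largest_file_size(root):
    """Return the size in bytes of the largest regular file under root."""
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Ignore symlinks and anything that is not a regular file.
            if os.path.isfile(full) and not os.path.islink(full):
                largest = max(largest, os.path.getsize(full))
    return largest


def size_steps(largest, first=1024 * 1024, factor=4):
    """Return increasing size limits (bytes) that stay below largest."""
    steps = []
    limit = first
    while limit < largest:
        steps.append(limit)
        limit *= factor
    return steps


def stepped_commands(src, dest, largest, base=("rsync", "-a")):
    """Build one rsync argv per size band, smallest band first."""
    commands = []
    lower = None
    for upper in size_steps(largest):
        cmd = list(base)
        if lower is not None:
            cmd.append(f"--min-size={lower}")
        cmd.append(f"--max-size={upper}")
        commands.append(cmd + [src, dest])
        lower = upper
    # The last pass has no upper limit, so it picks up everything left.
    final = list(base)
    if lower is not None:
        final.append(f"--min-size={lower}")
    commands.append(final + [src, dest])
    return commands
```

A caller would compute largest_file_size(src) once and run the commands in order. The growth factor trades the number of passes (and file-list rebuilds) against how finely the transfer is ordered by size.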