Bug 3168 - --min-size cores in 2.6.5 and is completely missing in 2.6.6
Status: CLOSED INVALID
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.6
Hardware: All Linux
Importance: P3 major
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2005-10-14 04:41 UTC by Lenny Foner
Modified: 2006-03-12 02:56 UTC

Description Lenny Foner 2005-10-14 04:41:16 UTC
--min-size is documented in 2.6.5 (manpage and --help) but segfaults in any
command I've tried it in.

It's vanished in 2.6.6, and seems to exist only as an unofficial patch.  Why is
that?

I don't understand why --max-size is there but --min-size isn't, and the
misleading documentation of --min-size in 2.6.5's --help just screwed me, since
I've been mere hours from setting up a backup system that depended on its
presence.  (--max-size worked just fine in testing; imagine my surprise when
--min-size cored and compiling the latest rsync caused it to vanish altogether!)

I'm on the mailing list and see no discussion of this change from the April
timeframe, which was when I thought it went into the mainline rsync.  It's also
apparently not mentioned in any NEWS file.  And searching for it in the bugzilla
hasn't turned up anything relevant.

Can this please be committed to the mainline version?

P.S.  I -am- very pleased to see that --max-size, at least, worked when pulling
files from a 2.6.3 to a 2.6.5; I wasn't sure a priori if that would work, since
it wasn't clear who might be doing the filtering.  (My guess is, "the sender if
it supports it, otherwise the receiver", but that's just a guess.)  I'm hoping
that adding --min-size won't break this behavior, since the rsync I'm pulling
from may have to stay at 2.6.3 for a while.
Comment 1 Lenny Foner 2005-10-14 04:50:37 UTC
Obtw, when this gets reinstated, it would be -really nice- if one or the other
(but NOT BOTH) of --max-size or --min-size was an "OR EQUAL".  Right now, if I'm
trying to copy all files under one size to one place, and all files over one
size to another place, it's quite inconvenient to make sure the files that are
exactly on the boundary land in exactly one of the two places---instead of
being able to use (say) --min-size=50M on
one run and --max-size=50M on the other, I have to say --min-size=49999999 and
--max-size=50000000 to be sure of not getting hit by the fencepost.  This is
error-prone at the very least (because I have to count digits precisely), and
worse if I want the powers-of-two behavior---and by the way, the manpage, even
for 2.6.6, is sloppy in mentioning the K, M, or G multipliers but not
specifying whether those are human-readable [10^3] or machine [2^10]
multipliers---without reading the code, I have no idea.
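
To make the fencepost concrete, here is a minimal Python sketch (not rsync code). The helper `selected`, its `inclusive` flag, and the 50 MiB limit are all hypothetical; the sketch does not assume which comparison a given rsync build actually uses, only that both options treat the boundary the same way.

```python
def selected(size, min_size=None, max_size=None, inclusive=True):
    """Hypothetical stand-in for a size filter; not rsync's actual code.

    inclusive=True keeps a file whose size equals a limit,
    inclusive=False skips it.
    """
    if min_size is not None:
        if size < min_size or (not inclusive and size == min_size):
            return False
    if max_size is not None:
        if size > max_size or (not inclusive and size == max_size):
            return False
    return True


BOUNDARY = 50 * 1024 * 1024  # illustrative limit only
sizes = [BOUNDARY - 1, BOUNDARY, BOUNDARY + 1]


def partition(inclusive, big_min, small_max):
    """Simulate two runs: one with a minimum size, one with a maximum size."""
    big = [s for s in sizes if selected(s, min_size=big_min, inclusive=inclusive)]
    small = [s for s in sizes if selected(s, max_size=small_max, inclusive=inclusive)]
    return big, small


# Same limit on both runs, both inclusive: the boundary file is copied twice.
big, small = partition(True, BOUNDARY, BOUNDARY)
duplicated = set(big) & set(small)

# Same limit on both runs, both exclusive: the boundary file is copied nowhere.
big_x, small_x = partition(False, BOUNDARY, BOUNDARY)
missed = set(sizes) - set(big_x) - set(small_x)

# Offsetting one limit by a single byte gives a clean split (inclusive case).
big_ok, small_ok = partition(True, BOUNDARY + 1, BOUNDARY)
```

Either way the two runs only split the files cleanly when exactly one of the limits is shifted by one byte, which is why the digits have to be counted so carefully.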

Thanks!
Comment 2 Wayne Davison 2005-10-14 10:53:23 UTC
--min-size has never been released in a version of rsync except in the patches
dir.  I know that Debian has released several versions with --min-size included
(and has recently switched over to using the official patch, which does NOT dump
core), but there may be other distributions that may have decided to include the
option.  This option is being considered for a future release, but seeing how
you've compiled your own version, just apply the patch from the patches dir and
enjoy.

As for who needs to know about the option, the receiver is the one that
implements the filtering logic for what files get transferred, so as long as
you're pulling files, the sending rsync doesn't ever see options like
--max-size, --update, --existing, etc.

As for the size comparison boundary, one solution would be to allow an easy way
to specify +1 or -1, such as --min-size=50k+1 or --max-size=50m-1.

Yes, the man page should be improved to mention exactly what the suffixes mean
-- thanks for pointing that out.  I'm also thinking about allowing the suffix to
specify if the user wants a K to be 1000 instead of 1024, such as suffixing the
K, M, or G with a T to indicate that a power of ten is desired.
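
A minimal Python sketch of how such a size argument might be parsed, assuming K, M, and G mean powers of 1024 (the default implied above) and an optional trailing +1 or -1. `parse_size` is a hypothetical helper, not rsync's parser; accepting both upper and lower case is an assumption, and the power-of-ten suffix is left out because its spelling is only an idea at this point.

```python
import re

# Assumption: suffixes are binary multipliers (K = 1024), in either case.
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE_RE = re.compile(r"(\d+)([kmg]?)([+-]1)?", re.IGNORECASE)


def parse_size(text):
    """Parse strings such as '50k+1' or '50m-1' into a byte count."""
    match = _SIZE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, suffix, offset = match.groups()
    value = int(number) * _MULTIPLIERS[suffix.lower()]
    if offset:
        value += int(offset)  # "+1" -> 1, "-1" -> -1
    return value


print(parse_size("50k+1"))  # 51201
print(parse_size("50m-1"))  # 52428799
```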
Comment 3 Lenny Foner 2005-10-14 11:48:22 UTC
(In reply to comment #2)
> --min-size has never been released in a version of rsync except in the patches
> dir.  I know that Debian has released several versions with --min-size included
> (and has recently switched over to using the official patch, which does NOT dump
> core), but there may be other distributions that may have decided to include the
> option.  This option is being considered for a future release, but seeing how
> you've compiled your own version, just apply the patch from the patches dir and
> enjoy.

Well, that explains it---Ubuntu Breezy (officially released yesterday, and thus
around for the next six months) picked up a (defective) Debian version of it. 
Unfortunately, this means that every Breezy user who tries this option will have
rsync core on them.  (Maybe Breezy will pick up a fixed Debian version, but that
actually doesn't help a lot---see below.)  Furthermore, many Breezy users will
not be nearly sophisticated enough to ask, "did Debian make an incompatible
change?"  Instead, they'll send bug reports.

I've compiled my own version, but not having this in the non-Debian version is
actually more of a big deal than you think (I think :).  For one thing, it means
I now have a quandary about -my- version.  Do I blow away the Ubuntu one?  Then
updates will blow mine away.  Do I nail the version so that doesn't happen? 
Then it becomes the one sore thumb that -won't- get updates, including security
updates.  Do I put it elsewhere instead?  Then I have to worry about it being in
the path of everything that uses it---including root, including random scripts,
etc etc.  It's a hassle and a waste of time---and risks being forgotten at an
inopportune moment.

Furthermore, since Debian has a version but rsync mainline doesn't, script
writers are in a total quandary, since there's this incompatible feature they
can't depend on being there.  Sure, that's the case for every new feature in
rsync, but it's particularly weird that max is there but min ain't.  It makes
writing scripts that say "put all the big files -here-, and all the little files
-here-" suddenly become a pain to write and/or maintain.

Plus, since even the Debian version has the fencepost issue, I have to kluge
around it.  And if you ever -do- release a version without it (either by parsing
+1 at the end, or by changing < in min-size to <= as I did), then scripts that
others have built based on the Debian one will be subtly wrong.  This seems an
enormous amount of pain and bookkeeping to handle a tiny change with, as far as
I can see, no impact on the rest of rsync if it isn't being used.  Not having it
added in April seems senseless (if it had been there then, Breezy would
certainly have it now, instead of an incompatible coredump), and seems doubly
senseless now.

I mean, it'd be one thing if it was a performance or stability issue, but it's
clearly not, especially since --max-size made it in.  (And no, saying "use find"
isn't the solution, either---some uses of rsync, e.g., dirvish, make that really
painful, again just to work around a tiny bit of non-orthogonality.)

> As for who needs to know about the option, the receiver is the one that
> implements the filtering logic for what files get transferred, so as long as
> you're pulling files, the sending rsync doesn't ever see options like
> --max-size, --update, --existing, etc.

Ah.  Good to know.  (But wouldn't it be somewhat more efficient to have the
sending rsync be able to apply filters if it can?  Less wire traffic and maybe
faster in the filesystem.)

> As for the size comparison boundary, one solution would be to allow an easy way
> to specify +1 or -1, such as --min-size=50k+1 or --max-size=50m-1.

That's a cute idea.  More work to code, but definitely cute.

> Yes, the man page should be improved to mention exactly what the suffixes mean
> -- thanks for pointing that out.  I'm also thinking about allowing the suffix to
> specify if the user wants a K to be 1000 instead of 1024, such as suffixing the
> K, M, or G with a T to indicate that a power of ten is desired.

I think the last thing we need is yet -another- incompatible way to say "human
or machine?"  I don't recall seeing T anywhere else to do this, but maybe some
popular piece of software does this and I haven't noticed.  (It's bad enough
that some accept only lowercase kmg and some accept only uppercase!)  du has
clearly been having these issues, having gone for -h and -H and then deprecating
one in favor of --si (yuck!) and who knows where that's gonna end.  Are there any
other utilities out there that seem popular and have settled this one way or the
other?  What do they do?  Do they use upper vs lower case to decide it? 
Something else?  (I honestly don't know, but I'm hoping somebody's thought about
this...)

Thanks.

P.S. Would it have made sense to have opened this directly on the mailing list
and not in the bug database?  I guess it would have gotten it wider discussion;
I can forward, or you can if you want, if you think it would help.


Comment 4 Lenny Foner 2005-10-16 08:31:21 UTC
I saw the CVS checkin comments a couple days ago and just looked at them---thanks!

(One tiny nit---you might want to mention in the manual that the +1/-1 are
explicitly for avoiding fenceposts when using min/max-size.  This -seems-
obvious, but ya never know.)

I'm not sure if I should open a bug report in Ubuntu Breezy to get them to apply
pressure upstream to get either the current CVS or 2.6.7, when it's released (is
there an estimate?).  Ordinarily I'd just wait, but since Ubuntu shipped a
-broken- Debianized (and soon-to-be-incompatible) version of this, it might be
nice if Ubuntu and/or their upstream pushed out a newer version relatively soon.
If you have any ideas (or would like to do the pushing yourself), let me know.

[I actually just downloaded the latest nightly 'cause I needed, in addition to
the +1/-1 logic, the fix to hardlinking and devices that was theoretically
installed in April but was fixed again in late July---it just bit me and I spent
an hour figuring out what was going on & working up a test case to send you, and
-then- discovered from the NEWS file in CVS that you'd fixed it 10 weeks ago...
:) Is there a regression test to make sure this doesn't get broken again?  I'm
currently running a long test to see if there are still problems in the
hardlinking code & will send mail or open a new bug report if I see anything.]

P.S.  Just for bookkeeping, should this bug be changed from "resolved invalid"
to "closed"?