Bug 1467 - Hash table generation seems to be flawed
Hash table generation seems to be flawed
Product: rsync
Classification: Unclassified
Component: core
All Linux
: P3 normal
: ---
Assigned To: Wayne Davison
Rsync QA Contact
Depends on:
  Show dependency treegraph
Reported: 2004-06-19 02:12 UTC by Aaron Drew
Modified: 2004-06-30 13:29 UTC (History)
0 users

See Also:


Note You need to log in before you can comment on or make changes to this bug.
Description Aaron Drew 2004-06-19 02:12:01 UTC
The following code is used to generate the hash table for matching. I believe 
there is a line missing that might lead to certain blocks being transmitted 
unnecessarily over the network.

static void build_hash_table(struct sum_struct *s)
        int i;

        if (!tag_table)
                tag_table = new_array(int, TABLESIZE);

        targets = new_array(struct target, s->count);
        if (!tag_table || !targets)

        for (i = 0; i < (int)s->count; i++) {
                targets[i].i = i;
                targets[i].t = gettag(s->sums[i].sum1);

        qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets);

        for (i = 0; i < TABLESIZE; i++)
                tag_table[i] = NULL_TAG;

        for (i = s->count-1; i >= 0; i--)
                // ************** THE FOLLOWING LINE WAS MISSING **********
                if(tag_table[targets[i].t] == NULL_TAG || i < tag_table[targets
                        tag_table[targets[i].t] = i;
Comment 1 Aaron Drew 2004-06-19 02:18:23 UTC
The way the code seems to work is as follows:

A sorted list of hashes and their block numbers are generated (targets). 
They're sorted so that identical hashes are numerically next to each other.

A reverse lookup table is then generated, that (i presume) is supposed to 
return the index of the first of these targets for any given hash. 
Unfortunately, this doesn't look like it will always be the case and so certain 
blocks from the 'generator' might be ignored and their contents retransmitted.
Comment 2 Wayne Davison 2004-06-19 10:48:52 UTC
You'll note that the loop is running in reverse, with "i" going from high to
low. This means that "i < tag_table[targets[i].t]" will always be true for tags
that are not NULL_TAG (since all the values came from prior iterations of the
loop).  So, I believe that the if you added can never be false.

If you disagree with this analysis, please show me the mistake in my reasoning.