Bug 14477 - naive one at a time talloc_realloc (explaining several fuzz timeouts)
Summary: naive one at a time talloc_realloc (explaining several fuzz timeouts)
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: DCE-RPCs and pipes (show other bugs)
Version: 4.12.5
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-01 23:32 UTC by Douglas Bagnall
Modified: 2022-03-12 00:22 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Douglas Bagnall 2020-09-01 23:32:32 UTC
In several places we do this kind of thing, building up an array of N items using N reallocations:


for (i = 0; remaining > 0; i++) {
     r->count++;
     r->x = talloc_realloc(ndr->current_mem_ctx,
                           r->x,
                           struct whatever, 
                           r->count);
     if (r->x == NULL) {
         return NDR_ERR_UNHAPPY;
     }

     NDR_CHECK(ndr_pull_whatever(ndr,
                                 NDR_SCALARS,
                                 &r->x[i]));

     remaining = ndr->data_size - ndr->offset;
}


When ndr->data_size is big, there will be a lot of reallocs, and the overall effect is quadratic with respect to the data_size, given O(n) memmoves of O(n) items.

In practice, in normal non-malicious traffic, this is insignificant. But fuzzers have a knack of creating packets that take multiple seconds or minutes to parse.

The obvious fix is to allocate more than one item at a time, keeping count of the extra space using another variable. For example 9148f38c203c3481a43ef6d39ea9313dfa1c1bea (fixed in bf16cd72b285662bcb4367cd2bdd6eac7655254f!) uses gentle exponential growth.

BUT while the effect is drastic in the fuzzer (particularly in oss-fuzz), it is hugely exaggerated the overhead of (I think) address sanitiser, which carefully nitpicks every talloc_realloc. I have seen an example that timed out at 60 seconds with address sanitiser take < 2 seconds without it.

This bug is here to point to and from oss-fuzz timeout bugs to remind us of the probable cause and a fix, and to remind us the probably aren't as drastic as they first look, once address sanitiser is turned off.
Comment 1 Douglas Bagnall 2020-09-01 23:40:05 UTC
Exhibit 1: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=24761
Comment 2 Douglas Bagnall 2022-03-11 02:13:11 UTC
More examples

ndr_pull_preg_file
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=45048


ndr_pull_DNS_RPC_RECORDS_ARRAY
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38810
Comment 3 Douglas Bagnall 2022-03-12 00:22:18 UTC
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=41185


https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38861 which is probably worth fixing (it is in ldap wildcards; although the slowdown is much reduced without ASAN, it is still an easy user controlled slowdown).