Bug 13191 - Performance regression in DNS server with introduction of DNS wildcard
Summary: Performance regression in DNS server with introduction of DNS wildcard
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: DNS server (internal) (show other bugs)
Version: 4.7.3
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-14 21:16 UTC by Garming Sam
Modified: 2017-12-22 17:08 UTC (History)
7 users (show)

See Also:


Attachments
possible patch for master and 4.7 (4.52 KB, patch)
2017-12-14 23:43 UTC, Andrew Bartlett
no flags Details
backported patch for 4.7 (51.57 KB, patch)
2017-12-20 03:49 UTC, Andrew Bartlett
no flags Details
patch for 4.7 cherry-picked and adapted from master (51.56 KB, patch)
2017-12-20 08:34 UTC, Andrew Bartlett
garming: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Garming Sam 2017-12-14 21:16:02 UTC
It appears that with a large number of DNS records, the DNS server becomes unusable. This appears to be most noticed with the BIND DLZ, but this seems to be related to the fact that such installations will have more records and are more reliant on a performant DNS server.
Comment 1 Andrew Bartlett 2017-12-14 23:43:08 UTC
Created attachment 13868 [details]
possible patch for master and 4.7
Comment 2 Andrew Bartlett 2017-12-20 03:49:52 UTC
Created attachment 13879 [details]
backported patch for 4.7

The attached patch is derived from the changes in master and others currently being built in autobuild.

It is not a direct backport because of changes made in ldb since then. 

I have tested the DNS performance specifically.
Comment 3 Andrew Bartlett 2017-12-20 05:57:25 UTC
I should clarify further.

We can't backport directly as the intersection between the SCOPE_ONELEVEL index and the filter tree index is O(n*m) without the GUID index.  

We instead just do it for small values of m. 

In master we just do it in GUID index mode so as to avoid more magic constants long term.
Comment 4 Andrew Bartlett 2017-12-20 08:34:42 UTC
Created attachment 13880 [details]
patch for 4.7 cherry-picked and adapted from master

This version of the patch has the cherry-pick and adapted-from markers for the changes that just landed in master.
Comment 5 Garming Sam 2017-12-20 09:31:52 UTC
Please include this patch for 4.7.
Comment 6 Stefan Metzmacher 2017-12-20 10:04:26 UTC
Comment on attachment 13880 [details]
patch for 4.7 cherry-picked and adapted from master

There's an ldb change without a version change, which means we would still run
with the old code if we're using a system ldb.

What is the impact of that?
Comment 7 Andrew Bartlett 2017-12-20 17:37:32 UTC
(In reply to Stefan Metzmacher from comment #6)
You say there is an ldb change without a version change, but I included the version change:

From 0d154d00d308bd1c31ca7693553513ad005b1837 Mon Sep 17 00:00:00 2001
From: Andrew Bartlett <abartlet@samba.org>
Date: Mon, 18 Dec 2017 10:14:31 +1300
Subject: [PATCH 4/7] ldb: Release 1.2.3

Even if that were true, the impact would be DNS resolution times of 300ms vs multiple seconds for the domain I was testing with. 

This patch is not the same as master, and I've not made a version change in master as we normally do that at freeze time in a few weeks anyway. 

Please re-read the 4.7 patch and remove your NACK.

Thanks!
Comment 8 Stefan Metzmacher 2017-12-20 20:09:34 UTC
(In reply to Andrew Bartlett from comment #7)

4.7 can use 1.3.0 from the system.

We never create ldb releases from older samba release branches
and I'm not happy to rush that into a 4.7.4.

I'm wondering why we have such a complex
build_wildcard_query().

For normal dns queries we're only doing base searches, correct?

Why don't we do base searches for trimmed dns names,
then we do at most num component base searches instead
of a complex one level search.
Comment 9 Andrew Bartlett 2017-12-20 20:30:21 UTC
(In reply to Stefan Metzmacher from comment #8)

I'm proposing we do exactly that, maintain ldb 1.2.x in the 4.7 tree. 

Master was moved to 1.3.x specifically to allow this. 

I'm not at this point just before Christmas about to rework the wildcard DNS algorithm, it was built this way to avoid an attacker making us do an additional DNS lookup per . they insert into a query, that is to perform queries in constant time. 

I agree it is technically possible that a user could combine Samba 4.7 with ldb 1.3.0 I would suggest that is is unlikely and the result is just a slower search, just as slow as before this patch.  You could add a blacklist if you like. 

Sorry,

Andrew Bartlett
Comment 10 Andrew Bartlett 2017-12-20 20:43:05 UTC
To be clear, the performance issue isn't the complexity of the SCOPE_ONELEVEL query (even a simple ONELEVEL query is slow for a flat tree). 

The issue is that such queries were not indexed aside from the @IDXONE index, that is the filter tree index was not considered.  This is a generic LDB issue and so is addressed there.
Comment 11 Garming Sam 2017-12-21 03:34:03 UTC
(In reply to Stefan Metzmacher from comment #8)

> I'm wondering why we have such a complex build_wildcard_query().

There was originally a version of the patches which would do multiple base searches but we had a number of concerns with it. Prior to the release of 4.7, we accidentally introduced just a single base search as part of another set of DNS fixes which instantly halved the throughput of the DNS server (fixed before the release). Seeing how easy it was to hurt the overall performance of the DC (which we noted had measurable effects to the rest of the services), the decision was made to try and do all the wildcard processing in a single search. The idea was that the indexes would make this all quick, but the performance regression wasn't noticed because our DNS partitions simply didn't have enough records.

The new patches have been tested against a proper database with thousands of DNS records. The eventual fixes actually do have all the properties that we were originally intending and none of the degenerate cases. The divergences with master are effectively because of the GUID index work (which can't/shouldn't be backported).
Comment 12 Stefan Metzmacher 2017-12-21 20:42:47 UTC
Comment on attachment 13880 [details]
patch for 4.7 cherry-picked and adapted from master

Karolin, let us coordinate the releases tomorrow I should prepare the ldb release
before you tag the samba release.
Comment 13 Stefan Metzmacher 2017-12-21 20:46:49 UTC
(In reply to Garming Sam from comment #11)

Is it correct that 4.7 will still be slow if it uses ldb 1.3.1 or higher?
Comment 14 Andrew Bartlett 2017-12-21 21:26:31 UTC
(In reply to Stefan Metzmacher from comment #13)
Correct.  In ldb 1.3.1 we currently only intersect the SCOPE_ONELEVEL index with the attribute filter tree index if GUID index mode is enabled.

The GUID index mode won't be enabled for Samba 4.7. 

This is unlike ldb 1.2.3 where we do such an intersection but only for smaller sets of possible matches.  The behaviour diverges because it avoids magic numbers (why is 10 the maximum number of tree matches?) in the long term.

If you feel the magic number and code complexity is warranted for the benefit of forward compatibility, you could patch master to have this behaviour also.  This could be done before ldb 1.3.1 is released if desired. 

Finally, I should clarify what 'slow' is for the ldb changes.  As long as the DNS sever patches go in, we change from 20s to .3s response time for the top DNS record in my test zone.  With the ldb change that becomes .1s.  

At the same time the ldb change means a 'NXDOMAIN' response, simple A records and wildcard responses drops from .3s to .02s.

I hope this clarifies things.
Comment 15 Karolin Seeger 2017-12-22 17:08:29 UTC
Pushed to v4-7-test.
Closing out bug report.

Thanks!