It appears that with a large number of DNS records, the DNS server becomes unusable. This appears to be most noticed with the BIND DLZ, but this seems to be related to the fact that such installations will have more records and are more reliant on a performant DNS server.
Created attachment 13868 [details] possible patch for master and 4.7
Created attachment 13879 [details] backported patch for 4.7 The attached patch is derived from the changes in master and others currently being built in autobuild. It is not a direct backport because of changes made in ldb since then. I have tested the DNS performance specifically.
I should clarify further. We can't backport directly as the intersection between the SCOPE_ONELEVEL index and the filter tree index is O(n*m) without the GUID index. We instead just do it for small values of m. In master we just do it in GUID index mode so as to avoid more magic constants long term.
Created attachment 13880 [details] patch for 4.7 cherry-picked and adapted from master This version of the patch has the cherry-pick and adapted-from markers for the changes that just landed in master.
Please include this patch for 4.7.
Comment on attachment 13880 [details] patch for 4.7 cherry-picked and adapted from master There's an ldb change without a version change, which means we would still run with the old code if we're using a system ldb. What is the impact of that?
(In reply to Stefan Metzmacher from comment #6) You say there is an ldb change without a version change, but I included the version change: From 0d154d00d308bd1c31ca7693553513ad005b1837 Mon Sep 17 00:00:00 2001 From: Andrew Bartlett <abartlet@samba.org> Date: Mon, 18 Dec 2017 10:14:31 +1300 Subject: [PATCH 4/7] ldb: Release 1.2.3 Even if that were true, the impact would be DNS resolution times of 300ms vs multiple seconds for the domain I was testing with. This patch is not the same as master, and I've not made a version change in master as we normally do that at freeze time in a few weeks anyway. Please re-read the 4.7 patch and remove your NACK. Thanks!
(In reply to Andrew Bartlett from comment #7) 4.7 can use 1.3.0 from the system. We never create ldb releases from older samba release branches and I'm not happy to rush that into a 4.7.4. I'm wondering why we have such a complex build_wildcard_query(). For normal dns queries we're only doing base searches, correct? Why don't we do base searches for trimmed dns names, then we do at most num component base searches instead of a complex one level search.
(In reply to Stefan Metzmacher from comment #8) I'm proposing we do exactly that, maintain ldb 1.2.x in the 4.7 tree. Master was moved to 1.3.x specifically to allow this. I'm not at this point just before Christmas about to rework the wildcard DNS algorithm, it was built this way to avoid an attacker making us do an additional DNS lookup per . they insert into a query, that is to perform queries in constant time. I agree it is technically possible that a user could combine Samba 4.7 with ldb 1.3.0 I would suggest that is is unlikely and the result is just a slower search, just as slow as before this patch. You could add a blacklist if you like. Sorry, Andrew Bartlett
To be clear, the performance issue isn't the complexity of the SCOPE_ONELEVEL query (even a simple ONELEVEL query is slow for a flat tree). The issue is that such queries were not indexed aside from the @IDXONE index, that is the filter tree index was not considered. This is a generic LDB issue and so is addressed there.
(In reply to Stefan Metzmacher from comment #8) > I'm wondering why we have such a complex build_wildcard_query(). There was originally a version of the patches which would do multiple base searches but we had a number of concerns with it. Prior to the release of 4.7, we accidentally introduced just a single base search as part of another set of DNS fixes which instantly halved the throughput of the DNS server (fixed before the release). Seeing how easy it was to hurt the overall performance of the DC (which we noted had measurable effects to the rest of the services), the decision was made to try and do all the wildcard processing in a single search. The idea was that the indexes would make this all quick, but the performance regression wasn't noticed because our DNS partitions simply didn't have enough records. The new patches have been tested against a proper database with thousands of DNS records. The eventual fixes actually do have all the properties that we were originally intending and none of the degenerate cases. The divergences with master are effectively because of the GUID index work (which can't/shouldn't be backported).
Comment on attachment 13880 [details] patch for 4.7 cherry-picked and adapted from master Karolin, let us coordinate the releases tomorrow I should prepare the ldb release before you tag the samba release.
(In reply to Garming Sam from comment #11) Is it correct that 4.7 will still be slow if it uses ldb 1.3.1 or higher?
(In reply to Stefan Metzmacher from comment #13) Correct. In ldb 1.3.1 we currently only intersect the SCOPE_ONELEVEL index with the attribute filter tree index if GUID index mode is enabled. The GUID index mode won't be enabled for Samba 4.7. This is unlike ldb 1.2.3 where we do such an intersection but only for smaller sets of possible matches. The behaviour diverges because it avoids magic numbers (why is 10 the maximum number of tree matches?) in the long term. If you feel the magic number and code complexity is warranted for the benefit of forward compatibility, you could patch master to have this behaviour also. This could be done before ldb 1.3.1 is released if desired. Finally, I should clarify what 'slow' is for the ldb changes. As long as the DNS sever patches go in, we change from 20s to .3s response time for the top DNS record in my test zone. With the ldb change that becomes .1s. At the same time the ldb change means a 'NXDOMAIN' response, simple A records and wildcard responses drops from .3s to .02s. I hope this clarifies things.
Pushed to v4-7-test. Closing out bug report. Thanks!