Bug 15226 - samba-tool gpo listall fails IPv6 only - finddcs() fails to find DC when there is only an AAAA record for the DC in DNS
Summary: samba-tool gpo listall fails IPv6 only - finddcs() fails to find DC when ther...
Status: ASSIGNED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Other (show other bugs)
Version: 4.16.5
Hardware: All All
: P5 major (vote)
Target Milestone: ---
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-07 05:17 UTC by Matthew Grant
Modified: 2022-11-09 22:51 UTC (History)
2 users (show)

See Also:


Attachments
Patch for bug attached (1.97 KB, patch)
2022-11-07 05:17 UTC, Matthew Grant
no flags Details
git-am fix for master. (3.14 KB, patch)
2022-11-08 01:53 UTC, Jeremy Allison
no flags Details
Jeremy's tested by Matthew Grant proposed fix (678 bytes, patch)
2022-11-08 21:05 UTC, Matthew Grant
no flags Details
git-am fix for 4.17.next, 4.16.next. (1.51 KB, patch)
2022-11-09 22:51 UTC, Jeremy Allison
jra: review? (dmulder)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Matthew Grant 2022-11-07 05:17:40 UTC
Created attachment 17630 [details]
Patch for bug attached

samba-tool gpo listall and all other samba-tool gpo commands fail with:

dns child failed to find name '_ldap._tcp.SMB.ANATHOTH.NET' of type SRV
finddcs: Failed to find SRV record for _ldap._tcp.SMB.ANATHOTH.NET
ERROR(runtime): uncaught exception - ('Could not find a DC for domain', NTSTATUSError(3221225524, 'The object name is not found.'))
  File "/usr/lib/python3/dist-packages/samba/netcmd/__init__.py", line 186, in _run
    return self.run(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/samba/netcmd/gpo.py", line 469, in run
    self.url = dc_url(self.lp, self.creds, H)
  File "/usr/lib/python3/dist-packages/samba/netcmd/gpo.py", line 127, in dc_url
    raise RuntimeError("Could not find a DC for domain", e)

 when there is only AAAA records for the DCs in the Samba domain DNS

Adding an A record for the DC makes the samba-tool gpo listall command work.

Bug found in source4/libcli/resolve/dns_ex.c, all addresses returned should be counted, and not just A records. Correct logic in get_a_aaaa_records() should beto send A query only if initial AAAA query did not return anything.

Patch attached.
Comment 1 Matthew Grant 2022-11-07 05:27:09 UTC
Patch makes samba-tool gpo listall and other commands function in IPv6 only enviroment, only AAAA record for DCs in Samba DNS
Comment 2 Jeremy Allison 2022-11-07 17:40:00 UTC
Hi Matthew, this looks good (modulo the variable name change, which I believe Michael Tokarev already commented on.

I'll create a gitlab MR for this, which is the way things get into our code base right now.

In the meantime, can you send in a Samba developers declaration as specified here:

https://www.samba.org/samba/devel/copyright-policy.html

so we have a record of you being able to contribute. Thanks !

Jeremy.
Comment 3 Jeremy Allison 2022-11-07 20:37:24 UTC
OK, Samba DD received. However, this patch causes:

[59(410)/67 at 34m28s] samba.blackbox.wbinfo(rodc:local)(rodc:local)
1361UNEXPECTED(failure): samba.blackbox.wbinfo(rodc:local).wbinfo -N against rodc(rodc:local)
1362REASON: Exception: Exception: failed to call wbcResolveWinsByName: WBC_ERR_DOMAIN_NOT_FOUND
1363Could not lookup WINS by name RODC

to fail. It might be an interaction with socket_wrapper.

I'm investigating further.
Comment 4 Jeremy Allison 2022-11-08 01:53:31 UTC
Created attachment 17637 [details]
git-am fix for master.

Matt, this is what works to pass the ci build.

I think the underlying problem is that with your fix it puts the IPv6 addresses first, which the ci infrastructure doesn't cope too well with.

Can you test this fix instead, it does an IPv4 lookup followed by an IPv6 lookup (if enabled) and so the addresses are in the expected order.

If should still work in your IPv6-only setup.

Can you test this and confirm ? If you're happy I'll add it as an official MR.

Cheers,

Jeremy.
Comment 5 Jeremy Allison 2022-11-08 05:30:06 UTC
Comment on attachment 17637 [details]
git-am fix for master.

Hang on a minute Matt, the original summary of the problem is incorrect. I think the original logic is good.

The original logic does the following:

dns_lookup(..., QTYPE_AAAA, &reply);

This is then passed to:

total = reply_to_addrs(tmp_ctx, &a_num, &addrs, total, reply, port);

Now, a_num here is only incremented if the DNS server returns A records as well when asked for AAAA records.

total here is always incremented by the number of either A or AAAA records returned.

For a name lookup that only has AAAA but no A records (i.e. IPv6-only), then the return from this
will be:

total = <number of AAAA records>
a_num = 0 (there are no A records)

We then go into:

        if (qtype == QTYPE_AAAA && a_num == 0) {
                /*
                * DNS server didn't returned A when asked for AAAA records.
                * Most of the server do it, let's ask for A specificaly.
                */
                err = dns_lookup(tmp_ctx, name, QTYPE_A, &reply);
                if (!ERR_DNS_IS_OK(err)) {
                        goto done;
                }

                total = reply_to_addrs(tmp_ctx, &a_num, &addrs, total,
                                        reply, port);

        }

What this is doing is asking again for A records if the original query for AAAA records didn't return any A records.

If the return is DNS_OK then it adds any A records to the addrs[] array appending at
position addrs[total] and incrementing total by the number of A records added.

So for a name with 3 AAAA records and 2 A records we'll end up with:

addr[0] = AAAA record1
addr[1] = AAAA record2
addr[2] = AAAA record3
addr[3] = A record1
addr[4] = A record2

total = 5.

This is (arguably) correct (though somewhat tortuous) logic.

For an IPv6-only name we just have the first 3 entries and total = 3.

Is it possible that when the second dns_lookup() request for A records is done it returns a status from the DNS server other than DNS_OK (0) ?

That would explain it. Can you add a DEBUG statement here to check ?

If that is the case, then the original logic is correct and the fix is simple.

The chunk of code above should be changed to:

        if (qtype == QTYPE_AAAA && a_num == 0) {
                /*
                * DNS server didn't returned A when asked for AAAA records.
                * Most of the server do it, let's ask for A specificaly.
                */
                err = dns_lookup(tmp_ctx, name, QTYPE_A, &reply);
                if (ERR_DNS_IS_OK(err)) {
                        total = reply_to_addrs(tmp_ctx, &a_num, &addrs, total, reply, port);
                }
        }

only adding the A records on success, and ignoring any failure.

Do you concur ?
Comment 6 Matthew Grant 2022-11-08 19:18:34 UTC
(In reply to Jeremy Allison from comment #5)
I concur.

Got success with the change you suggested above on my set up. samba-tool gpo listall works.
As originally as in git source, get the failure to find the DC.

Let me repeat, the change you suggested in get_a_aaaa_records(),  source4/libcli/resolve/dns_ex.c makes samba-tool gpo listall return GPOs instead of failing with can't find the DC error.

Got the DEBUG statement working.  That second QTYPE_A query for the IPv4 records where you suggested the change was returning an error code of 10, ERROR_DNS_SOCKET_ERROR according to lib/addns/dnserr.h?

Lets see what the MR test hookup does with the vampire_dc set up in that dsrepl test that was failing.

Hope thats helpful!

Thank you for your reaching out and your help!  Its got me past this show stopper bug at work as I have to devops ansible stuff with setting up an AD server with GPO initialisation.
Comment 7 Jeremy Allison 2022-11-08 19:48:44 UTC
Bugzilla lost this:

>   Hi Jeremy!
>   Let me repeat, the change you suggested in get_a_aaaa_records(), 
>   source4/libcli/resolve/dns_ex.c makes samba-tool gpo listall return GPOs
>   instead of failing with can't find the DC error.
>   Got the DEBUG statement working.  That second QTYPE_A query for the IPv4
>   records where you suggested the change was returning an error code of
>   10, ERROR_DNS_SOCKET_ERROR according to lib/addns/dnserr.h?
>   Lets see what the MR test hookup does with the vampire_dc set up in that
>   dsrepl test that was failing.
>   Hope thats helpful!
>   Thank you for your reaching out and your help!  Its got me past this show
>   stopper bug at work as I have to devops ansible stuff with setting up an
>   AD server with GPO initialisation.
>   Best Regards,
>   Matt Grant
Comment 8 Matthew Grant 2022-11-08 21:05:45 UTC
Created attachment 17638 [details]
Jeremy's tested by Matthew Grant proposed fix

From discussion and my testing it looks like this is the fix for the issue.

This bug affects a lot more than just the samba-tool gpo code, it also affects much of Samba AD via its usage in the source4 part of the tree.
Comment 9 Jeremy Allison 2022-11-08 21:17:51 UTC
MR: https://gitlab.com/samba-team/samba/-/merge_requests/2790
Comment 10 Samba QA Contact 2022-11-09 20:35:08 UTC
This bug was referenced in samba master:

10537a89bb0b461ba31d614b7c9ed56a842422e7
Comment 11 Jeremy Allison 2022-11-09 22:51:22 UTC
Created attachment 17642 [details]
git-am fix for 4.17.next, 4.16.next.

Cherry-pick from master. Applies cleanly to 4.17.next, 4.16.next.