Created attachment 7468
Proposed patch to fix
Windows DNS and BIND both answer an A/AAAA query for a name that is stored as a CNAME, and Samba's replication code relies on this behavior when trying to get the IP address for <ntds_setting_objectguid>._msdcs.domain.tld.
What happens is that Samba sends an A / AAAA request for this DNS name, but because the name is stored in the database as a CNAME record, the internal DNS server ignores the request.
It seems that the RFC indicates that both the CNAME and the A(AAA) record should be returned.
The patch tends to duplicate code, but it seems to satisfy both Samba and Windows DNS clients.
As discussed on IRC, I'm not too happy with the patch.
For reference, here is the relevant part of the IRC log:
04/18/12 8:55:04 <kai> morgen
04/18/12 8:56:24 <ekacnet> kai: hi
04/18/12 8:57:07 <ekacnet> kai: http://cpaste.org/1450/
04/18/12 8:57:20 <kai> wasn't me... 0:)
04/18/12 8:57:43 <kai> hm?
04/18/12 8:57:57 <kai> I don't understand what that's trying to do
04/18/12 8:58:36 <kai> that certainly looks invalid
04/18/12 8:58:47 <kai> how are you going to handle SOA or PTR requests with that?
04/18/12 8:59:05 <kai> oh, wait
04/18/12 8:59:11 <kai> it's still invalid
04/18/12 8:59:40 <kai> what you want is to optionally return the AAAA record in the additional records section, possibly
04/18/12 8:59:47 <kai> but that's not the way to do it
04/18/12 9:00:05 <ekacnet> no what I want to do is that if I asked A or AAAA
04/18/12 9:00:16 <ekacnet> but the record is CNAME the server still reply
04/18/12 9:00:43 <ekacnet> because microsoft do so, bind do so and because it breaks the replication code of samba to samba
04/18/12 9:10:51 <kai> ekacnet: I'm not sure I understand the problem
04/18/12 9:12:15 <kai> ah, I see
04/18/12 9:12:26 <kai> but that's still not the right fix
04/18/12 9:13:30 <kai> and not part of RFC behaviour..
04/18/12 9:16:47 <ekacnet> kai: yeah wrong fix
04/18/12 9:16:58 <ekacnet> ok for not RFC behavior
04/18/12 9:17:35 <ekacnet> but it's bind and MS DNS behavior
04/18/12 9:17:45 <ekacnet> and we rely on it at least for replication
04/18/12 9:17:54 <kai> sure, but your resolver should be able to cope
04/18/12 9:18:22 <ekacnet> kai: no it doesn't
04/18/12 9:19:22 <kai> ekacnet: oh, right, that's actually in rfc1034
04/18/12 9:19:33 <kai> but your patch doesn't fix the problem for all I can see
04/18/12 9:20:03 <ekacnet> http://cpaste.org/1451/
04/18/12 9:21:14 <kai> but if the record you're looking for with an A query is a CNAME there can't be another record with the same name that has another QTYPE
04/18/12 9:21:15 <ekacnet> as the time is limited solution 1) is to fall back to bind dlz
04/18/12 9:21:20 <obnox> is the paste above the fix? or ist that s/th else?
04/18/12 9:21:36 <obnox> ekacnet: we can quickly build new packages.
04/18/12 9:21:39 <kai> obnox: no, at least not the right fix
04/18/12 9:21:41 <ekacnet> obnox: it's almost this
04/18/12 9:21:57 <ekacnet> kai: well this is doing what I want now
04/18/12 9:22:01 <ekacnet> at least it seems
04/18/12 9:22:08 <obnox> ekacnet: but I guess you can also have the src deb so you can experiment more quickly
04/18/12 9:22:11 <kai> ok, then your database is in a weird state
04/18/12 9:22:21 <ekacnet> kai: fresh install
04/18/12 9:23:02 <kai> "If a CNAME RR is present at a node, no other data should be
04/18/12 9:23:03 <kai> present"
04/18/12 9:23:08 <kai> says the RFC
04/18/12 9:23:11 <ekacnet> http://cpaste.org/1452/
04/18/12 9:23:21 <ekacnet> there is no other data
04/18/12 9:23:57 <ekacnet> I just want that if you have a CNAME record but I request a A or AAAA record of the same name you return me either a A or CNAME
04/18/12 9:24:05 <kai> right
04/18/12 9:24:12 <kai> but that's not valid behavior
04/18/12 9:24:39 <kai> you're supposed to re-run the query for the A record (and presumably the AAAA record) and then return that along with the CNAME
04/18/12 9:24:40 <ekacnet> kai: well that's bind behavior
04/18/12 9:24:59 <gladiac> abartlet: hi, I'm here
04/18/12 9:25:02 <ekacnet> or MS DNS behavior
04/18/12 9:25:10 <kai> it's not MS DNS behavior
04/18/12 9:25:22 <kai> look, hack it in if you need it, but it's not the right fix
04/18/12 9:25:45 <kai> I agree there's a bug in the internal DNS, but that's not the way to fix it
04/18/12 9:26:42 <ekacnet> http://cpaste.org/1453/
04/18/12 9:26:46 <ekacnet> microsoft behavior
04/18/12 9:27:40 <ekacnet> urg they also return the A record
04/18/12 9:27:49 <kai> network trace or I don't believe it ;)
04/18/12 9:28:14 <kai> you're supposed to return the CNAME and the A record of the node the CNAME points at
04/18/12 9:28:31 <kai> possibly your resolver library does that for you if you run "host"
04/18/12 9:29:02 <kai> but if you run wireshark, I don't see how your patch would make the internal server return the correct query
04/18/12 9:29:24 <kai> er, the A record of the node CNAME points at
04/18/12 9:29:44 <ekacnet> kai: sure I think I just return the CNAME record
04/18/12 9:29:53 <kai> right, and that's invalid, too
04/18/12 9:30:30 <kai> if it works as a quick fix, hack it locally
04/18/12 9:30:36 <kai> but it's not the right fix
04/18/12 9:30:57 <ekacnet> but if I ask for CNAME the DNS is supposed to return me just the CNAME ?
04/18/12 9:31:07 <kai> yes
04/18/12 9:31:19 <kai> that's page 14 on RFC1034
04/18/12 9:34:33 <ekacnet> you don't have a more cleaner fix
04/18/12 9:34:42 <ekacnet> out of your hat ?
04/18/12 9:35:01 <kai> no, it's not trivial
04/18/12 9:35:50 <kai> you need to check if it's a CNAME, and if it is, look at the node pointed at, and rerun the query for an A (and possibly AAAA) record
04/18/12 9:36:01 <kai> though the RFC only mentions A records
04/18/12 9:36:25 <kai> and I'm at work and don't have a DNS with AAAA records to play with
04/18/12 9:36:29 <ekacnet> well it seems that bind is return A if you asked for A
04/18/12 9:36:41 <ekacnet> and AAAA if you asked for AAAA
04/18/12 9:36:55 <kai> ah, that makes life easier
04/18/12 9:37:00 <ekacnet> a CNAME record can return 1 alias no ?
04/18/12 9:37:05 <kai> and of course that makes sense
04/18/12 9:37:06 <kai> yes
04/18/12 9:38:08 <kai> though I wonder if you're allowed to be authoritative for foo.example.com only and have alias.foo.example.com be a CNAME for host.bar.example.com
04/18/12 9:38:29 <kai> in which case you'd need to fire off a recursion for that name to get the A(AAA) record
04/18/12 9:39:02 <kai> or possibly not, if recursion is disabled
04/18/12 9:39:29 <kai> in that case, I assume you need to return a SERVER_ERROR, which seems to be what bind does when it can't talk to forwarders
04/18/12 9:40:49 <ekacnet> well the simple solution is that if we can't resolve we just return SERVER_ERROR ?
04/18/12 9:41:25 <kai> I don't know, I'd have to test this
04/18/12 9:48:02 <ekacnet> so in AD if you have foo.samba.org that is a CNAME for bar.samba.org
04/18/12 9:48:14 <ekacnet> but bar.samba.org didn't exists you just get the CNAME
04/18/12 9:55:40 <ekacnet> kai: is there a reason why internal dns didn't bind on lo0 ?
04/18/12 10:04:58 <kai> ekacnet: probably somebody futzed with the network setup code and I didn't notice ;)
04/18/12 10:05:09 <kai> it was cleanly stolen from the kdc code
04/18/12 10:05:29 <ekacnet> :-)
04/18/12 10:05:42 <ekacnet> well i'll figure out somehow
04/18/12 10:06:35 <kai> just use your local IP :)
04/18/12 10:09:08 <ekacnet> kai: by implementing this CNAME recursion I really have the impression to duplicate the code of handle_question
04/18/12 10:12:35 <kai> ekacnet: yes, I think to fix this, I think it's time to switch dns_process() to be event-based
04/18/12 10:13:27 <kai> then on a CNAME, you run another query
04/18/12 10:14:10 <kai> makes for a much nicer overall structure of the code, and also will stop a stalling forwarder from stalling the whole dns process
04/18/12 10:14:24 <obnox> kai: good plan
04/18/12 10:16:14 <kai> that's been on the todo list for a while, but this bug seems to call for a bump
04/18/12 10:17:20 <ekacnet> kai: well excuse me for tonight I'll have the partial code duplication solution :-)
04/18/12 10:19:38 <kai> whatever floats your boat
04/18/12 10:19:56 <ekacnet> +1
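To summarize the lookup kai describes above (check whether the node is a CNAME, follow it, and rerun the query for the type that was originally asked for), here is a rough sketch. It is only an illustration in Python, not the Samba code: the zone dictionary, the names and addresses, the answer() helper and the hop limit are all made up for the example. It also ignores the case where the CNAME target lies outside the zone and would need recursion or a forwarder.

```python
# Hypothetical in-memory zone: owner name -> {record type -> list of values}.
# All names and addresses below are invented for illustration.
ZONE = {
    "guid._msdcs.samba.example": {"CNAME": ["dc1.samba.example"]},
    "dc1.samba.example": {"A": ["192.0.2.10"], "AAAA": ["2001:db8::10"]},
    "dangling.samba.example": {"CNAME": ["missing.samba.example"]},
}

MAX_CNAME_HOPS = 8  # arbitrary guard so a CNAME loop cannot spin forever


def answer(name, qtype, zone=ZONE):
    """Return the answer section as a list of (owner, type, value) tuples."""
    answers = []
    for _ in range(MAX_CNAME_HOPS):
        records = zone.get(name, {})
        if qtype in records:
            # The node has the requested type: return it (plus any CNAMEs
            # collected on the way here).
            answers.extend((name, qtype, value) for value in records[qtype])
            return answers
        if qtype != "CNAME" and "CNAME" in records:
            # The node is an alias: record the CNAME and rerun the same
            # query type against the name it points at.
            target = records["CNAME"][0]
            answers.append((name, "CNAME", target))
            name = target
            continue
        # Nothing (more) to find; a dangling alias yields just the CNAME.
        return answers
    return answers


print(answer("guid._msdcs.samba.example", "A"))
print(answer("guid._msdcs.samba.example", "CNAME"))
print(answer("dangling.samba.example", "A"))
```

With this toy zone, an A query for the alias gives back the CNAME followed by the A record of dc1.samba.example, a CNAME query gives back only the CNAME, and an alias whose target does not exist gives back just the CNAME, which matches the AD behavior ekacnet mentions. What a real server should return when the hop limit is hit (probably an error) is left open here.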
Kai: Now that the code has been made async,
will it be easier to do the proper fix?
With a little more input, I would try to work on
some code if you don't have the resources currently.
Maybe we can discuss this also on irc...
Cheers - Michael
Fixed by commit f3df2988ba6928cde0bd89da321bbe74fd76f53f