Created attachment 7468
Proposed patch to fix

Windows DNS and BIND both answer an A / AAAA query for a name that is stored as a CNAME, and Samba's replication code relies on this behavior when trying to get the IP address for <ntds_setting_objectguid>._msdcs.domain.tld. What happens is that Samba sends an A / AAAA request for this DNS name, but the internal DNS server has it stored in the database as a CNAME record and so ignores the request. It seems that the RFC indicates that both the CNAME and the A (or AAAA) record should be returned.
The patch duplicates some existing code, but it seems to satisfy both Samba and Windows DNS clients.
Reassign correctly.
As discussed on IRC, I'm not too happy with the patch.
For reference, here is the relevant part of https://irclog.samba.org/2012/04/20120418-Wed.log:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
04/18/12 8:55:04 <kai> morgen
04/18/12 8:56:24 <ekacnet> kai: hi
04/18/12 8:57:07 <ekacnet> kai: http://cpaste.org/1450/
04/18/12 8:57:20 <kai> wasn't me... 0:)
04/18/12 8:57:43 <kai> hm?
04/18/12 8:57:57 <kai> I don't understand what that's trying to do
04/18/12 8:58:36 <kai> that certainly looks invalid
04/18/12 8:58:47 <kai> how are you going to handle SOA or PTR requests with that?
04/18/12 8:59:05 <kai> oh, wait
04/18/12 8:59:11 <kai> it's still invalid
04/18/12 8:59:40 <kai> what you want is to optionally return the AAAA record in the additional records section, possibly
04/18/12 8:59:47 <kai> but that's not the way to do it
04/18/12 9:00:05 <ekacnet> no what I want to do is that if I asked A or AAAA
04/18/12 9:00:16 <ekacnet> but the record is CNAME the server still reply
04/18/12 9:00:43 <ekacnet> because microsoft do so, bind do so and because it breaks the replication code of samba to samba
04/18/12 9:10:51 <kai> ekacnet: I'm not sure I understand the problem
04/18/12 9:12:15 <kai> ah, I see
04/18/12 9:12:26 <kai> but that's still not the right fix
04/18/12 9:13:30 <kai> and not part of RFC behaviour..
04/18/12 9:16:47 <ekacnet> kai: yeah wrong fix
04/18/12 9:16:58 <ekacnet> ok for not RFC behavior
04/18/12 9:17:35 <ekacnet> but it's bind and MS DNS behavior
04/18/12 9:17:45 <ekacnet> and we rely on it at least for replication
04/18/12 9:17:54 <kai> sure, but your resolver should be able to cope
04/18/12 9:18:22 <ekacnet> kai: no it doesn't
04/18/12 9:19:22 <kai> ekacnet: oh, right, that's actually in rfc1034
04/18/12 9:19:33 <kai> but your patch doesn't fix the problem for all I can see
04/18/12 9:20:03 <ekacnet> http://cpaste.org/1451/
04/18/12 9:21:14 <kai> but if the record you're looking for with an A query is a CNAME there can't be another record with the same name that has another QTYPE
04/18/12 9:21:15 <ekacnet> as the time is limited solution 1) is to fall back to bind dlz
04/18/12 9:21:20 <obnox> is the paste above the fix? or ist that s/th else?
04/18/12 9:21:36 <obnox> ekacnet: we can quickly build new packages.
04/18/12 9:21:39 <kai> obnox: no, at least not the right fix
04/18/12 9:21:41 <ekacnet> obnox: it's almost this
04/18/12 9:21:57 <ekacnet> kai: well this is doing what I want now
04/18/12 9:22:01 <ekacnet> at least it seems
04/18/12 9:22:08 <obnox> ekacnet: but I guess you can also have the src deb so you can experiment more quickly
04/18/12 9:22:11 <kai> ok, then your database is in a weird state
04/18/12 9:22:21 <ekacnet> kai: fresh install
04/18/12 9:23:02 <kai> "If a CNAME RR is present at a node, no other data should be
04/18/12 9:23:03 <kai> present"
04/18/12 9:23:08 <kai> says the RFC
04/18/12 9:23:11 <ekacnet> http://cpaste.org/1452/
04/18/12 9:23:21 <ekacnet> there is no other data
04/18/12 9:23:57 <ekacnet> I just want that if you have a CNAME record but I request a A or AAAA record of the same name you return me either a A or CNAME
04/18/12 9:24:05 <kai> right
04/18/12 9:24:12 <kai> but that's not valid behavior
04/18/12 9:24:39 <kai> you're supposed to re-run the query for the A record (and presumably the AAAA record) and then return that along with the CNAME
04/18/12 9:24:40 <ekacnet> kai: well that's bind behavior
04/18/12 9:24:59 <gladiac> abartlet: hi, I'm here
04/18/12 9:25:02 <ekacnet> or MS DNS behavior
04/18/12 9:25:10 <kai> it's not MS DNS behavior
04/18/12 9:25:22 <kai> look, hack it in if you need it, but it's not the right fix
04/18/12 9:25:45 <kai> I agree there's a bug in the internal DNS, but that's not the way to fix it
04/18/12 9:26:42 <ekacnet> http://cpaste.org/1453/
04/18/12 9:26:46 <ekacnet> microsoft behavior
04/18/12 9:27:40 <ekacnet> urg they also return the A record
04/18/12 9:27:49 <kai> network trace or I don't believe it ;)
04/18/12 9:28:14 <kai> you're supposed to return the CNAME and the A record of the node the CNAME points at
04/18/12 9:28:31 <kai> possibly your resolver library does that for you if you run "host"
04/18/12 9:29:02 <kai> but if you run wireshark, I don't see how your patch would make the internal server return the correct query
04/18/12 9:29:24 <kai> er, the A record of the node CNAME points at
04/18/12 9:29:44 <ekacnet> kai: sure I think I just return the CNAME record
04/18/12 9:29:53 <kai> right, and that's invalid, too
04/18/12 9:30:30 <kai> if it works as a quick fix, hack it locally
04/18/12 9:30:36 <kai> but it's not the right fix
04/18/12 9:30:57 <ekacnet> but if I ask for CNAME the DNS is supposed to return me just the CNAME ?
04/18/12 9:31:07 <kai> yes
04/18/12 9:31:19 <kai> that's page 14 on RFC1034
04/18/12 9:34:33 <ekacnet> you don't have a more cleaner fix
04/18/12 9:34:42 <ekacnet> out of your hat ?
04/18/12 9:35:01 <kai> no, it's not trivial
04/18/12 9:35:50 <kai> you need to check if it's a CNAME, and if it is, look at the node pointed at, and rerun the query for an A (and possibly AAAA) record
04/18/12 9:36:01 <kai> though the RFC only mentions A records
04/18/12 9:36:25 <kai> and I'm at work and don't have a DNS with AAAA records to play with
04/18/12 9:36:29 <ekacnet> well it seems that bind is return A if you asked for A
04/18/12 9:36:41 <ekacnet> and AAAA if you asked for AAAA
04/18/12 9:36:55 <kai> ah, that makes life easier
04/18/12 9:37:00 <ekacnet> a CNAME record can return 1 alias no ?
04/18/12 9:37:05 <kai> and of course that makes sense
04/18/12 9:37:06 <kai> yes
04/18/12 9:38:08 <kai> though I wonder if you're allowed to be authorative for foo.example.com only and have alias.foo.example.com be a CNAME for host.bar.example.com
04/18/12 9:38:29 <kai> in which case you'd need to fire off a recursion for that name to get the A(AAA) record
04/18/12 9:39:02 <kai> or possibly not, if recursion is disabled
04/18/12 9:39:29 <kai> in that case, I assume you need to return a SERVER_ERROR, which seems to be what bind does when it can't talk to forwarders
04/18/12 9:40:49 <ekacnet> well the simple solution is that if we can't resolv we just return SERVER_ERRROR ?
04/18/12 9:41:25 <kai> I don't know, I'd have to test this
04/18/12 9:48:02 <ekacnet> so in AD if you have foo.samba.org that is a CNAME for bar.samba.org
04/18/12 9:48:14 <ekacnet> but bar.samba.org didn't exists you just get the CNAME
04/18/12 9:55:40 <ekacnet> kai: is there a reason why internal dns didn't bind on lo0 ?
04/18/12 10:04:58 <kai> ekacnet: probably somebody futzed with the network setup code and I didn't notice ;)
04/18/12 10:05:09 <kai> it was cleanly stolen from the kdc code
04/18/12 10:05:29 <ekacnet> :-)
04/18/12 10:05:42 <ekacnet> well i'll figure out somehow
04/18/12 10:06:35 <kai> just use your local IP :)
04/18/12 10:09:08 <ekacnet> kai: by implementing this CNAME recursion I really have the impression to duplicate the code of handle_question
04/18/12 10:12:35 <kai> ekacnet: yes, I think to fix this, I think it's time to switch dns_process() to be event-based
04/18/12 10:13:27 <kai> then on a CNAME, you run another query
04/18/12 10:14:10 <kai> makes for a much nicer overall structure of the code, and also will stop a stalling forwarder from stalling the whole dns process
04/18/12 10:14:24 <obnox> kai: good plan
04/18/12 10:16:14 <kai> that's been on the todo list for a while, but this bug seems to call for a bump
04/18/12 10:17:20 <ekacnet> kai: well excuse me for tonight I'll have the partial code duplication solution :-)
04/18/12 10:19:38 <kai> whatever floats your boat
04/18/12 10:19:56 <ekacnet> +1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
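
To make the behaviour discussed in the log concrete, here is a minimal Python sketch of the lookup kai describes: if the queried node is a CNAME and the question asks for another type, include the CNAME in the answer and rerun the same question against the target. This is only an illustration under stated assumptions, not the Samba internal DNS code. The `ZONE` table, the `answer()` function, the example names and the hop limit are all hypothetical, and the out-of-zone case (where a recursion or a server failure would be needed) is deliberately left out.

```python
# Hypothetical in-memory zone: name -> {record type -> list of values}.
# None of these names come from the Samba source; they only illustrate the lookup.
ZONE = {
    "guid._msdcs.example.test": {"CNAME": ["dc1.example.test"]},
    "dc1.example.test": {"A": ["192.0.2.10"], "AAAA": ["2001:db8::10"]},
    "dangling.example.test": {"CNAME": ["missing.example.test"]},
}

MAX_CNAME_HOPS = 8  # arbitrary guard against CNAME loops (an assumption)


def answer(qname, qtype, zone=ZONE):
    """Return the answer section as a list of (name, type, value) tuples."""
    answers = []
    name = qname.lower().rstrip(".")
    for _ in range(MAX_CNAME_HOPS):
        records = zone.get(name, {})
        if qtype in records:
            # Direct match; this also covers an explicit CNAME query,
            # which returns just the CNAME without following it.
            answers.extend((name, qtype, value) for value in records[qtype])
            return answers
        if "CNAME" not in records:
            # Nothing of the requested type and no alias to follow.
            return answers
        # The node is an alias: include the CNAME, then rerun the same
        # question against the name it points at.
        target = records["CNAME"][0].lower().rstrip(".")
        answers.append((name, "CNAME", target))
        name = target
    return answers


if __name__ == "__main__":
    for qtype in ("A", "AAAA", "CNAME"):
        print(qtype, answer("guid._msdcs.example.test", qtype))
```

With this table, an A query for the alias yields the CNAME followed by the A record of the target, a CNAME query yields only the CNAME, and an alias whose target does not exist yields only the CNAME, which matches what ekacnet reports seeing from AD.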
Kai: Now that the code has been made async, will it be easier to do the proper fix? With a little more input, I would try to work on some code if you don't have the resources at the moment. Maybe we can also discuss this on IRC...

Cheers - Michael
Fixed by commit f3df2988ba6928cde0bd89da321bbe74fd76f53f