Bug 13298 - Samba fails to join Active Directory with WERR_DNS_ERROR_RCODE_NAME_ERROR
Summary: Samba fails to join Active Directory with WERR_DNS_ERROR_RCODE_NAME_ERROR
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.8.0rc2
Hardware: x64 Linux
Importance: P5 major
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-24 20:16 UTC by Carsten Menke
Modified: 2021-07-03 09:18 UTC

See Also:


Attachments
Trace of the Join Process (19.54 KB, text/plain)
2018-02-24 20:16 UTC, Carsten Menke

Description Carsten Menke 2018-02-24 20:16:16 UTC
Created attachment 13980
Trace of the Join Process

We wanted to join Samba as a DC to a Windows Server 2003 AD domain, but the join fails with WERR_DNS_ERROR_RCODE_NAME_ERROR. We tested earlier versions and found that the join works perfectly with Samba 4.6.13 and Samba 4.5, whereas Samba 4.7.5 and later versions up to 4.8.0rc3 fail.

This setup is a little unusual, so I will describe in detail what we did. We want to migrate a Windows Server 2003 DC. It was initially installed in mixed mode years ago, and we later raised the schema to 2003. Because the original naming was poorly chosen, we also had to perform a domain rename on the 2003 DC. dcdiag and repadmin showed no errors before the rename and show none after it. After the rename, Windows clients can join the domain without problems.

We had already read that ms-DS-ReplicationEpoch must be set to 0, which we did with ADSIEDIT after the rename.

With Samba < 4.7 we are able to join the domain; with Samba >= 4.7 the join fails.
I have verified that, before the domain rename, the join succeeds with all versions.

I have replaced the real domain name with "example" in this bug report for confidentiality reasons.

Domain before rename was: example.local, Workgroup: EXAMPLE
this was renamed to:
Domain: intranet.example.com, Workgroup: INTRANET

DNS is working correctly, all references have been updated to the new settings, and netdiag shows no errors.

I'll attach a debug level 5 trace to this report. Setting the debug level higher does not give more information, as the failing part does not become more verbose and the error message remains the same.

I have a virtual machine copy of the Server 2003 DC from before the rename, so I am able to help and test things.

We used the SerNet packages for CentOS 7.4.
Comment 1 Carsten Menke 2018-02-24 20:18:57 UTC
Sorry, the attached trace is at debug level 3. However, raising the level does not produce any more information before or after the line below, so I have no idea *why* the join fails (I even tested with debug level 9, which does not show the real reason either).

Join failed - cleaning up
Comment 2 Rick H 2020-01-25 23:12:35 UTC
I can confirm that this bug occurs, at least for me, with
the Samba package shipped in Ubuntu Server 18.04 LTS (bionic), Samba v4.7.6.

Testing on Debian 9 Stretch (Samba v4.5.16) was successful; the error does not occur there.

Both tests joined a Samba DC to a Windows Server 2008 R2 domain:

samba-tool domain join SAMDOM.mydomain.com DC -U"SAMDOM\administrator" -d5

Since this bug is still status NEW, I'm assuming it's still unresolved.
Comment 3 Frank Rochlitzer 2021-02-13 13:17:39 UTC
Summary: Add additional DC to existing Microsoft Win2k8R2 Domain

What was I trying to do?
There is one existing Microsoft Win2k11 Small Business Server (SBS) with one domain.
The domain functional level is Windows 2008 R2.
We want to add a Samba 4.11 server (Ubuntu 20.x focal) as a second DC, in order to replace the Win2k11 SBS later.
The first problem was related to the existing DNS zones (forward and reverse). Some of them were configured in Win2000 mode,
so they were not part of the ForestDnsZone or the DomainDnsZone. I solved this in the DNS configuration of the Win2k11 SBS
by setting the replication scope of every zone to DomainDnsZone (replicate to all DNS servers in this domain).
See: https://serverfault.com/questions/578666/change-active-directory-dns-replication-from-domain-level-to-forest

Then I started the samba domain join process with the following command:
samba-tool domain join mydomain.local DC  --username='Administrator'  --workgroup='mydomain'  --dns-backend=BIND9_DLZ  --realm='MYDOMAIN.LOCAL'  --site='Default-First-Site-Name'

But this command always fails with the following error:
	INFO 2021-02-12 16:39:57,043 pid:52889 /usr/lib/python3/dist-packages/samba/join.py #1159: Adding DNS record TEST.mydomain.local
	INFO 2021-02-12 16:39:57,056 pid:52889 /usr/lib/python3/dist-packages/samba/join.py #1180: Adding DNS A record TEST.mydomain.local for IPv4 IP: 192.168.10.204
	Join failed - cleaning up
	ldb_wrap open of secrets.ldb
	Could not find machine account in secrets database: Failed to fetch machine account password for MYDOMAIN from both secrets.ldb (Could not find entry to match filter: '(&(flatname=MYDOMAIN)(objectclass=primaryDomain))' base: 'cn=Primary Domains': No such object: dsdb_search at ../../source4/dsdb/common/util.c:4733) and from /var/lib/samba/private/secrets.tdb: NT_STATUS_CANT_ACCESS_DOMAIN_INFO
	Deleted CN=RID Set,CN=TEST,OU=Domain Controllers,DC=mydomain,DC=local
	Deleted CN=TEST,OU=Domain Controllers,DC=mydomain,DC=local
	Deleted CN=dns-TEST,CN=Users,DC=mydomain,DC=local
	Deleted CN=NTDS Settings,CN=TEST,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=mydomain,DC=local
	Deleted CN=TEST,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=mydomain,DC=local
	ERROR(runtime): uncaught exception - (9003, 'WERR_DNS_ERROR_RCODE_NAME_ERROR')
		File "/usr/lib/python3/dist-packages/samba/netcmd/__init__.py", line 186, in _run
			return self.run(*args, **kwargs)
		File "/usr/lib/python3/dist-packages/samba/netcmd/domain.py", line 701, in run
			join_DC(logger=logger, server=server, creds=creds, lp=lp, domain=domain,
		File "/usr/lib/python3/dist-packages/samba/join.py", line 1560, in join_DC
			ctx.do_join()
		File "/usr/lib/python3/dist-packages/samba/join.py", line 1457, in do_join
			ctx.join_add_dns_records()
		File "/usr/lib/python3/dist-packages/samba/join.py", line 1198, in join_add_dns_records
			= ctx.samdb.dns_lookup("%s.%s" % (name, zone),
		File "/usr/lib/python3/dist-packages/samba/samdb.py", line 1176, in dns_lookup
			return dsdb_dns.lookup(self, dns_name,
			
After many hours of googling and searching for configuration errors (systemd-resolve, Windows domain configuration, etc.) I found some similar reports:
https://www.spinics.net/lists/samba/msg158582.html (many thanks to Rowland for pointing me in the right direction)
https://github.com/zentyal/zentyal/issues/1845

I don't know why, but https://wiki.samba.org/index.php/Samba_AD_DC_Troubleshooting was one of the last results I came across while looking for a solution. But never mind.

So I looked a little deeper into the code of samba/join.py.
While experimenting with the DomainDnsZone and ForestDnsZone configuration in Windows, I noticed that the join process fails at different positions in join.py.
When I configured the DNS settings of mydomain.local to DomainDnsZone, the join got past line 1205 (the script stopped at line 1220):
	ctx.logger.info("Adding DNS CNAME record %s.%s for %s"
					% (msdcs_cname, msdcs_zone, cname_target))
And when I configured the DNS settings of mydomain.local to ForestDnsZone, the join only got past line 1177 (the script stopped at line 1194):
	ctx.logger.info("Adding DNS A record %s.%s for IPv4 IP: %s"
					% (name, zone, IP))
					
So I looked at lines 1194 and 1220:
Line 1194:
(ctx.dns_a_dn, ldap_record) \
                = ctx.samdb.dns_lookup("%s.%s" % (name, zone),
                                       dns_partition=domaindns_zone_dn)
Line 1220:
(ctx.dns_cname_dn, ldap_record) \
                = ctx.samdb.dns_lookup("%s.%s" % (msdcs_cname, msdcs_zone),
                                       dns_partition=forestdns_zone_dn)

Solution:
These lookups can only succeed if the msdcs zone is configured to ForestDnsZone and the main domain zone is configured to DomainDnsZone.
However, this is configurable in the Windows DNS settings, so it cannot be relied upon.
After I changed line 1196 from "dns_partition=domaindns_zone_dn)" to "dns_partition=forestdns_zone_dn)", the join process completed successfully.
So either the Windows configuration matches what I described above, or the join process will always fail.
I think this is a bug in join.py: dns_lookup should be called for both zones (DomainDnsZone and ForestDnsZone), and the join should fail only if the lookup fails in both.
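
The fallback proposed above could look roughly like the following. This is only a sketch under stated assumptions, not the actual join.py code: lookup_in_partitions, NameNotFound and fake_lookup are hypothetical names introduced for illustration, and the real ctx.samdb.dns_lookup may signal a missing record with a different exception type.

```python
class NameNotFound(Exception):
    """Hypothetical stand-in for the WERR_DNS_ERROR_RCODE_NAME_ERROR failure."""


def lookup_in_partitions(lookup, dns_name, partitions):
    """Try each DNS partition in order and return the first successful result.

    `lookup` is any callable taking (dns_name, dns_partition=...); here it
    stands in for ctx.samdb.dns_lookup. The last error is re-raised only
    if the record is found in none of the partitions.
    """
    last_error = None
    for partition in partitions:
        try:
            return lookup(dns_name, dns_partition=partition)
        except NameNotFound as err:
            last_error = err
    if last_error is None:
        raise ValueError("no partitions given")
    raise last_error


# Fake lookup for demonstration: the record exists only under ForestDnsZones.
def fake_lookup(dns_name, dns_partition=None):
    if dns_partition.startswith("DC=ForestDnsZones"):
        return ("DC=TEST,DC=mydomain.local,CN=MicrosoftDNS," + dns_partition, [])
    raise NameNotFound(dns_name)


domaindns_zone_dn = "DC=DomainDnsZones,DC=mydomain,DC=local"
forestdns_zone_dn = "DC=ForestDnsZones,DC=mydomain,DC=local"

dn, records = lookup_in_partitions(
    fake_lookup, "TEST.mydomain.local", [domaindns_zone_dn, forestdns_zone_dn]
)
print(dn)
```

With this ordering the domain partition is still tried first, so a Windows setup that already matches the expected layout would behave as before.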

This bug does not exist in versions below 4.7, because in those versions you have to add the DNS records manually.
Comment 4 Andrew Bartlett 2021-02-16 02:18:08 UTC
Thanks for the careful analysis. 

Yes, older Samba versions did the DNS update after the join, but this was also error-prone so we moved it into the join.

What we are trying (and failing...) to do is to find the record so we can correct the ownership, which is the one thing we can't do over RPC.
Comment 5 stefan.bauer 2021-06-28 10:11:58 UTC
Unfortunately the fix in comment 3 did not help in my case:

2021-06-25 17:30:25,617 ERROR(<type 'exceptions.UnboundLocalError'>): uncaught exception - local variable 'forestdns_zone_dn' referenced before assignment
2021-06-25 17:30:25,618   File "/usr/lib/python2.7/dist-packages/samba/netcmd/__init__.py", line 185, in _run

Will there be an update from the Samba team?

Thank you.
Comment 6 stefan.bauer 2021-07-03 09:18:30 UTC
I want to confirm that I have now been able to join the system to the Windows domain WITHOUT modifying any source code, using Samba 4.10.18 (part of UCS / Univention).

All I had to do was make sure, as already mentioned,

that the msdcs zone is configured to ForestDnsZone and the main domain is configured to DomainDnsZone.

In my case, the main domain zone was set to Server 2000 compatibility mode and the independent msdcs zone did not exist at all.
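
For orientation, the zone layout described above corresponds to the following directory locations. This is a minimal sketch, assuming a single-domain forest (so the forest root and the domain share the same DN) and the usual CN=MicrosoftDNS containers; the helper names are hypothetical and not part of Samba.

```python
def domain_dn(realm):
    """Convert a DNS realm such as 'mydomain.local' to 'DC=mydomain,DC=local'."""
    return ",".join("DC=" + label for label in realm.lower().split("."))


def expected_zone_dns(realm):
    """Return the zone object DNs matching the layout the join appears to expect.

    Assumption: the main zone is stored in the DomainDnsZones partition and
    the _msdcs zone in the ForestDnsZones partition of the same domain.
    """
    base = domain_dn(realm)
    zone = realm.lower()
    return {
        zone: "DC=%s,CN=MicrosoftDNS,DC=DomainDnsZones,%s" % (zone, base),
        "_msdcs." + zone: "DC=_msdcs.%s,CN=MicrosoftDNS,DC=ForestDnsZones,%s"
        % (zone, base),
    }


def legacy_zone_dn(realm):
    """Location of a zone kept in Windows 2000 compatibility mode (domain partition)."""
    zone = realm.lower()
    return "DC=%s,CN=MicrosoftDNS,CN=System,%s" % (zone, domain_dn(realm))


for name, dn in expected_zone_dns("MYDOMAIN.LOCAL").items():
    print(name, "->", dn)
print("legacy ->", legacy_zone_dn("MYDOMAIN.LOCAL"))
```

A zone that only shows up under the legacy CN=System location would match the "server 2000 compatibility mode" case described in this comment.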