Our previous administrators renamed our domain, and we are aware that it can cause issues, but it appears to me that samba-tool is not checking and setting the replication epoch prior to attempting to replicate with our Windows Server 2003 R2 Domain Controller. The output of running samba-tool is below. We are running this compiled from your source on CentOS 6.3. I imagine that this issue is likely something that is as rare as unicorn feathers.

/usr/local/samba/bin/samba-tool domain join corporate.lavidamassage.com DC -Ucorporate/dave
Finding a writeable DC for domain 'corporate.lavidamassage.com'
Found DC beelzebub.corporate.lavidamassage.com
Password for [CORPORATE\dave]:
workgroup is CORPORATE
realm is corporate.lavidamassage.com
checking sAMAccountName
Adding CN=CHARON,OU=Domain Controllers,DC=corporate,DC=lavidamassage,DC=com
Adding CN=CHARON,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=corporate,DC=lavidamassage,DC=com
Adding CN=NTDS Settings,CN=CHARON,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=corporate,DC=lavidamassage,DC=com
Adding SPNs to CN=CHARON,OU=Domain Controllers,DC=corporate,DC=lavidamassage,DC=com
Setting account password for CHARON$
Enabling account
Calling bare provision
No IPv6 address will be assigned
Provision OK for domain DN DC=corporate,DC=lavidamassage,DC=com
Starting replication
Join failed - cleaning up
checking sAMAccountName
Deleted CN=CHARON,OU=Domain Controllers,DC=corporate,DC=lavidamassage,DC=com
Deleted CN=NTDS Settings,CN=CHARON,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=corporate,DC=lavidamassage,DC=com
Deleted CN=CHARON,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=corporate,DC=lavidamassage,DC=com
ERROR(runtime): uncaught exception - (8593, 'WERR_DS_DIFFERENT_REPL_EPOCHS')
  File "/usr/local/samba/lib/python2.6/site-packages/samba/netcmd/__init__.py", line 175, in _run
    return self.run(*args, **kwargs)
  File "/usr/local/samba/lib/python2.6/site-packages/samba/netcmd/domain.py", line 552, in run
    machinepass=machinepass, use_ntvfs=use_ntvfs, dns_backend=dns_backend)
  File "/usr/local/samba/lib/python2.6/site-packages/samba/join.py", line 1104, in join_DC
    ctx.do_join()
  File "/usr/local/samba/lib/python2.6/site-packages/samba/join.py", line 1009, in do_join
    ctx.join_replicate()
  File "/usr/local/samba/lib/python2.6/site-packages/samba/join.py", line 731, in join_replicate
    replica_flags=ctx.replica_flags)
  File "/usr/local/samba/lib/python2.6/site-packages/samba/drs_utils.py", line 248, in replicate
    (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, req_level, req)
The following warning accompanies this issue in the Windows event log:

Event Type: Warning
Event Source: NTDS Replication
Event Category: DS RPC Server
Event ID: 1876
Date: 12/13/2012
Time: 10:21:28 AM
User: CORPORATE\dave
Computer: SERVER3
Description:
The local domain controller cannot replicate with the following remote domain controller because of a mismatched replication epoch (msDS-ReplicationEpoch). This typically occurs as part of the domain rename process.

Remote domain controller:
CN=NTDS Settings,CN=CHARON,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=corporate,DC=lavidamassage,DC=com
Remote domain controller replication epoch: 0
Local domain controller replication epoch: 1

Domain controllers undergoing a domain rename are not allowed to communicate with those domain controllers that have not yet undergone the domain rename. When all domain controllers have completed the domain rename, replication will once again be allowed.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

-----------------------------------------------

It seems to be an issue with line 168 of /source4/rpc_server/drsuapi/dcesrv_drsuapi.c, where the epoch is statically assigned a value of 0.
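To make the failure mode concrete, here is a minimal sketch in Python of the kind of check that appears to be behind event 1876. It is not the Windows or Samba implementation: the function and exception names are hypothetical, and only the error code 8593 (WERR_DS_DIFFERENT_REPL_EPOCHS) and the epoch values 0 and 1 are taken from the output above.

```python
# Error code as reported in the samba-tool traceback.
WERR_DS_DIFFERENT_REPL_EPOCHS = 8593


class ReplEpochMismatch(Exception):
    """Hypothetical exception that records both epochs for diagnostics."""

    def __init__(self, local_epoch, remote_epoch):
        super().__init__(
            WERR_DS_DIFFERENT_REPL_EPOCHS,
            "WERR_DS_DIFFERENT_REPL_EPOCHS",
        )
        self.local_epoch = local_epoch
        self.remote_epoch = remote_epoch


def check_repl_epoch(local_epoch, remote_epoch):
    """Refuse to replicate unless both sides report the same epoch.

    Hypothetical helper: a DC in a renamed domain has epoch 1, while a
    joining DC that always sends a hard-coded 0 will never match it.
    """
    if local_epoch != remote_epoch:
        raise ReplEpochMismatch(local_epoch, remote_epoch)
```

With the values from the event log, check_repl_epoch(1, 0) would raise, which matches the join being refused; a joining DC that sent epoch 1 would pass this particular check.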
I have the same problem as in the original report, using 4.0.5 and attempting to join a domain that was previously renamed. There appear to be a number of places where repl_epoch is set to 0, though I am not sure at this point which one is relevant here.
I'm seeing the same issue with v4-0-stable, trying to add a DC to a w2k3 domain.

sudo /usr/local/samba/bin/samba-tool domain join ny.clientdomain.com DC -Uadministrator --realm=ny.clientdomain.com
[sudo] password for localadmin:
Finding a writeable DC for domain 'ny.clientdomain.com'
Found DC rsa-dc-one.ny.clientdomain.com
Password for [WORKGROUP\administrator]:
workgroup is NY
realm is ny.clientdomain.com
checking sAMAccountName
Adding CN=smbdc,OU=Domain Controllers,DC=ny,DC=clientdomain,DC=com
Adding CN=smbdc,CN=Servers,CN=RSA,CN=Sites,CN=Configuration,DC=ny,DC=clientdomain,DC=com
Adding CN=NTDS Settings,CN=smbdc,CN=Servers,CN=RSA,CN=Sites,CN=Configuration,DC=ny,DC=clientdomain,DC=com
Adding SPNs to CN=smbdc,OU=Domain Controllers,DC=ny,DC=clientdomain,DC=com
Setting account password for smbdc$
Enabling account
Calling bare provision
No IPv6 address will be assigned
Provision OK for domain DN DC=ny,DC=clientdomain,DC=com
Starting replication
Join failed - cleaning up
checking sAMAccountName
Deleted CN=smbdc,OU=Domain Controllers,DC=ny,DC=clientdomain,DC=com
Deleted CN=NTDS Settings,CN=smbdc,CN=Servers,CN=RSA,CN=Sites,CN=Configuration,DC=ny,DC=clientdomain,DC=com
Deleted CN=smbdc,CN=Servers,CN=RSA,CN=Sites,CN=Configuration,DC=ny,DC=clientdomain,DC=com
ERROR(runtime): uncaught exception - (8593, 'WERR_DS_DIFFERENT_REPL_EPOCHS')
  File "/usr/local/samba/lib/python2.7/site-packages/samba/netcmd/__init__.py", line 175, in _run
    return self.run(*args, **kwargs)
  File "/usr/local/samba/lib/python2.7/site-packages/samba/netcmd/domain.py", line 552, in run
    machinepass=machinepass, use_ntvfs=use_ntvfs, dns_backend=dns_backend)
  File "/usr/local/samba/lib/python2.7/site-packages/samba/join.py", line 1104, in join_DC
    ctx.do_join()
  File "/usr/local/samba/lib/python2.7/site-packages/samba/join.py", line 1009, in do_join
    ctx.join_replicate()
  File "/usr/local/samba/lib/python2.7/site-packages/samba/join.py", line 731, in join_replicate
    replica_flags=ctx.replica_flags)
  File "/usr/local/samba/lib/python2.7/site-packages/samba/drs_utils.py", line 248, in replicate
    (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, req_level, req)
Having the same problem as the original report. Samba 4.0.6 on CentOS 6.4, joining as a DC to an existing domain. Domain was renamed previously; was originally a Windows 2003 domain and is now Windows 2008 R2.
The issue is not trivial to fix, because we need to modify a couple of places where the epoch is hard-coded to 0.
Samba version 4.0.8. Matthieu Patou, how can this bug be fixed?
When will this be fixed?
Posting on the bug report won't get this fixed any faster. There are a limited number of developers working on this area at the moment. Accordingly, I'm resetting the blocker bug to 4.2, because it seems quite unlikely we are going to fix it at this late stage, but if we do (or someone provides a working and tested patch) then of course it is likely to be backported. Sorry, Andrew Bartlett
>Hello!
>Can you send me some instruction how to fix bug 9500? So I could do the testing.
>Best regards,
>Andrey.

So, first things first: you need to query the remote DC for its value of msDS-ReplicationEpoch, then store it in a variable so that it can be used in the join code. As a separate effort, you need to bump the msDS-ReplicationEpoch attribute on a Windows DC (a test one).

Then you have to modify the Python function drs_DsBind so that it takes an optional parameter, which will be the replication epoch. You need to make the calling function drsuapi_connect accept a parameter as well, and the caller (it seems to be join_add_ntdsdsa()) needs to set this epoch. To make things easier, you might want to hard-code the value in join_add_ntdsdsa at the beginning, or, since it is called just after a samdb call, pass the content of the variable that you set at the very beginning.

At that point it might replicate a bit, *but* it is far from finished. First you need to add the correct value to msDS-ReplicationEpoch in the NTDS object that we are creating for the Samba AD DC. Then you need to modify these files:

source4/dsdb/repl/drepl_out_helpers.c
source4/dsdb/repl/drepl_service.c
source4/libnet/libnet_become_dc.c
source4/libnet/libnet_unbecome_dc.c

so that we read the replication epoch from the local database or from the remote database. For libnet_become_dc, I suggest that you add a field to the struct becomeDC_drsuapi to store this value, which you will have to read from the remote database in a function called from becomeDC_connect_ldap1; you can take inspiration from becomeDC_ldap1_crossref_behavior_version on how to read from the remote database. For the other functions it will be simpler, as you "just" have to read from the local database.

I expect that you will need more details later, but just getting the initial replication working will already require a bit of coding, and you will most probably have questions.
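The Python side of these steps (read the epoch, then pass it along as an optional parameter instead of the hard-coded 0) might look roughly like the sketch below. It is self-contained and deliberately does not use the real Samba API: BindInfo, parse_repl_epoch and make_bind_info are hypothetical stand-ins for the bind information, the remote query and drs_DsBind. Treating an unset attribute as epoch 0 is an assumption based on the event log earlier in this report, which shows epoch 0 for the joining Samba DC.

```python
from dataclasses import dataclass


@dataclass
class BindInfo:
    """Hypothetical stand-in for the bind information sent to the remote DC."""

    repl_epoch: int = 0


def parse_repl_epoch(ntds_attrs):
    """Return msDS-ReplicationEpoch from an NTDS Settings entry.

    ntds_attrs is assumed to map attribute names to lists of values,
    the way LDAP search results usually do. An unset attribute is
    treated as epoch 0 (assumption, see the lead-in).
    """
    values = ntds_attrs.get("msDS-ReplicationEpoch")
    if not values:
        return 0
    raw = values[0]
    if isinstance(raw, bytes):
        raw = raw.decode("ascii")
    return int(raw)


def make_bind_info(repl_epoch=None):
    """Build bind info with an optional epoch.

    Callers that do not pass an epoch keep the old behaviour (0), so
    existing call sites would not need to change at once.
    """
    return BindInfo(repl_epoch=0 if repl_epoch is None else repl_epoch)
```

For example, make_bind_info(parse_repl_epoch(entry)) carries the remote DC's epoch into the bind step, where entry is whatever the query against the remote NTDS Settings object returned. Making the parameter optional is only one possible design; it keeps the change small while the other hard-coded zeros are still being tracked down.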
How long must we wait for a bug fix? A whole plant with 5,000 workers cannot move to Linux until this bug is fixed.
(In reply to comment #11)
> how long to wait for a bug fix? The whole plant with 5,000 workers can not go
> on Linux without fixing this bug.

Where is your patch? I mean, I have a day job that is not related to Samba, so I'm doing this in my spare time, and that is the case for quite a lot of people. Don't expect something very soon; if you do something, it will go faster. If you can't do something yourself, think of contracting a company to make the changes; if you can't do that, then learn to be patient.
(In reply to comment #12)
> (In reply to comment #11)
> > how long to wait for a bug fix? The whole plant with 5,000 workers can not go
> > on Linux without fixing this bug.
>
> Where is your patch ?
> I mean I have a day job that is not related to Samba and so I'm doing this on
> my spare time and it's the case for quite a lot of person.
> Don't expect something very soon, if you do something it will go faster. If you
> can't do something think of contracting a company to do the changes if you
> can't then learn how to be patient.

I need more details for the patch. Can you give me more detailed steps for patching?
Just wanted to add to the discussion that I am experiencing the same problem with Samba4 cloned from git (Version 4.2.0pre1-GIT-58cb40d) on current Debian (Kernel 3.2.46-1+deb7u1) and Fedora 19 (Kernel 3.11x) installations. There is no chance to join an existing domain that has previously been renamed, due to the mismatch of the epoch attribute. I found a workaround, though, I suppose.

Changing the epoch on the existing DC using "dssite.msc" and resetting the value of msDS-ReplicationEpoch to "not-set" (it can be found under Default-First-Site-Name/Servers/"DC-Name"/NTDS Settings -> Properties -> Attribute Editor) allowed me to join Samba4 to the existing domain. Currently it is replicating the database! However, I am afraid this is only of use if you do not have a lot of DCs to reset.

Looking forward to the patch fixing the issue - any idea when this will arrive?
(In reply to comment #14)
> Just wanted to add to the discussion that I am obviously experiencing the same
> problem with Samba4 cloned from git (Version Version 4.2.0pre1-GIT-58cb40d) on
> current Debian (Kernel 3.2.46-1+deb7u1) and Fedora 19 (Kernel 3.11x)
> installations.
> No chance to join an existing domain that has been previously renamed due to
> the mismatch of the epoch attribute. I found a workaround though I suppose.
>
> Changing the Epoch on the existing DC using the "dssite.msc" and resetting the
> value of msDS-ReplicationEpoch to "not-set" (can be Found un the
> Default-First-Site-Name/Servers/"DC-Name"/NTDS Settings -> Properties ->
> Attribute Editor) allowed me to join Samba4 to the existing domain. Currently
> it is replicating the database!
> This is however only of use if you do not have a lot of DCs to be reset I am
> afraid.
>
> Looking forward to the patch fixing the issue - any idea when this will arrive?

I say thank you!!! I'm testing "dssite.msc" now!!!
The initial reaction of my contacts at Microsoft is that removing this value would be "catastrophic". I strongly urge users of Samba and of Microsoft AD not to do this. The reason is that this is a fundamental part of the replication state. I'm asking for further clarification as to whether there are any circumstances under which changing this might be valid, or safe, but I do want to make my grave concerns known regarding this workaround. On the original issue, I do not have a timeframe for a fix at this point.
Any news on this one?
when?
Sadly no progress on this one. We need patches to:
- check the replication epoch
- add the replication epoch into all the USN calculations
- send the new replication epoch
- test all the above

Preferably these patches would include a tool to rename the domain, so as to be both useful and set the replication epoch in the test environment. This is not a small task, somewhere between two and four weeks of developer time, so I would ask for patience from our users, who understandably are keen to see this feature implemented, or for patches to at least get us started (every bit helps).
(In reply to comment #18)
> when?

If it's so urgent for you, you might want to get some commercial support to speed this up. See https://www.samba.org/samba/support/globalsupport.html
We can't migrate to Samba because of this bug, so the join approach is not an option at the moment. But what about some other solution, like migrating users with their passwords to a newly provisioned Samba DC? Are we out of options?
(In reply to JME from comment #14)

@JME Hoping you are still reading this thread. Have you run into any problems after setting msDS-ReplicationEpoch to "not-set", at any point in these 6 years? Thanks!

P.S.: Has anyone else tried setting this value to "not-set"? (Obviously I am experiencing the same problem after renaming a domain and trying to join with a new Samba AD DC.)