There are 2 issues when Samba server runs on site with Read Only Domain Controller : Issue 1 -------- 1) When run "net ads join -U administrator" , error "Failed to join domain: NT_STATUS_NOT_SUPPORTED" is returned. 2) And when joining to same domain using Windows XP located in the same site, XP can join to domain successfully. So, to workaround this join problem, I have to remove gencache.tdb, and add password server="writable DC" to smb.conf. Issue 2 ------- So, after the computer join domain successfully, and wait until the computer object in AD is replicated to the Read Only Site. Then, I removed the password server="writable DC". --> And, attempted to list share directory as AD user, always failed with NT_STATUS_NO_TRUST_SAM_ACCOUNT: smbclient -U <ad user>\\<NTLM Domain> -L localhost
Created attachment 3756 [details] smb.conf
Created attachment 3757 [details] logfile when run smbclient -U aduser -L localhost
Created attachment 3758 [details] Output when run net ads join to domain from RODC site
Could you please provide network captures of wireshark or tcpdump -s 0 of the windows xp box joining + a logon of a user and maybe a password change. And also network captures of the failing net ads join and smbclient commands.
Created attachment 3759 [details] Network trace when run smbclient see pkt 36, when the machine trying to authenticate itself using NetrServerAuthenticate2, the RODC return with error NO_TRUST_SAM_ACCOUNT.
Created attachment 3760 [details] network trace when run ads join didn't set password server to writable DC in the smb.conf . Run "net ads join -U administrator"
Created attachment 3761 [details] network trace when XP (located in RODC Site) join to domain RODC IP : 192.168.0.113 XP IP : 192.168.3.242
Created attachment 3762 [details] Network trace for AD user login to XP in RODC site, right after XP join to domain note that :The XP computer object has not yet been replicated to the RODC.
Hi metze, the requested network traces are attached in comment#7 and comment#8
Facing same issue with Samba 3.6.12 as well. Domain join in an RODC site(only RODC in the site) is always contacting the RODC though we request for writable DC. status = dsgetdcname(mem_ctx, r->in.msg_ctx, r->in.domain_name, NULL, NULL, DS_FORCE_REDISCOVERY | DS_DIRECTORY_SERVICE_REQUIRED | DS_WRITABLE_REQUIRED | =====>> DS_RETURN_DNS_NAME, &info); Whereas Other windows clients in the same site is able to contact the Writable DC outside the site and join is successful.
I am able to figure out the root cause. Looks like we are not doing domain level DC discovery if we find few DCs at site level. In the code , initially discover_dc_dns() will find DCs at site level first. And if the number of DCs returned from site is zero then it will try to fetch the DCs at domain level(by setting site_name to NULL). DC validation is actually done later in process_dc_dns(). There we realize that the list of DCs are not valid for domain join as in this case they are not writable. Later discover_dc_netbios attempt also failing and finally dsgetdcname_rediscover() is failing with "domain controller not found" error and carried back to join failure. Here I am proposing a patch which will make this case work. --- source3/libsmb/dsgetdcname.c.orig 2013-08-30 05:48:26.000000000 -0700 +++ source3/libsmb/dsgetdcname.c 2013-08-30 05:49:56.000000000 -0700 @@ -1152,16 +1152,28 @@ NTSTATUS dsgetdcname(TALLOC_CTX *mem_ctx &myinfo); done: - SAFE_FREE(query_site); if (!NT_STATUS_IS_OK(status)) { if (!first) { *info = first_info; + SAFE_FREE(query_site); return NT_STATUS_OK; } + else if( (query_site) && (query_site[0] != '\0') ) { + /* DC discovery failed at site level. Could be there are no local writable DCs. + * Lets try to discover DCs at domain level by sending site name as NULL. + */ + first = false; + first_info = myinfo; + SAFE_FREE(query_site); + query_site = NULL; + goto rediscover; + } + SAFE_FREE(query_site); return status; } + SAFE_FREE(query_site); if (!first) { TALLOC_FREE(first_info); } else if (!is_closest_site(myinfo)) { I am going for rediscovery at domain level once we are unable to find valid DCs at site level. I have tested this and able to join the domain outside of RODC site. Please let me know if this looks correct.
Looks to me like the underlying bug is the horrible decision inside of : NTSTATUS dsgetdcname(TALLOC_CTX *mem_ctx, struct messaging_context *msg_ctx, const char *domain_name, const struct GUID *domain_guid, const char *site_name, uint32_t flags, struct netr_DsRGetDCNameInfo **info) to add the code: if ((site_name == NULL) || (site_name[0] == '\0')) { query_site = sitename_fetch(domain_name); } else { query_site = SMB_STRDUP(site_name); } which implicitly sets the site name if set to NULL by the caller. Can we fix the underlying callers not to do this and remove the implicit setting of site_name instead ? Jeremy.
Yeah, this code is a mess. Almost all callers of dsgetdcname() set site_name to NULL, except for _wbint_DsGetDcName() inside winbindd, which sets site_name to "" if it's passed in as NULL (which in itself is wrong - sitename should be set to NULL if it's not required as it's being used in a talloc_asprintf() to add a %s string into a DNS lookup string). The only other caller that sets site_name to non-null is the 'net lookup dsgetdcname' code I think. I think we should change dsgetdcname() not to do the implicit setting of site_name, and add a function dsgetdcname_anysite() that first calls dsgetdcname() with site_name set to sitename_fetch(domain_name), and if that fails then re-calls dsgetdcname() with a site_name of NULL (if sitename_fetch(domain_name) didn't return NULL in the first place). Then change all the callers of dsgetdcname() that are currently using the implicit NULL site_name to call dsgetdcname_anysite() instead. Jeremy.
Created attachment 9178 [details] Test patch for v3-6-test. Can you try the attached fix ? It should have the same effect as your patch but is a little cleaner and preserves the intent of the callers I think. If you can confirm it works I'll get it pushed to master and into release branches. Cheers, Jeremy.
Created attachment 9179 [details] Fixed version. With NULL -> "(null)" in the debug :-). Jeremy.
(In reply to comment #15) > Created attachment 9179 [details] > Fixed version. > > With NULL -> "(null)" in the debug :-). > > Jeremy. Thanks Jeremy. I have tested the patch. It's working fine.
Jeremy, In the new dsgetdcname() ... ... if ((site_name == NULL) || (site_name[0] == '\0')) { query_site = sitename_fetch(domain_name); ... Shouldn't we do SAFE_FREE(query_site) before the redisovery with site name as NULL? sitename_fetch() is returning a string which is string duped. We need to free this up right?
Good catch. I'll fix that up and re-submit. Thanks for the review ! Jeremy.
Created attachment 9182 [details] New patch fixing the memory leak. Let me know if you're happy with this and I'll get it into all released branches. Jeremy.
Yeah. This looks good Jeremy. Thanks.
Created attachment 9183 [details] git-am fix for master, 4.1.0, 4.0.next and 3.6.next Patchset split into micro-commits to make it clearer. Applies cleanly to master, 4.1.next, 4.0.next, 3.6.next. Jeremy.
Created attachment 9184 [details] git-am fix for master, 4.1.0, 4.0.next and 3.6.next Fixed the commit messages to remove the erroneous "First part" etc. text. Jeremy.
Ok, here are the commits that went into master: git cherry-pick -x dd12bfbcbf359c1642cc2e968aec62ae904aad5d git cherry-pick -x 66006be7ef703b2935334633d27641050cee5f58 git cherry-pick -x 181c11066bd53b07015a199f56eb71182e89ff71 git cherry-pick -x 68e7b1c9446c7d1274b0fb85b59b90ac1a7f6041 git cherry-pick -x bdab6f9431715fbfd28f8cc0dfb4dde2966f22f3 Jeremy.
Created attachment 9186 [details] Patchset that went into master. Applies cleanly to 4.1.0, 4.0.next, 3.6.next. Jeremy
This was already reviewed by Andrew Bartlett and Richard Sharpe. Re-assigning to Karolin for inclusion in 4.1.0, 4.0.next, 3.6.next. Jeremy.
Comment on attachment 9186 [details] Patchset that went into master. sorry to be a pain, but can you get that with the 'cherry-pick -x' markers? (Otherwise, I can get those and re-upload. It helps us track the matching patches though the branches).
Created attachment 9187 [details] git-am fix with the cherry-pick info. Patchset containing cherry-pick info as Andrew requested. Jeremy.
Comment on attachment 9187 [details] git-am fix with the cherry-pick info. Thanks, that looks good!
Re-assigning to Karolin for inclusion in 4.1.0, 4.0.next, 3.6.next. Jeremy.
Pushed to autbuild-v4-1-test, autobuild-v4-0-test and v3-6-test.
Pushed to v4-1-test and v4-0-test. Closing out bug report. Thanks!