Bug 5917 - Samba does not work on site with Read Only Domain Controller
Samba does not work on site with Read Only Domain Controller
Status: RESOLVED FIXED
Product: Samba 3.6
Classification: Unclassified
Component: Winbind
3.6.18
Other Linux
: P3 normal
: ---
Assigned To: Karolin Seeger
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-11-24 03:59 UTC by Yen Liew
Modified: 2013-09-09 08:00 UTC (History)
3 users (show)

See Also:


Attachments
smb.conf (3.90 KB, application/octet-stream)
2008-11-24 04:08 UTC, Yen Liew
no flags Details
logfile when run smbclient -U aduser -L localhost (192.57 KB, application/tgz)
2008-11-24 04:10 UTC, Yen Liew
no flags Details
Output when run net ads join to domain from RODC site (183.69 KB, text/x-log)
2008-11-24 04:13 UTC, Yen Liew
no flags Details
Network trace when run smbclient (16.18 KB, application/octet-stream)
2008-11-24 04:29 UTC, Yen Liew
no flags Details
network trace when run ads join (35.56 KB, application/octet-stream)
2008-11-24 04:56 UTC, Yen Liew
no flags Details
network trace when XP (located in RODC Site) join to domain (106.75 KB, application/octet-stream)
2008-11-24 05:54 UTC, Yen Liew
no flags Details
Network trace for AD user login to XP in RODC site, right after XP join to domain (122.98 KB, application/octet-stream)
2008-11-24 05:57 UTC, Yen Liew
no flags Details
Test patch for v3-6-test. (3.71 KB, patch)
2013-08-31 03:42 UTC, Jeremy Allison
no flags Details
Fixed version. (3.72 KB, patch)
2013-08-31 03:44 UTC, Jeremy Allison
no flags Details
New patch fixing the memory leak. (3.80 KB, patch)
2013-09-03 16:45 UTC, Jeremy Allison
no flags Details
git-am fix for master, 4.1.0, 4.0.next and 3.6.next (8.22 KB, patch)
2013-09-03 19:21 UTC, Jeremy Allison
no flags Details
git-am fix for master, 4.1.0, 4.0.next and 3.6.next (8.19 KB, patch)
2013-09-03 19:23 UTC, Jeremy Allison
no flags Details
Patchset that went into master. (10.10 KB, patch)
2013-09-03 23:31 UTC, Jeremy Allison
no flags Details
git-am fix with the cherry-pick info. (10.44 KB, patch)
2013-09-03 23:47 UTC, Jeremy Allison
abartlet: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yen Liew 2008-11-24 03:59:19 UTC
There are 2 issues when Samba server runs on site with Read Only Domain Controller : 

Issue 1 
--------
  1) When run  "net ads join -U administrator" , error 
    "Failed to join domain: NT_STATUS_NOT_SUPPORTED" is returned.  
  2) And when joining to same domain using Windows XP located in the same site, 
     XP can join to domain successfully. 
 
  So, to workaround this join problem, I have to remove gencache.tdb, and add password server="writable DC"  to smb.conf.    

Issue 2
-------
  So, after the computer join domain successfully,  and wait until the computer object in AD is replicated to the Read Only Site. 
  Then, I removed the password server="writable DC".  
  --> And, attempted to list share directory as AD user, always failed with  NT_STATUS_NO_TRUST_SAM_ACCOUNT: 
    smbclient -U <ad user>\\<NTLM Domain> -L localhost
Comment 1 Yen Liew 2008-11-24 04:08:42 UTC
Created attachment 3756 [details]
smb.conf
Comment 2 Yen Liew 2008-11-24 04:10:16 UTC
Created attachment 3757 [details]
logfile when run smbclient -U aduser -L localhost
Comment 3 Yen Liew 2008-11-24 04:13:28 UTC
Created attachment 3758 [details]
Output when run net ads join to domain from RODC site
Comment 4 Stefan Metzmacher 2008-11-24 04:17:11 UTC
Could you please provide network captures of wireshark or tcpdump -s 0
of the windows xp box joining + a logon of a user and maybe a password change.

And also network captures of the failing net ads join and smbclient commands.
Comment 5 Yen Liew 2008-11-24 04:29:03 UTC
Created attachment 3759 [details]
Network trace when run smbclient  

see pkt 36, when the machine trying to authenticate itself using NetrServerAuthenticate2, the RODC return with error NO_TRUST_SAM_ACCOUNT.
Comment 6 Yen Liew 2008-11-24 04:56:01 UTC
Created attachment 3760 [details]
network trace when run ads join 

didn't set password server to writable DC in the smb.conf . 
Run "net ads join -U administrator"
Comment 7 Yen Liew 2008-11-24 05:54:50 UTC
Created attachment 3761 [details]
network trace  when XP (located in RODC Site) join to domain 

RODC IP : 192.168.0.113
XP IP : 192.168.3.242
Comment 8 Yen Liew 2008-11-24 05:57:20 UTC
Created attachment 3762 [details]
Network trace for AD user login to XP in RODC site, right after XP join to domain

note that :The XP computer object has not yet been replicated to the RODC.
Comment 9 Yen Liew 2008-11-24 05:58:38 UTC
Hi metze, the requested network traces are attached in comment#7 and comment#8
Comment 10 Hemanth 2013-08-28 08:22:44 UTC
Facing same issue with Samba 3.6.12 as well. Domain join in an RODC site(only RODC in the site) is always contacting the RODC though we request for writable DC.

  status = dsgetdcname(mem_ctx,
                                     r->in.msg_ctx,
                                     r->in.domain_name,
                                     NULL,
                                     NULL,
                                     DS_FORCE_REDISCOVERY |
                                     DS_DIRECTORY_SERVICE_REQUIRED |
                                     DS_WRITABLE_REQUIRED | =====>>
                                     DS_RETURN_DNS_NAME,
                                     &info);


Whereas Other windows clients in the same site is able to contact the Writable DC outside the site and join is successful.
Comment 11 Hemanth 2013-08-30 16:46:48 UTC
I am able to figure out the root cause. Looks like we are not doing domain level DC discovery if we find few DCs at site level.

In the code , initially discover_dc_dns() will find DCs at site level first. And if the number of DCs returned from site is zero then it will try to fetch the DCs at domain level(by setting site_name to NULL).

DC validation is actually done later in process_dc_dns(). There we realize that the list of DCs are not valid for domain join as in this case they are not writable.

Later discover_dc_netbios attempt also failing and finally dsgetdcname_rediscover() is failing with "domain controller not found" error and carried back to join failure.

Here I am proposing a patch which will make this case work.

--- source3/libsmb/dsgetdcname.c.orig   2013-08-30 05:48:26.000000000 -0700
+++ source3/libsmb/dsgetdcname.c        2013-08-30 05:49:56.000000000 -0700
@@ -1152,16 +1152,28 @@ NTSTATUS dsgetdcname(TALLOC_CTX *mem_ctx
                                        &myinfo);

  done:
-       SAFE_FREE(query_site);

        if (!NT_STATUS_IS_OK(status)) {
                if (!first) {
                        *info = first_info;
+                       SAFE_FREE(query_site);
                        return NT_STATUS_OK;
                }
+               else if( (query_site) && (query_site[0] != '\0') ) {
+                        /* DC discovery failed at site level. Could be there are no local writable DCs.
+                         * Lets try to discover DCs at domain level by sending site name as NULL.
+                         */
+                        first = false;
+                        first_info = myinfo;
+                        SAFE_FREE(query_site);
+                        query_site = NULL;
+                        goto rediscover;
+                }
+               SAFE_FREE(query_site);
                return status;
        }

+       SAFE_FREE(query_site);
        if (!first) {
                TALLOC_FREE(first_info);
        } else if (!is_closest_site(myinfo)) {


I am going for rediscovery at domain level once we are unable to find valid DCs at site level.

I have tested this and able to join the domain outside of RODC site. Please let me know if this looks correct.
Comment 12 Jeremy Allison 2013-08-31 00:08:42 UTC
Looks to me like the underlying bug is the horrible decision inside of :

NTSTATUS dsgetdcname(TALLOC_CTX *mem_ctx,
                     struct messaging_context *msg_ctx,
                     const char *domain_name,
                     const struct GUID *domain_guid,
                     const char *site_name,
                     uint32_t flags,
                     struct netr_DsRGetDCNameInfo **info)

to add the code:

        if ((site_name == NULL) || (site_name[0] == '\0')) {
                query_site = sitename_fetch(domain_name);
        } else {
                query_site = SMB_STRDUP(site_name);
        }

which implicitly sets the site name if set to NULL by the caller.

Can we fix the underlying callers not to do this and remove the implicit setting of site_name instead ?

Jeremy.
Comment 13 Jeremy Allison 2013-08-31 00:32:45 UTC
Yeah, this code is a mess.

Almost all callers of dsgetdcname() set site_name to NULL, except for _wbint_DsGetDcName() inside winbindd, which sets site_name to "" if it's passed in as NULL (which in itself is wrong - sitename should be set to NULL if it's not required as it's being used in a talloc_asprintf() to add a %s string into a DNS lookup string).

The only other caller that sets site_name to non-null is the 'net lookup dsgetdcname' code I think.

I think we should change dsgetdcname() not to do the implicit setting of site_name, and add a function dsgetdcname_anysite() that first calls dsgetdcname() with site_name set to sitename_fetch(domain_name), and if that fails then re-calls dsgetdcname() with a site_name of NULL (if sitename_fetch(domain_name) didn't return NULL in the first place).

Then change all the callers of dsgetdcname() that are currently using the implicit NULL site_name to call dsgetdcname_anysite() instead.

Jeremy.
Comment 14 Jeremy Allison 2013-08-31 03:42:16 UTC
Created attachment 9178 [details]
Test patch for v3-6-test.

Can you try the attached fix ? It should have the same effect as your patch but is a little cleaner and preserves the intent of the callers I think. If you can confirm it works I'll get it pushed to master and into release branches.

Cheers,

Jeremy.
Comment 15 Jeremy Allison 2013-08-31 03:44:16 UTC
Created attachment 9179 [details]
Fixed version.

With NULL -> "(null)" in the debug :-).

Jeremy.
Comment 16 Hemanth 2013-09-02 16:13:34 UTC
(In reply to comment #15)
> Created attachment 9179 [details]
> Fixed version.
> 
> With NULL -> "(null)" in the debug :-).
> 
> Jeremy.

Thanks Jeremy. I have tested the patch. It's working fine.
Comment 17 Hemanth 2013-09-03 06:24:07 UTC
Jeremy,

In the new dsgetdcname() ...

...
if ((site_name == NULL) || (site_name[0] == '\0')) {
                query_site = sitename_fetch(domain_name);

...
Shouldn't we do SAFE_FREE(query_site) before the redisovery with site name as NULL? sitename_fetch() is returning a string which is string duped. We need to free this up right?
Comment 18 Jeremy Allison 2013-09-03 15:44:56 UTC
Good catch. I'll fix that up and re-submit. Thanks for the review !
Jeremy.
Comment 19 Jeremy Allison 2013-09-03 16:45:58 UTC
Created attachment 9182 [details]
New patch fixing the memory leak.

Let me know if you're happy with this and I'll get it into all released branches.

Jeremy.
Comment 20 Hemanth 2013-09-03 17:06:01 UTC
Yeah. This looks good Jeremy. 
Thanks.
Comment 21 Jeremy Allison 2013-09-03 19:21:47 UTC
Created attachment 9183 [details]
git-am fix for master, 4.1.0, 4.0.next and 3.6.next

Patchset split into micro-commits to make it clearer. Applies cleanly to master, 4.1.next, 4.0.next, 3.6.next.

Jeremy.
Comment 22 Jeremy Allison 2013-09-03 19:23:34 UTC
Created attachment 9184 [details]
git-am fix for master, 4.1.0, 4.0.next and 3.6.next

Fixed the commit messages to remove the erroneous "First part" etc. text.

Jeremy.
Comment 23 Jeremy Allison 2013-09-03 23:29:45 UTC
Ok, here are the commits that went into master:

git cherry-pick -x dd12bfbcbf359c1642cc2e968aec62ae904aad5d
git cherry-pick -x 66006be7ef703b2935334633d27641050cee5f58
git cherry-pick -x 181c11066bd53b07015a199f56eb71182e89ff71
git cherry-pick -x 68e7b1c9446c7d1274b0fb85b59b90ac1a7f6041
git cherry-pick -x bdab6f9431715fbfd28f8cc0dfb4dde2966f22f3

Jeremy.
Comment 24 Jeremy Allison 2013-09-03 23:31:07 UTC
Created attachment 9186 [details]
Patchset that went into master.

Applies cleanly to 4.1.0, 4.0.next, 3.6.next.

Jeremy
Comment 25 Jeremy Allison 2013-09-03 23:31:44 UTC
This was already reviewed by Andrew Bartlett and Richard Sharpe. Re-assigning to Karolin for inclusion in 4.1.0, 4.0.next, 3.6.next.

Jeremy.
Comment 26 Andrew Bartlett 2013-09-03 23:36:52 UTC
Comment on attachment 9186 [details]
Patchset that went into master.

sorry to be a pain, but can you get that with the 'cherry-pick -x' markers?

(Otherwise, I can get those and re-upload.  It helps us track the matching patches though the branches).
Comment 27 Jeremy Allison 2013-09-03 23:47:43 UTC
Created attachment 9187 [details]
git-am fix with the cherry-pick info.

Patchset containing cherry-pick info as Andrew requested.
Jeremy.
Comment 28 Andrew Bartlett 2013-09-03 23:58:46 UTC
Comment on attachment 9187 [details]
git-am fix with the cherry-pick info.

Thanks, that looks good!
Comment 29 Jeremy Allison 2013-09-04 22:39:18 UTC
Re-assigning to Karolin for inclusion in 4.1.0, 4.0.next, 3.6.next.

Jeremy.
Comment 30 Karolin Seeger 2013-09-06 08:51:12 UTC
Pushed to autbuild-v4-1-test, autobuild-v4-0-test and v3-6-test.
Comment 31 Karolin Seeger 2013-09-09 08:00:21 UTC
Pushed to v4-1-test and v4-0-test.
Closing out bug report.

Thanks!