Bug 9899 - winbind_lookup_names() fails because of NT_STATUS_CANT_ACCESS_DOMAIN_INFO
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: Winbind
Version: 3.6.15
Hardware: All All
Importance: P5 major
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Duplicates: 9615
Depends on: 9615
Blocks:
Reported: 2013-05-21 12:25 UTC by Florian Riehm
Modified: 2013-09-20 17:00 UTC

Attachments
Patch: Set can_do_ncacn_ip_tcp to FALSE after cm_connect_lsat has failed (447 bytes, patch)
2013-05-21 12:25 UTC, Florian Riehm
git-am fix for 4.1.0, 4.0.next, 3.6.next (1.79 KB, patch)
2013-08-14 00:21 UTC, Jeremy Allison

Description Florian Riehm 2013-05-21 12:25:18 UTC
Created attachment 8906
Patch: Set can_do_ncacn_ip_tcp to FALSE after cm_connect_lsat has failed

Hi,

I'm using Samba under OpenBSD for NTLM authentication against a Windows Server
2008 Active Directory. During the upgrade from OpenBSD 5.2 to OpenBSD 5.3, Samba
was upgraded from samba-3.6.6 to samba-3.6.12.

Now winbindd no longer works as it did before:
Samba 3.6.6:
# wbinfo -n inetuser1
S-1-5-21-1262556113-2025608247-2761087495-1108 SID_USER (1)

Samba 3.6.12:
# wbinfo -n inetuser1
failed to call wbcLookupName: WBC_ERR_DOMAIN_NOT_FOUND
Could not lookup name inetuser1

The cause is that cm_connect_lsat(), called from winbindd_lookup_names(), returns
NT_STATUS_CANT_ACCESS_DOMAIN_INFO. The error has occurred since commit
c64473ab88ca36462e7976bf0006bc092386894c (Bug 9439 - ncacn_ip_tcp reconnection
code for lsa lookups still broken) in samba-3.6.10.

That commit changed the winbindd behavior. Before the commit,
after a failed cm_connect_lsa_tcp(), domain->can_do_ncacn_ip_tcp was set to
false and cm_connect_lsat() was called to try the lookup again.
Now, after a failed cm_connect_lsat(), it just returns without changing
domain->can_do_ncacn_ip_tcp.

The attached diff fixes the problem for me, but I am not sure if it's the right
place for the fix. Please have a look at the diff and think about it.

My smb.conf is included below. I'm also not sure why I run into the NT_STATUS_CANT_ACCESS_DOMAIN_INFO error. Does it work as intended, or do I have to fix my smb.conf?

Thanks in advance!

Regards,

Florian.


# cat smb.conf 
[global]
netbios name = TST1
server string = tst1
workgroup = TEST
security = DOMAIN
encrypt passwords = yes
password server = 192.168.1.1
preferred master = no
local master = no
domain master = no
dns proxy = no
ldap ssl = no
winbind separator = +
;winbind uid = 10000-20000
;winbind gid = 10000-20000
idmap config * : range = 10000-20000
client schannel = no
server schannel = no
winbind use default domain = yes
winbind cache time = 10
winbind enum users = yes
log file = /var/log/%m.log
log level = 5
client ntlmv2 auth = yes
Comment 1 Florian Riehm 2013-06-18 20:02:48 UTC
Hi,

I have tested my problem now with the most recent samba version.
The bug still occurs. I'm pretty sure that it's not only an OpenBSD problem.

I would be glad if you could have a look at my patch.

Thanks in advance!

Regards,

Florian
Comment 2 Jeremy Allison 2013-07-23 22:56:33 UTC
cm_connect_lsat() checks if domain->can_do_ncacn_ip_tcp
is set and then tries cm_connect_lsa_tcp(), otherwise it
calls cm_connect_lsa() (the non-TCP version).

domain->can_do_ncacn_ip_tcp is set to true from the
bool domain->active_directory, which is set to true
when we call dcerpc_netr_DsrEnumerateDomainTrusts()
and the domain->domain_type == NETR_TRUST_TYPE_UPLEVEL
on return.

domain->active_directory (and domain->can_do_ncacn_ip_tcp)
are also set to true when dcerpc_lsa_QueryInfoPolicy2()
returns success (from the comments) :

               /* This particular query is exactly what Win2k clients use 
                   to determine that the DC is active directory */

Under what circumstances should we have 'domain->active_directory = true'
but 'domain->can_do_ncacn_ip_tcp = false' ?

Yeah, I know they're two separate variables which means
they have the potential to be different, but I still
want to know *when* does this actually happen.

What are the conditions where you're getting a failed
TCP connection to an AD controller where a named pipe
connection subsequently succeeds ?
Comment 3 jofficer 2013-08-02 14:38:47 UTC
bump on this ticket

wbinfo -t returns valid
wbinfo -u returns valid
wbinfo -g returns valid

wbinfo -n <username> returns 
img-cifs-2 ~ # wbinfo -n <username>
failed to call wbcLookupName: WBC_ERR_DOMAIN_NOT_FOUND
Could not lookup name <username>

I have log level set to 10 with a ton of info - just need to know what you want to see.
Comment 4 Florian Riehm 2013-08-02 15:46:23 UTC
I'm gonna try to figure out under what circumstances we have 'domain->active_directory = true' and 'domain->can_do_ncacn_ip_tcp = false' next week.

@jofficer
Could you please test my patch to verify that you are running into the same problem as me? It would be very helpful.
Comment 5 jofficer 2013-08-02 20:48:41 UTC
I can confirm that the patch supplied by Florian has resolved our issue with Samba 3.6.16.  Manually applied the one line patch and ran through the configure/build process.  

Once the new winbindd binary is in place, I can successfully enumerate groups, users, SIDs (user/groups) and domain groups (most of which had previously failed with the unpatched winbindd).  The problem as experienced by our users, unable to enumerate the actual share, has been resolved.

In addition to the problem we were having on our Gentoo Linux build, we are experiencing the same problem on our Solaris hosts.  We have patched and resolved 1 of our several Linux hosts and next will test against Solaris (Solaris 10 at least).

Under Solaris 10, every month during our patch cycle, the Solaris update for Samba breaks.  We remove patch 119758-27 (Samba 3.6.12 or greater) and leave in patch 119758-24 (Samba 3.6.6)

As a side note, when this line is executing:

status = cm_connect_lsat(domain, mem_ctx, &cli, &lsa_policy);

We see no traffic being blocked or captured by any of the firewalls along the way.

Florian, hoping this will give you added ammunition to get this patch committed to the mainline source tree.

Cheers,
Joey
Comment 6 Jeremy Allison 2013-08-05 22:09:58 UTC
There's definitely a problem here we need to address, I'm just trying to work out if Florian's patch is the right way to do it (clearly we do need a fallback here, but I'd still like to understand *why* :-).

Jeremy.
Comment 7 Florian Riehm 2013-08-06 13:48:10 UTC
Hi,

I'm glad not to be the only one with that problem. I was afraid it would be a configuration issue on my side, but after Joey's tests I'm pretty sure it's not.

I'm still debugging the problem to find out why we run into the fallback.

If samba says domain->active_directory = true, it's correct, because I'm using
a Windows Server 2008 AD. But it seems like my AD can't "can_do_ncacn_ip_tcp".

Is there any reason (rfc?) to assume that all active directories can do ncacn_ip_tcp?

Is the following assumption correct:
The opposite/fallback of NCACN_IP_TCP is NCACN_NP, which means the rpc stuff doesn't get directly packed into tcp packets. If NCACN_NP is enabled, the rpc stuff gets encapsulated into smb packets.
Short:
NCACN_IP_TCP means:
[IP [ TCP [ RPC [...] ] ] ]
NCACN_NP means:
[IP [ TCP [ NetBIOS [ SMB [ RPC [...] ] ] ] ] ]
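The layering drawn above can be written down as data, which makes the difference between the two transports easy to compare. This is only an illustrative sketch: the enum, arrays, and function name are hypothetical and do not come from Samba. It mirrors the diagram as posted; note that when SMB runs on port 445 it sits directly on TCP, so the NetBIOS layer applies to the port 139 case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only (not Samba identifiers). */
enum sketch_transport { SKETCH_NCACN_IP_TCP, SKETCH_NCACN_NP };

/* Protocol layers from outermost to innermost, as drawn in the comment. */
static const char *const tcp_layers[] = { "IP", "TCP", "RPC" };
static const char *const np_layers[]  = { "IP", "TCP", "NetBIOS", "SMB", "RPC" };

/* Stores the layer list for transport t in *layers and returns its length. */
static size_t sketch_layers(enum sketch_transport t,
			    const char *const **layers)
{
	assert(layers != NULL);
	if (t == SKETCH_NCACN_IP_TCP) {
		*layers = tcp_layers;
		return sizeof(tcp_layers) / sizeof(tcp_layers[0]);
	}
	*layers = np_layers;
	return sizeof(np_layers) / sizeof(np_layers[0]);
}
```

In both cases the innermost layer is the same RPC payload; only the carrier underneath differs, which is why a named-pipe connection can still succeed when the direct TCP one does not.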


Should I see something special with wireshark if cm_connect_lsa_tcp(...) fails?

Do you have any suggestions on how to debug why cm_connect_lsa_tcp(...) fails?

If I can help you with information or something else just let me know!

Regards,

Florian
Comment 8 dant 2013-08-07 10:07:38 UTC
Hi,

I have the same issue; the difference is that I'm using Samba against Windows 2000. I also filed a bug, 9165.

I believe that one of the circumstances when we have 'domain->active_directory = true' but 'domain->can_do_ncacn_ip_tcp = false' is when the interface UUID 12345778-1234-abcd-ef00-0123456789ab (lsarpc over TCP) is not registered as an RPC endpoint through TCP port 135. If that fails, I think winbind should try ncacn_np.

Here are 2 frames from wireshark capture on the domain controller showing the RPC request and RPC response. The error code "0x16c9a0d6" seems to indicate "Not registered in endpoint map".
IP addresses used are: 10.20.1.5 is the Linux server and 10.20.1.2 is the domain controller.


Frame 62 (222 bytes on wire, 222 bytes captured)
Ethernet II, Src: Vmware_62:ed:89 (00:0c:29:62:ed:89), Dst: Vmware_79:af:f6 (00:0c:29:79:af:f6)
Internet Protocol, Src: 10.20.1.5 (10.20.1.5), Dst: 10.20.1.2 (10.20.1.2)
Transmission Control Protocol, Src Port: 36772 (36772), Dst Port: epmap (135), Seq: 73, Ack: 61, Len: 156
DCE RPC Request, Fragment: Single, FragLen: 156, Call: 13 Ctx: 0, [Resp: #63]
DCE/RPC Endpoint Mapper, Map
    Operation: Map (3)
    [Response in frame: 63]
    UUID pointer:
        Referent ID: 0x00000001
        UUID: 12345778-1234-abcd-ef00-0123456789ab
    Tower pointer:
        Referent ID: 0x00000002
        Length: 75
        Length: 75
        Number of floors: 5
        Floor 1  UUID: LSARPC
        Floor 2  UUID: Version 1.1 network data representation protocol
        Floor 3  RPC connection-oriented protocol
        Floor 4  TCP Port:0
        Floor 5  IP:0.0.0.0
    Handle: 0000000000000000000000000000000000000000
    Max Towers: 1


No.     Time        Source                Destination           Protocol Info
     63 0.209635    10.20.1.2             10.20.1.5             EPM      Map response

Frame 63 (130 bytes on wire, 130 bytes captured)
Ethernet II, Src: Vmware_79:af:f6 (00:0c:29:79:af:f6), Dst: Vmware_62:ed:89 (00:0c:29:62:ed:89)
Internet Protocol, Src: 10.20.1.2 (10.20.1.2), Dst: 10.20.1.5 (10.20.1.5)
Transmission Control Protocol, Src Port: epmap (135), Dst Port: 36772 (36772), Seq: 61, Ack: 229, Len: 64
DCE RPC Response, Fragment: Single, FragLen: 64, Call: 13 Ctx: 0, [Req: #62]
DCE/RPC Endpoint Mapper, Map
    Operation: Map (3)
    [Request in frame: 62]
    Handle: 000000006BE6A3DEE1D77E4FA4D869ABBB045BC2
    Num Towers: 0
    Tower array:
        Max Count: 1
        Offset: 0
        Actual Count: 0
    Return code: 0x16c9a0d6


dan
Comment 9 Christof Schmitt 2013-08-08 22:50:45 UTC
There is a related issue when Samba is a member server in one domain,
and it tries to look up the group members from a trusted domain:

[2013/08/09 00:18:28.131996,  1] ../librpc/ndr/ndr.c:284(ndr_print_function_debug)
       wbint_LookupGroupMembers: struct wbint_LookupGroupMembers
          in: struct wbint_LookupGroupMembers
              sid                      : *
                  sid                      : S-1-5-21-2185640139-527761023-1283802157-572
              type                     : SID_NAME_ALIAS (4)

The LDAP search against the trusted domain works:

[2013/08/09 00:18:28.133229,  5] libads/ldap_utils.c:80(ads_do_search_retry_internal)
  Search for (objectSid=\01\05\00\00\00\00\00\05\15\00\00\00\CB8F\82\7F\FEt\1F-D\85L<\02\00\00) in <dc=VIRTUAL2,dc=COM> gave 1 replies

But querying the users fails:

[2013/08/09 00:18:28.134608, 10] winbindd/winbindd_ads.c:1167(lookup_groupmem)
  ads: lookup_groupmem: 0 sids found in cache, 8 left for lsa_lookupsids
[2013/08/09 00:18:28.134646, 10] winbindd/winbindd_cm.c:2400(cm_connect_lsa_tcp)
  cm_connect_lsa_tcp
[2013/08/09 00:18:28.134677,  1] winbindd/winbindd_ads.c:1185(lookup_groupmem)
  lsa_lookupsids call failed with NT_STATUS_INTERNAL_ERROR - retrying...
[2013/08/09 00:18:28.134705, 10] winbindd/winbindd_cm.c:2400(cm_connect_lsa_tcp)
  cm_connect_lsa_tcp
[2013/08/09 00:18:28.134730, 10] winbindd/winbindd_ads.c:1229(lookup_groupmem)
  lookup_groupmem: Error looking up 8 sids via rpc_lsa_lookup_sids: NT_STATUS_INTERNAL_ERROR
[2013/08/09 00:18:28.134758, 10] winbindd/winbindd_cache.c:540(refresh_sequence_number)
  refresh_sequence_number: VIRTUAL2 time ok
[2013/08/09 00:18:28.134806, 10] winbindd/winbindd_cache.c:585(refresh_sequence_number)
  refresh_sequence_number: VIRTUAL2 seq number is now 12554
[2013/08/09 00:18:28.134832,  1] ../librpc/ndr/ndr.c:284(ndr_print_function_debug)
       wbint_LookupGroupMembers: struct wbint_LookupGroupMembers
          out: struct wbint_LookupGroupMembers
              members                  : *
                  members: struct wbint_Principals
                      num_principals           : 0
                      principals: ARRAY(0)
              result                   : NT_STATUS_INTERNAL_ERROR

This patch, similar to the one posted before, makes the problem
disappear, but it is probably more of a hack than a final solution:

--- winbindd/winbindd_msrpc.c.save	2013-08-09 00:42:03.682817000 +0200
+++ winbindd/winbindd_msrpc.c	2013-08-09 00:42:24.863271692 +0200
@@ -1094,6 +1094,7 @@
  connect:
 	status = cm_connect_lsat(domain, mem_ctx, &cli, &lsa_policy);
 	if (!NT_STATUS_IS_OK(status)) {
+		domain->can_do_ncacn_ip_tcp = false;
 		return status;
 	}
Comment 10 Stefan Metzmacher 2013-08-09 10:15:54 UTC
> --- winbindd/winbindd_msrpc.c.save    2013-08-09 00:42:03.682817000 +0200
> +++ winbindd/winbindd_msrpc.c    2013-08-09 00:42:24.863271692 +0200
> @@ -1094,6 +1094,7 @@
>   connect:
>      status = cm_connect_lsat(domain, mem_ctx, &cli, &lsa_policy);
>      if (!NT_STATUS_IS_OK(status)) {
> +        domain->can_do_ncacn_ip_tcp = false;
>          return status;
>      }

I think this should be done within cm_connect_lsat()
and we should fall back to cm_connect_lsa() if cm_connect_lsa_tcp() fails.
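
A rough sketch of that suggestion, under stated assumptions: the types and functions below are simplified stand-ins prefixed with sketch_, not the real winbindd structures, and the two connect helpers only simulate success or failure. This is not the patch that was committed; it only illustrates moving the fallback into the connect wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real winbindd types carry far more state. */
typedef enum { SKETCH_OK = 0, SKETCH_CONNECT_FAILED = 1 } sketch_status;

struct sketch_domain {
	bool active_directory;    /* DC was identified as Active Directory */
	bool can_do_ncacn_ip_tcp; /* TCP transport currently believed usable */
	bool tcp_works;           /* simulation only: outcome of a TCP connect */
	bool np_works;            /* simulation only: outcome of a pipe connect */
	int tcp_attempts;
	int np_attempts;
};

/* Simulated TCP connect (stands in for cm_connect_lsa_tcp). */
static sketch_status sketch_connect_lsa_tcp(struct sketch_domain *d)
{
	d->tcp_attempts++;
	return d->tcp_works ? SKETCH_OK : SKETCH_CONNECT_FAILED;
}

/* Simulated named-pipe connect (stands in for cm_connect_lsa). */
static sketch_status sketch_connect_lsa_np(struct sketch_domain *d)
{
	d->np_attempts++;
	return d->np_works ? SKETCH_OK : SKETCH_CONNECT_FAILED;
}

/* Try TCP first when allowed; on failure remember it and use the pipe. */
static sketch_status sketch_connect_lsat(struct sketch_domain *domain)
{
	assert(domain != NULL);
	if (domain->can_do_ncacn_ip_tcp) {
		if (sketch_connect_lsa_tcp(domain) == SKETCH_OK) {
			return SKETCH_OK;
		}
		/* Later calls skip TCP instead of failing again. */
		domain->can_do_ncacn_ip_tcp = false;
	}
	return sketch_connect_lsa_np(domain);
}
```

With the fallback inside the wrapper, every caller gets it without having to touch domain->can_do_ncacn_ip_tcp itself.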
Comment 11 Guenther Deschner 2013-08-09 12:05:23 UTC
same cm_connect_lsa_tcp failure as in #9615
Comment 12 Jeremy Allison 2013-08-13 17:43:54 UTC
From metze on samba-technical.

----------------------------------------------------
> What I don't understand is why the TCP connection fails, but
> the RPC over NBT works.

for tcp we require schannel which only works against a direct trust.

----------------------------------------------------

which explains the discrepancy. Ok, let's get this fix into 3.6.next, 4.0.next and the 4.1.0 release.

Jeremy.
Comment 13 Jeremy Allison 2013-08-13 18:08:42 UTC
*** Bug 9615 has been marked as a duplicate of this bug. ***
Comment 14 Jeremy Allison 2013-08-14 00:21:17 UTC
Created attachment 9132
git-am fix for 4.1.0, 4.0.next, 3.6.next

Patch that went into master that applies cleanly to 4.1.0, 4.0.next and 3.6.next.

Jeremy.
Comment 15 Jeremy Allison 2013-08-14 00:22:02 UTC
Re-assigning to Karolin for inclusion in 4.1.0, 4.0.next, 3.6.next.

Jeremy.
Comment 16 Karolin Seeger 2013-08-20 09:07:27 UTC
(In reply to comment #15)
> Re-assigning to Karolin for inclusion in 4.1.0, 4.0.next, 3.6.next.
> 
> Jeremy.

Pushed to autobuild-v4-1-test, autobuild-v4-0-test and v3-6-test.
Comment 17 Karolin Seeger 2013-08-21 06:48:27 UTC
Pushed to v4-1-test and v4-0-test.
Closing out bug report.

Thanks!