Created attachment 8906 [details]
Patch: Set can_do_ncacn_ip_tcp to FALSE after cm_connect_lsat has failed

Hi,

I'm using Samba under OpenBSD for NTLM authentication against a Windows Server 2008 Active Directory. During the upgrade from OpenBSD 5.2 to OpenBSD 5.3, Samba was upgraded from samba-3.6.6 to samba-3.6.12. Since then winbindd no longer works as before:

Samba 3.6.6:
# wbinfo -n inetuser1
S-1-5-21-1262556113-2025608247-2761087495-1108 SID_USER (1)

Samba 3.6.12:
# wbinfo -n inetuser1
failed to call wbcLookupName: WBC_ERR_DOMAIN_NOT_FOUND
Could not lookup name inetuser1

The error occurs because cm_connect_lsat() in winbindd_lookup_names() returns NT_STATUS_CANT_ACCESS_DOMAIN_INFO. It has occurred since commit c64473ab88ca36462e7976bf0006bc092386894c (Bug 9439 - ncacn_ip_tcp reconnection code for lsa lookups still broken) in samba-3.6.10.

That commit changes the winbindd behavior. Before the commit, after a failed cm_connect_lsa_tcp(), domain->can_do_ncacn_ip_tcp was set to false and cm_connect_lsat() was called again to retry the lookup. Now, after a failed cm_connect_lsat(), the code just returns without changing domain->can_do_ncacn_ip_tcp.

The attached diff fixes the problem for me, but I am not sure whether it's the right place for the fix. Please have a look at the diff and think about it.

Attached you will find my smb.conf. I'm also not sure why I run into the NT_STATUS_CANT_ACCESS_DOMAIN_INFO error. Does it work as intended, or do I have to fix my smb.conf?

Thanks in advance!

Regards,
Florian.
# cat smb.conf
[global]
netbios name = TST1
server string = tst1
workgroup = TEST
security = DOMAIN
encrypt passwords = yes
password server = 192.168.1.1
preferred master = no
local master = no
domain master = no
dns proxy = no
ldap ssl = no
winbind separator = +
;winbind uid = 10000-20000
;winbind gid = 10000-20000
idmap config * : range = 10000-20000
client schannel = no
server schannel = no
winbind use default domain = yes
winbind cache time = 10
winbind enum users = yes
log file = /var/log/%m.log
log level = 5
client ntlmv2 auth = yes
Hi,

I have now tested my problem with the most recent Samba version. The bug still occurs. I'm pretty sure that it's not only an OpenBSD problem. I would be glad if you could have a look at my patch.

Thanks in advance!

Regards,
Florian
cm_connect_lsat() checks if domain->can_do_ncacn_ip_tcp is set and then tries cm_connect_lsa_tcp(), otherwise it calls cm_connect_lsa() (the non-TCP version).

domain->can_do_ncacn_ip_tcp is set to true from the bool domain->active_directory, which is set to true when we call dcerpc_netr_DsrEnumerateDomainTrusts() and the domain->domain_type == NETR_TRUST_TYPE_UPLEVEL on return.

domain->active_directory (and domain->can_do_ncacn_ip_tcp) are also set to true when dcerpc_lsa_QueryInfoPolicy2() returns success (from the comments):

/* This particular query is exactly what Win2k clients use to determine that the DC is active directory */

Under what circumstances should we have 'domain->active_directory = true' but 'domain->can_do_ncacn_ip_tcp = false'?

Yeah, I know they're two separate variables, which means they have the potential to be different, but I still want to know *when* this actually happens.

What are the conditions where you're getting a failed TCP connection to an AD controller where a named pipe connection subsequently succeeds?
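To make the selection logic described in this comment easier to follow, here is a minimal, self-contained model of it. It is only a sketch and not the winbindd source: the struct domain_model, the enum lsa_transport and the helpers mark_active_directory() and pick_lsa_transport() are hypothetical names invented for illustration. Only the two flag names are taken from the discussion.

```c
#include <stdbool.h>

/* Hypothetical stand-in holding only the two flags discussed above. */
struct domain_model {
	bool active_directory;
	bool can_do_ncacn_ip_tcp;
};

enum lsa_transport {
	LSA_TRANSPORT_NP,  /* named pipe, the path cm_connect_lsa() stands for */
	LSA_TRANSPORT_TCP  /* ncacn_ip_tcp, the path cm_connect_lsa_tcp() stands for */
};

/* Models the moment a DC is recognised as AD: the TCP flag follows the AD flag. */
void mark_active_directory(struct domain_model *d)
{
	d->active_directory = true;
	d->can_do_ncacn_ip_tcp = d->active_directory;
}

/* Models the branch in cm_connect_lsat(): TCP only while the flag is set. */
enum lsa_transport pick_lsa_transport(const struct domain_model *d)
{
	return d->can_do_ncacn_ip_tcp ? LSA_TRANSPORT_TCP : LSA_TRANSPORT_NP;
}
```

In this model the two flags only diverge if something clears can_do_ncacn_ip_tcp after AD detection, which is exactly the case the question above asks about.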
Bump on this ticket.

wbinfo -t returns valid
wbinfo -u returns valid
wbinfo -g returns valid
wbinfo -n <username> returns:

img-cifs-2 ~ # wbinfo -n <username>
failed to call wbcLookupName: WBC_ERR_DOMAIN_NOT_FOUND
Could not lookup name <username>

I have log level set to 10 with a ton of info - just need to know what you want to see.
I'm going to try to figure out under what circumstances we have 'domain->active_directory = true' and 'domain->can_do_ncacn_ip_tcp = false' next week.

@jofficer Could you please test my patch to verify that you are running into the same problem as me? It would be very helpful.
I can confirm that the patch supplied by Florian has resolved our issue with Samba 3.6.16. I manually applied the one-line patch and ran through the configure/build process. With the new winbindd binary in place, I can successfully enumerate groups, users, SIDs (user/groups) and domain groups (most of which had previously failed with the unpatched winbindd). The problem as experienced by our users, being unable to enumerate the actual share, has been resolved.

In addition to the problem we were having on our Gentoo Linux build, we are experiencing the same problem on our Solaris hosts. We have patched and resolved 1 of our several Linux hosts and will next test against Solaris (Solaris 10 at least). Under Solaris 10, every month during our patch cycle, the Solaris update for Samba breaks. We remove patch 119758-27 (Samba 3.6.12 or greater) and leave in patch 119758-24 (Samba 3.6.6).

As a side note, when this line is executing:

status = cm_connect_lsat(domain, mem_ctx, &cli, &lsa_policy);

we see no traffic being blocked or captured by any of the firewalls along the way.

Florian, I hope this gives you added ammunition to get this patch committed to the mainline source tree.

Cheers,
Joey
There's definitely a problem here we need to address, I'm just trying to work out if Florian's patch is the right way to do it (clearly we do need a fallback here, but I'd still like to understand *why* :-). Jeremy.
Hi,

I'm glad not to be the only one with this problem. I was afraid it was a configuration issue on my side, but after Joey's tests I'm pretty sure it's not.

I'm still debugging the problem to find out why we run into the fallback. If Samba says domain->active_directory = true, that's correct, because I'm using a Windows Server 2008 AD. But it seems that my AD can't "can_do_ncacn_ip_tcp". Is there any reason (RFC?) to assume that all Active Directories can do ncacn_ip_tcp?

Is the following assumption correct? The opposite/fallback of NCACN_IP_TCP is NCACN_NP, which means the RPC data doesn't get packed directly into TCP packets. If NCACN_NP is enabled, the RPC data gets encapsulated in SMB packets.

In short:
NCACN_IP_TCP means: [IP [ TCP [ RPC [...] ] ] ]
NCACN_NP means: [IP [ TCP [ NetBIOS [ SMB [ RPC [...] ] ] ] ] ]

Should I see something special with wireshark if cm_connect_lsa_tcp(...) fails? Do you have any suggestions on how to debug why cm_connect_lsa_tcp(...) fails?

If I can help you with information or something else, just let me know!

Regards,
Florian
Hi,

I have the same issue; the difference is that I'm using Samba against Windows 2000. I also filed a bug, 9165.

I believe that one of the circumstances when we have 'domain->active_directory = true' but 'domain->can_do_ncacn_ip_tcp = false' is when the interface UUID 12345778-1234-abcd-ef00-0123456789ab (lsarpc over TCP) is not registered as an RPC endpoint through TCP port 135. If that fails, I think winbind should try ncacn_np.

Here are 2 frames from a wireshark capture on the domain controller showing the RPC request and RPC response. The error code "0x16c9a0d6" seems to indicate "Not registered in endpoint map". IP addresses used are: 10.20.1.5 is the Linux server and 10.20.1.2 is the domain controller.

Frame 62 (222 bytes on wire, 222 bytes captured)
Ethernet II, Src: Vmware_62:ed:89 (00:0c:29:62:ed:89), Dst: Vmware_79:af:f6 (00:0c:29:79:af:f6)
Internet Protocol, Src: 10.20.1.5 (10.20.1.5), Dst: 10.20.1.2 (10.20.1.2)
Transmission Control Protocol, Src Port: 36772 (36772), Dst Port: epmap (135), Seq: 73, Ack: 61, Len: 156
DCE RPC Request, Fragment: Single, FragLen: 156, Call: 13 Ctx: 0, [Resp: #63]
DCE/RPC Endpoint Mapper, Map Operation: Map (3)
    [Response in frame: 63]
    UUID pointer:
        Referent ID: 0x00000001
        UUID: 12345778-1234-abcd-ef00-0123456789ab
    Tower pointer:
        Referent ID: 0x00000002
        Length: 75
        Length: 75
        Number of floors: 5
        Floor 1 UUID: LSARPC
        Floor 2 UUID: Version 1.1 network data representation protocol
        Floor 3 RPC connection-oriented protocol
        Floor 4 TCP Port:0
        Floor 5 IP:0.0.0.0
    Handle: 0000000000000000000000000000000000000000
    Max Towers: 1

No.  Time      Source     Destination  Protocol  Info
63   0.209635  10.20.1.2  10.20.1.5    EPM       Map response

Frame 63 (130 bytes on wire, 130 bytes captured)
Ethernet II, Src: Vmware_79:af:f6 (00:0c:29:79:af:f6), Dst: Vmware_62:ed:89 (00:0c:29:62:ed:89)
Internet Protocol, Src: 10.20.1.2 (10.20.1.2), Dst: 10.20.1.5 (10.20.1.5)
Transmission Control Protocol, Src Port: epmap (135), Dst Port: 36772 (36772), Seq: 61, Ack: 229, Len: 64
DCE RPC Response, Fragment: Single, FragLen: 64, Call: 13 Ctx: 0, [Req: #62]
DCE/RPC Endpoint Mapper, Map Operation: Map (3)
    [Request in frame: 62]
    Handle: 000000006BE6A3DEE1D77E4FA4D869ABBB045BC2
    Num Towers: 0
    Tower array:
        Max Count: 1
        Offset: 0
        Actual Count: 0
    Return code: 0x16c9a0d6

dan
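The condition suggested in this comment (the endpoint mapper has no TCP endpoint for lsarpc, so ncacn_np should be tried) could be expressed roughly as follows. This is only a sketch: the struct epm_map_result, the constant name EPM_RC_NOT_REGISTERED and both helper functions are hypothetical and are not Samba or Wireshark identifiers. The numeric value is copied from the "Return code" field of frame 63 above, and treating a return code of 0 as success is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the capture above; the constant name is made up here. */
#define EPM_RC_NOT_REGISTERED 0x16c9a0d6u

/* Hypothetical summary of the two interesting fields of a Map response. */
struct epm_map_result {
	uint32_t num_towers;   /* "Num Towers" in the capture */
	uint32_t return_code;  /* "Return code" in the capture */
};

/* True when the mapper reported the code seen in frame 63. */
bool epm_is_not_registered(const struct epm_map_result *r)
{
	return r->return_code == EPM_RC_NOT_REGISTERED;
}

/*
 * True when no usable TCP endpoint came back, in which case a caller
 * would only have ncacn_np left to try. Assumes 0 means success.
 */
bool epm_needs_np_fallback(const struct epm_map_result *r)
{
	return r->return_code != 0 || r->num_towers == 0;
}
```

Fed with the values from frame 63 (Num Towers 0, return code 0x16c9a0d6), both helpers return true.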
There is a related issue when Samba is a member server in one domain, and it tries to look up the group members from a trusted domain:

[2013/08/09 00:18:28.131996, 1] ../librpc/ndr/ndr.c:284(ndr_print_function_debug)
  wbint_LookupGroupMembers: struct wbint_LookupGroupMembers
    in: struct wbint_LookupGroupMembers
      sid : *
        sid : S-1-5-21-2185640139-527761023-1283802157-572
      type : SID_NAME_ALIAS (4)

The LDAP search against the trusted domain works:

[2013/08/09 00:18:28.133229, 5] libads/ldap_utils.c:80(ads_do_search_retry_internal)
  Search for (objectSid=\01\05\00\00\00\00\00\05\15\00\00\00\CB8F\82\7F\FEt\1F-D\85L<\02\00\00) in <dc=VIRTUAL2,dc=COM> gave 1 replies

But querying the users fails:

[2013/08/09 00:18:28.134608, 10] winbindd/winbindd_ads.c:1167(lookup_groupmem)
  ads: lookup_groupmem: 0 sids found in cache, 8 left for lsa_lookupsids
[2013/08/09 00:18:28.134646, 10] winbindd/winbindd_cm.c:2400(cm_connect_lsa_tcp)
  cm_connect_lsa_tcp
[2013/08/09 00:18:28.134677, 1] winbindd/winbindd_ads.c:1185(lookup_groupmem)
  lsa_lookupsids call failed with NT_STATUS_INTERNAL_ERROR - retrying...
[2013/08/09 00:18:28.134705, 10] winbindd/winbindd_cm.c:2400(cm_connect_lsa_tcp)
  cm_connect_lsa_tcp
[2013/08/09 00:18:28.134730, 10] winbindd/winbindd_ads.c:1229(lookup_groupmem)
  lookup_groupmem: Error looking up 8 sids via rpc_lsa_lookup_sids: NT_STATUS_INTERNAL_ERROR
[2013/08/09 00:18:28.134758, 10] winbindd/winbindd_cache.c:540(refresh_sequence_number)
  refresh_sequence_number: VIRTUAL2 time ok
[2013/08/09 00:18:28.134806, 10] winbindd/winbindd_cache.c:585(refresh_sequence_number)
  refresh_sequence_number: VIRTUAL2 seq number is now 12554
[2013/08/09 00:18:28.134832, 1] ../librpc/ndr/ndr.c:284(ndr_print_function_debug)
  wbint_LookupGroupMembers: struct wbint_LookupGroupMembers
    out: struct wbint_LookupGroupMembers
      members : *
        members: struct wbint_Principals
          num_principals : 0
          principals: ARRAY(0)
      result : NT_STATUS_INTERNAL_ERROR

This patch, similar to the one posted before, makes the problem disappear, but it is probably more of a hack than a final solution:

--- winbindd/winbindd_msrpc.c.save 2013-08-09 00:42:03.682817000 +0200
+++ winbindd/winbindd_msrpc.c 2013-08-09 00:42:24.863271692 +0200
@@ -1094,6 +1094,7 @@
 connect:
 	status = cm_connect_lsat(domain, mem_ctx, &cli, &lsa_policy);
 	if (!NT_STATUS_IS_OK(status)) {
+		domain->can_do_ncacn_ip_tcp = false;
 		return status;
 	}
> --- winbindd/winbindd_msrpc.c.save 2013-08-09 00:42:03.682817000 +0200
> +++ winbindd/winbindd_msrpc.c 2013-08-09 00:42:24.863271692 +0200
> @@ -1094,6 +1094,7 @@
> connect:
> status = cm_connect_lsat(domain, mem_ctx, &cli, &lsa_policy);
> if (!NT_STATUS_IS_OK(status)) {
> + domain->can_do_ncacn_ip_tcp = false;
> return status;
> }

I think this should be done within cm_connect_lsat(), and we should fall back to cm_connect_lsa() if cm_connect_lsa_tcp() fails.
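The fallback proposed in this comment can be sketched as a small, self-contained model. It is not the patch that was eventually committed and does not use the real winbindd types or signatures: struct domain_model, the connect_fn callback type, connect_lsa_with_fallback() and the two stub functions are hypothetical names introduced only for illustration, and the convention that 0 means success is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical, simplified domain state; the counters exist only for the demo. */
struct domain_model {
	bool can_do_ncacn_ip_tcp;
	int tcp_attempts;
	int np_attempts;
};

/* Connect callback; returns 0 on success, non-zero on failure. */
typedef int (*connect_fn)(struct domain_model *d);

/*
 * Try ncacn_ip_tcp first while the flag allows it. If that fails, clear
 * the flag so later calls skip TCP, then fall back to the named pipe path.
 */
int connect_lsa_with_fallback(struct domain_model *d,
			      connect_fn connect_tcp,
			      connect_fn connect_np)
{
	if (d->can_do_ncacn_ip_tcp) {
		if (connect_tcp(d) == 0) {
			return 0;
		}
		d->can_do_ncacn_ip_tcp = false;
	}
	return connect_np(d);
}

/* Demo stub standing in for a TCP connect that always fails. */
int stub_tcp_fails(struct domain_model *d)
{
	d->tcp_attempts++;
	return -1;
}

/* Demo stub standing in for a named pipe connect that always succeeds. */
int stub_np_ok(struct domain_model *d)
{
	d->np_attempts++;
	return 0;
}
```

With these stubs, the first call attempts TCP once, clears the flag and succeeds over the named pipe; a second call goes straight to the named pipe. Keeping the flag handling inside the connect helper means callers such as the lookup code do not need to know which transport was used.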
same cm_connect_lsa_tcp failure as in #9615
From metze on samba-technical.

----------------------------------------------------
> What I don't understand is why the TCP connection fails, but
> the RPC over NBT works.

for tcp we require schannel which only works against a direct trust.
----------------------------------------------------

which explains the discrepancy.

Ok, let's get this fix into 3.6.next, 4.0.next and the 4.1.0 release.

Jeremy.
*** Bug 9615 has been marked as a duplicate of this bug. ***
Created attachment 9132 [details] git-am fix for 4.1.0, 4.0.next, 3.6.next Patch that went into master that applies cleanly to 4.1.0, 4.0.next and 3.6.next. Jeremy.
Re-assigning to Karolin for inclusion in 4.1.0, 4.0.next, 3.6.next. Jeremy.
(In reply to comment #15)
> Re-assigning to Karolin for inclusion in 4.1.0, 4.0.next, 3.6.next.
>
> Jeremy.

Pushed to autobuild-v4-1-test, autobuild-v4-0-test and v3-6-test.
Pushed to v4-1-test and v4-0-test. Closing out bug report. Thanks!