Created attachment 12323 [details] child winbindd log. Hi, We are currently with 4.3.5 stack. Recently, we have started unit testing the 4.3.11 changes as we wanted to include the BAD LOCK and other security fixes. With 4.3.11 we are seeing winbindd issues. Logs indicating that LDAP queries to AD server getting timed out. Eventually all the winbindd requests which requires a lookup fails. Following the log messages are seen frequently for domain child. [2016/08/07 22:50:11.390503, 3, pid=10391, effective(0, 0), real(0, 0)] ../lib/krb5_wrap/krb5_samba.c:2502(kerberos_get_principal_from_service_hostname) kerberos_get_principal_from_service_hostname: cannot get realm from, desthost drtnasdcprod01.testdom1.test.exp.domain.com or default ccache. Using default smb.conf realm testdom1.test.exp.domain.com [2016/08/07 22:50:11.324867, 3, pid=10391, effective(0, 0), real(0, 0)] ../source3/libads/ldap.c:981(ads_do_paged_search_args) ads_do_paged_search_args: ldap_search_with_timeout((objectclass=*)) -> Time limit exceeded [2016/08/07 22:50:11.324938, 1, pid=10391, effective(0, 0), real(0, 0)] ../source3/libads/ldap_utils.c:91(ads_do_search_retry_internal) Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT These suggest that LDAP queries made to DC are actually getting timedout. Also as a side effect of this, we could see lot of(seen max of 32 total) child winbindd processes getting forked even though there are no trusted domains. By attaching with gdb shows that this child domain process is in retry loop. #0 0x00007f5df9ec1248 in poll () from /lib64/libc.so.6 #1 0x00007f5dfa186b1d in ?? () from /lib64/libldap-2.4.so.2 #2 0x00007f5dfa189e3f in ldap_pvt_search_s () from /lib64/libldap-2.4.so.2 #3 0x00007f5dfa189ed0 in ldap_search_ext_s () from /lib64/libldap-2.4.so.2 #4 0x00007f5e01a1428a in ldap_search_with_timeout (ld=0x563dde44e2b0, base=0x563dde47f0e0 "", scope=0, filter=0x563dde47fba0 "(objectclass=*)", attrs=0x563dde473e10, attrsonly=0, sctrls=0x7fff254dd590, cctrls=0x0, sizelimit=0, res=0x7fff254dd908) at ../source3/libads/ldap.c:155 #5 0x00007f5e01a169c1 in ads_do_paged_search_args (ads=0x563dde480230, bind_path=0x563dde480490 "", scope=0, expr=0x7f5e01a343ce "(objectclass=*)", attrs=0x7fff254dd8d0, args=0x0, res=0x7fff254dd908, count=0x7fff254dd75c, cookie=0x7fff254dd750) at ../source3/libads/ldap.c:971 #6 0x00007f5e01a16e3d in ads_do_search_all_args (ads=0x563dde480230, bind_path=0x563dde480490 "", scope=0, expr=0x7f5e01a343ce "(objectclass=*)", attrs=0x7fff254dd8d0, args=0x0, res=0x7fff254dd908) at ../source3/libads/ldap.c:1075 #7 0x00007f5e01a2a96e in ads_do_search_retry_internal (ads=0x563dde480230, bind_path=0x7f5e01a33b2e "", scope=0, expr=0x7f5e01a343ce "(objectclass=*)", attrs=0x7fff254dd8d0, args=0x0, res=0x7fff254dd908) at ../source3/libads/ldap_utils.c:77 #8 0x00007f5e01a2af5b in ads_do_search_retry (ads=0x563dde480230, bind_path=0x7f5e01a33b2e "", scope=0, expr=0x7f5e01a343ce "(objectclass=*)", attrs=0x7fff254dd8d0, res=0x7fff254dd908) at ../source3/libads/ldap_utils.c:144 #9 0x00007f5e01a1ba86 in ads_USN (ads=0x563dde480230, usn=0x563dde463e90) at ../source3/libads/ldap.c:2884 #10 0x0000563ddc3b7d32 in sequence_number (domain=0x563dde463d60, seq=0x563dde463e90) at ../source3/winbindd/winbindd_ads.c:1506 #11 0x0000563ddc3b27db in sequence_number (domain=0x563dde463d60, seq=0x563dde463e90) at ../source3/winbindd/winbindd_reconnect_ads.c:248 #12 0x0000563ddc389396 in refresh_sequence_number (domain=0x563dde463d60, force=false) at ../source3/winbindd/winbindd_cache.c:560 #13 0x0000563ddc389a5b in wcache_fetch (cache=0x563dde449e90, domain=0x563dde463d60, format=0x563ddc43ac71 "NS/%s/%s") at ../source3/winbindd/winbindd_cache.c:713 #14 0x0000563ddc38d699 in wcache_name_to_sid (domain=0x563dde463d60, domain_name=0x563dde454fc0 "TESTDOM1", name=0x563dde470ea0 "ROOT", sid=0x563dde44afe0, type=0x563dde44af70) at ../source3/winbindd/winbindd_cache.c:1822 #15 0x0000563ddc38d7fc in name_to_sid (domain=0x563dde463d60, mem_ctx=0x563dde462780, domain_name=0x563dde454fc0 "TESTDOM1", name=0x563dde470ea0 "ROOT", flags=8, sid=0x563dde44afe0, type=0x563dde44af70) at ../source3/winbindd/winbindd_cache.c:1855 #16 0x0000563ddc3c0b5b in _wbint_LookupName (p=0x7fff254ddd20, r=0x563dde470e00) at ../source3/winbindd/winbindd_dual_srv.c:114 #17 0x0000563ddc429257 in api_wbint_LookupName (p=0x7fff254ddd20) at default/librpc/gen_ndr/srv_winbind.c:316 #18 0x0000563ddc3c05b8 in winbindd_dual_ndrcmd (domain=0x563dde463d60, state=0x7fff254de078) at ../source3/winbindd/winbindd_dual_ndr.c:322 #19 0x0000563ddc3bc0d6 in child_process_request (child=0x563dde463fb0, state=0x7fff254de078) at ../source3/winbindd/winbindd_dual.c:518 Looking at the history of changes, we suspect that the behavior changes are due to fixes made for CVE-2016-2112. Looks like now we have strong restrictions for sasl bind requests. May be I am guessing here. Not really sure whats going wrong here. But if we replace with 4.3.5 stack on same node, everything works fine. Will attach complete logs for child winbindd process performing these lookups.
Actually some one else posted similar issue at: https://access.redhat.com/solutions/2290811
This sounds like: https://bugzilla.samba.org/show_bug.cgi?id=11852 or https://bugzilla.samba.org/show_bug.cgi?id=11872
(In reply to Andreas Schneider from comment #2) ...But both those bugs are fixed now... Also the timeout is with ads_USN() which is a small query. It looks like the ldap bind is successful, yet the rather simple query times out. In general, Samba-4-3-11 successfully makes ldap queries to Windows DCs. So this must be specific to configuration, to the DC type or setup, or to the merging (since admittedly this is not "vanilla" Samba). One thing I suddenly noticed is that the ldap communications are signed but not sealed (not encrypted). The default conf is "signed", but in pre-security-patch samba, "signed" still meant "sealed", and now it seems to follow the conf... You can try setting client ldap sasl wrapping = seal and see if it fixes the issue. Another behavioral change I'm aware of is that the new code, when using GSSAPI to create a secure session, requests delegation (which effectively translates to an additional TGS request for another ticket and inclusion of that ticket inside the sasl bind request). See this thread https://lists.samba.org/archive/samba-technical/2016-July/114907.html for discussion and the place in the code that requests it (look for GSS_C_DELEG_FLAG).
Comment on attachment 12323 [details] child winbindd log. rtf format Hemanth - really ? Can get we it in plain text please :-).
Can you also get us a wireshark trace of all traffic between winbindd and the specific AD-LDAP server ? That should help a lot in tracking this down.
Created attachment 12326 [details] winbindd log in plain text. Attaching the log in plain text.
(In reply to Jeremy Allison from comment #6) Sure Jeremy. I will capture one and attach.
Created attachment 12327 [details] Packet capture. Attaching the packet capture showing the communication with DC.
(In reply to Uri Simchoni from comment #3) Thanks a lot Uri! Infact we had "client ldap sasl wrapping" set to plain. This has overwritten the new default option(sign). I have removed this explicit config to make "sign" as current setting. So far things are working fine. I will monitor for some more time and update the status.
OK. Setting "client ldap sasl wrapping" from "plain" to "sign" seems to be fixing the issue. No hangs so far. Uri, Where can I check if my ldap requests are actually signed. As mentioned earlier, we had the sasl wrapping is set to "plain". So it shouldn't have signed the packets. Capture shows the ldap query that is getting timed out shows as: Lightweight Directory Access Protocol LDAPMessage searchRequest(5) "<ROOT>" baseObject messageID: 5 protocolOp: searchRequest (3) searchRequest baseObject: scope: baseObject (0) derefAliases: neverDerefAliases (0) sizeLimit: 0 timeLimit: 15 typesOnly: False Filter: (objectclass=*) filter: present (7) present: objectclass attributes: 1 item AttributeDescription: highestCommittedUSN controls: 2 items Control controlType: 1.2.840.113556.1.4.1339 (LDAP_SERVER_DOMAIN_SCOPE_OID) controlValue: <MISSING> Control controlType: 1.2.840.113556.1.4.319 (pagedResultsControl) criticality: True SearchControlValue size: 500 cookie: <MISSING> I couldn't find any sign/seal option in this. Does this missing cookies and control value fields were the concern? Would like to know the final recommendation for this. Should I use the default option("sign") or "seal"?
(In reply to Hemanth from comment #10) AFAIK, the client default has been "sign" since 4.2.0, and the security release hardens the server requirements (look at the CVE and the new "ldap server require strong auth" parameter). Was this setup working against a Samba AD?
(In reply to Hemanth from comment #11) When the ldap message is signed or sealed, it is transmitted on top of the GSSAPI wrapping protocol. A signed message looks like this: Lightweight Directory Access Protocol SASL Buffer Length: 114 SASL Buffer GSS-API Generic Security Service Application Program Interface krb5_blob: 050405ff000c000c000000004edf574dd5d541bee89ec04c... krb5_tok_id: KRB_TOKEN_CFX_WRAP (0x0405) krb5_cfx_flags: 0x05 .... .1.. = AcceptorSubkey: Set .... ..0. = Sealed: Not set .... ...1 = SendByAcceptor: Set krb5_filler: ff krb5_cfx_ec: 12 krb5_cfx_rrc: 12 krb5_cfx_seq: 1323259725 krb5_sgn_cksum: d5d541bee89ec04c99593f00 GSS-API payload (86 bytes) LDAPMessage searchResEntry(5) "<ROOT>" [3 results] messageID: 5 protocolOp: searchResEntry (4) searchResEntry objectName: attributes: 1 item PartialAttributeList item highestCommittedUSN Note the presence of the GSSAPI stuff and that the "seal" bit is unset, and the message can be read. If the ldap is sealed, it will look like this: Lightweight Directory Access Protocol SASL Buffer Length: 1798 SASL Buffer GSS-API Generic Security Service Application Program Interface krb5_blob: 050407ff0000001c0000000079a39dda3658ebc5171bc78d... krb5_tok_id: KRB_TOKEN_CFX_WRAP (0x0405) krb5_cfx_flags: 0x07 .... .1.. = AcceptorSubkey: Set .... ..1. = Sealed: Set .... ...1 = SendByAcceptor: Set krb5_filler: ff krb5_cfx_ec: 0 krb5_cfx_rrc: 28 krb5_cfx_seq: 2040765914 krb5_sgn_cksum: 3658ebc5171bc78d4b4930f24b678b3ce572a7cb016a2920... GSS-API payload (60 bytes) LDAPMessage BER Error: Sequence expected but class:UNIVERSAL(0) primitive tag:5 was unexpected [Expert Info (Warn/Malformed): BER Error: Sequence expected] [BER Error: Sequence expected] [Severity level: Warn] [Group: Malformed] Note that the "sealed" bit is set and that the message is encrypted and cannot be parsed. If you use a keytab (see https://wiki.samba.org/index.php/Keytab_Extraction) you will be able to decrypt it. I don't know of a recommendation of whether to use "sign" or "seal". It's a matter of the security policy of the AD server, I guess. Encrypted packets add a headache of having to decrypt them when you're analyzing them, but with wireshark that's not very difficult.
(In reply to Uri Simchoni from comment #12) For some historical reasons this setting has been changed to plain on our stack. We haven't tested against samba AD. Yeah. I did see that strong server side requirements for ldap but ignored them as we are not using samba AD server. And thanks for shedding light of ldap sign and seal options.