Bug 10988 - Local private krb5.conf ignored by some winbindd children
Status: NEW
Product: Samba 3.6
Classification: Unclassified
Component: Winbind
Hardware: All
OS: All
Importance: P5 normal
Target Milestone: ---
Assigned To: Michael Adam
QA Contact: Samba QA Contact
Reported: 2014-12-05 16:57 UTC by Harry Mason
Modified: 2014-12-05 18:01 UTC

Description Harry Mason 2014-12-05 16:57:09 UTC
When winbindd connects to netlogon to resolve the DC for a trusted domain, the settings in the local private krb5.conf are sometimes ignored, leading to high latency. My analysis is that cm_open_connection() should call create_local_private_krb5_conf_for_domain() on every path, but does not.

In this deployment there are hundreds of DCs and about 30 trusted domains. Most of the DCs in DNS are firewalled from the client, and many others have >200ms latency. Some of the domains have no reachable DC.

The symptom is contention on the netlogon mutex, because the lock is held for several seconds per request. Periodically a child gives up with "cm_prepare_connection: mutex grab failed", which blacklists that DC in the connection cache.

Winbind presumably refreshes the status of trusted domains periodically, forking a child for each one in fork_child_dc_connect(). That child calls get_dcs() to check whether any DC is available. When the domain is not our primary domain, get_dcs() uses get_dc_name_via_netlogon(), which calls cm_open_connection() via cm_connect_netlogon().

cm_open_connection() checks the server affinity cache (SAF) for a DC. If there is an entry and is_ipaddress(saf_servername) is false, dcip_to_name() is not needed, so create_local_private_krb5_conf_for_domain() is not called either. As a result KRB5_CONFIG is never set in this child, and Kerberos falls back to /etc/krb5.conf. In my case that file is blank, so dns_lookup_kdc = true is implied.

Once connected to the DC, cm_prepare_connection() grabs the mutex and, while holding it, calls cli_session_setup_spnego(). Because many of the KDCs listed in DNS time out in this deployment, this call is slow; the mutex stays contended, and other winbindd processes time out waiting for it and switch DCs.

Samba 3.6.11
MIT Kerberos 1.12.1