An AD member is joined to a domain with the "net ads join" command, and adding the "-k" switch makes all authentication use Kerberos. The documented procedure is to do this before running winbindd, so the winbindd Kerberos locator is not operational at this stage. As a result, the process of finding a KDC is not site-aware, and an off-site KDC can be contacted. The process of finding a DC for creating the machine account (via SMB/LDAP) *is* site-aware, so once there is a service ticket for that DC, everything continues in a site-aware manner.

At first glance this does not appear to be a significant issue, since joining the domain is a one-time operation. However, the site-unaware lookup sometimes prolongs ticket acquisition to the point where the whole operation fails. It appears to be customary in some enterprises to block (drop) communication between sites, so while off-site DCs appear in DNS records, they are not reachable. A UDP Kerberos exchange fails after a few seconds (depending on the Kerberos libraries), and a TCP attempt takes longer to fail because the typical OS TCP timeout when SYN packets are dropped is ~15 seconds. In one enterprise with 70-80 DCs across multiple sites, it took more than two minutes to obtain the service ticket.

However, since smbd starts obtaining the service ticket only after it has contacted the (on-site) DC and completed SMB2 negotiation, the DC drops the connection after 60 seconds (an established TCP connection past the negotiate phase with no session setup attempted). This fails the join even if the user is willing to wait the two minutes (which they might not be, since all of this could be wrapped in a shiny REST API and a GUI).

On the other hand, if we make the process site-aware, we first find an on-site DC using CLDAP. This could take a few seconds because of the firewall, but no SMB connection is open at this stage.
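To illustrate the difference between a site-unaware and a site-aware KDC lookup, here is a minimal Python sketch of the DNS SRV names a client could query in each case. It is not Samba's implementation. The function name `kdc_srv_names` and the example realm and site are hypothetical, and the site name is assumed to be already known (for example, learned from the CLDAP reply mentioned above). The SRV name layout follows the usual Active Directory convention.

```python
def kdc_srv_names(realm, site=None):
    """Return DNS SRV names to query for KDCs, most specific first.

    Hypothetical helper for illustration only. `site` is assumed to have
    been discovered beforehand; without it, only the domain-wide record
    is returned, which lists KDCs from every site.
    """
    realm = realm.lower().rstrip(".")
    names = []
    if site:
        # Site-specific record: only DCs registered for this AD site.
        names.append(f"_kerberos._tcp.{site}._sites.dc._msdcs.{realm}")
    # Domain-wide record: all DCs, including unreachable off-site ones.
    names.append(f"_kerberos._tcp.dc._msdcs.{realm}")
    return names


if __name__ == "__main__":
    for name in kdc_srv_names("EXAMPLE.COM", site="Branch-Office"):
        print(name)
```

Querying the site-specific name first means the candidate list starts with on-site KDCs, so blocked off-site servers are only tried as a fallback.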
Created attachment 11905: git-am fix for 4.4.0 and 4.3.next
Comment on attachment 11905 (git-am fix for 4.4.0 and 4.3.next): LGTM
Assigning to Karolin for inclusion in 4.4.0 and 4.3.next
(In reply to Uri Simchoni from comment #3) Pushed to autobuild-v4-[4|3]-test.
(In reply to Karolin Seeger from comment #4) Pushed to both branches. Closing out bug report. Thanks!
(In reply to Karolin Seeger from comment #5) It seems the fix only made it to the 4.4.x branch, which is consistent with the release notes. 4.3.x is now in maintenance, so I'm not going to push for a fix there. FWIW, the patch still applies cleanly to v4-3-stable at the time of this writing.
(In reply to Uri Simchoni from comment #6) That's strange... Sorry! Pushed to autobuild-v4-3-test.
(In reply to Karolin Seeger from comment #7) Finally ended up in v4-3-test. Closing out bug report. Thanks!