Created attachment 8545
Modifies source4/libcli/resolve/resolve.c to fallback on internal resolver

On FreeBSD, getaddrinfo() fails to return results for hostnames containing a label that starts or ends with an underscore, a character RFC 1034 does not permit in hostnames. This breaks lookup of the records required for replication, as documented in the known issues for the 4.0.x releases.

I've submitted a patch to FreeBSD to fix the issue in libc: http://www.freebsd.org/cgi/query-pr.cgi?pr=176093

In addition, the attached patch works around the issue by falling back to the Samba-internal resolver whenever a hostname contains an underscore. It is based on abartlet's original patch for this issue (https://lists.samba.org/archive/samba-technical/2012-October/087285.html).

I've tested both the FreeBSD libc patch and the attached patch, and confirmed that name resolution succeeds and that two FreeBSD DCs correctly replicate new users.
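For illustration, the underscore check used by this first workaround amounts to a simple strchr() test. The following is a minimal sketch, assuming a hypothetical helper named needs_internal_resolver(); it is not the code from attachment 8545.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (name is illustrative only): returns true when the
 * hostname contains an underscore, meaning the lookup should skip the libc
 * getaddrinfo() path and use Samba's internal resolver instead.
 */
bool needs_internal_resolver(const char *hostname)
{
	/* strchr() returns NULL when '_' does not occur in the string. */
	return strchr(hostname, '_') != NULL;
}
```

This check is deliberately coarse: it also diverts names where the underscore sits in the middle of a label, which the libc resolver would have accepted.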
Created attachment 8558
Standalone test case

Attached a standalone test case. This approach could be used at compile time (to check whether the host uses the same resolver code and exhibits this issue) or at runtime (when available) instead of the strchr() test:

landonf:~> ./test-res_hnok gc._msdcs.example.org
0
landonf:~> ./test-res_hnok gc.msdcs.example.org
1
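The test program appears to exercise res_hnok(). As a rough, portable illustration of the rule it enforces, here is a hypothetical hostname_ok() modeled on the character classes in the BIND-derived resolver code: letters and digits may appear at label boundaries, while hyphens and underscores are accepted only inside a label. This is a sketch under that assumption, not the libc source, and the real res_hnok() may differ between libc versions.

```c
#include <stdbool.h>

/* Character classes modeled on the BIND-derived resolver rules. */
static bool is_alpha(int c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
static bool is_digit(int c) { return c >= '0' && c <= '9'; }
/* Letters and digits may appear at the start or end of a label. */
static bool is_border(int c) { return is_alpha(c) || is_digit(c); }
/* Hyphens and underscores are tolerated only inside a label. */
static bool is_middle(int c) { return is_border(c) || c == '-' || c == '_'; }

/*
 * Hypothetical re-implementation of the hostname check: returns 1 if every
 * dot-separated label starts and ends with a letter or digit and contains
 * only letters, digits, hyphens, or underscores in between; otherwise 0.
 */
int hostname_ok(const char *dn)
{
	int pch = '.';                    /* previous char; '.' marks a label start */
	int ch = (unsigned char)*dn++;

	while (ch != '\0') {
		int nch = (unsigned char)*dn++;   /* next character */

		if (ch == '.') {
			/* Label separators themselves are not checked. */
		} else if (pch == '.') {
			if (!is_border(ch))
				return 0;         /* first character of a label */
		} else if (nch == '.' || nch == '\0') {
			if (!is_border(ch))
				return 0;         /* last character of a label */
		} else if (!is_middle(ch)) {
			return 0;                 /* interior character */
		}
		pch = ch;
		ch = nch;
	}
	return 1;
}
```

With this sketch, hostname_ok("gc._msdcs.example.org") returns 0 because the label "_msdcs" begins with an underscore, and hostname_ok("gc.msdcs.example.org") returns 1, matching the output above.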
Created attachment 8563
Updated (v2) patch

I've attached an updated patch based on further investigation. I had based the first patch on the findings of a previous mailing list discussion; Timur Bakeyev did some additional research and noted that Linux actually rejects the same hostnames (using the same BIND-derived resolver code). I set up test installations of FreeBSD and Ubuntu to verify this.

As it turns out, on Linux, non-RFC 1034-compliant names (such as gc._msdcs.example.org) cause the resolver to return the non-POSIX EAI_NODATA. In that case, run_child_getaddrinfo() falls back to the internal resolver, allowing resolution to complete successfully.

On FreeBSD, the libc resolver uses the same validation code, but returns the POSIX error EAI_FAIL instead of EAI_NODATA. Since run_child_getaddrinfo() had no case for this error code, no fallback to the internal resolver occurred, and replication failed on FreeBSD.

The attached (git format) patch simply:
- Adds a case statement for EAI_FAIL.
- Extends the comment to document the behavior.

I've confirmed that samba4 replication on FreeBSD succeeds once the patch is applied.
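A minimal sketch of the fallback decision described in the v2 patch, assuming a hypothetical helper should_use_internal_resolver(); the actual change is a case statement inside run_child_getaddrinfo(), whose surrounding code is not reproduced here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc only exposes EAI_NODATA with this defined */
#endif
#include <netdb.h>
#include <stdbool.h>

/*
 * Hypothetical sketch (not Samba's run_child_getaddrinfo()): decide whether
 * a getaddrinfo() error should hand the lookup to the internal resolver.
 */
bool should_use_internal_resolver(int gai_err)
{
	switch (gai_err) {
#ifdef EAI_NODATA
	case EAI_NODATA:  /* Linux/glibc: name rejected as non-RFC 1034 compliant */
#endif
	case EAI_FAIL:    /* FreeBSD: same rejection, reported with the POSIX code */
		return true;
	default:
		return false;
	}
}
```

EAI_NODATA is guarded with #ifdef because it is not part of POSIX: glibc only defines it when _GNU_SOURCE is set, and some libcs omit it entirely. EAI_FAIL is POSIX, so that case is always compiled in.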
Created attachment 8564
Updated (v3) patch

Fixes a small comment typo (s/RFC1034/non-RFC1034/).
Created attachment 8565
patches cherry-picked from master

The patch is in master, so I'm uploading it with cherry-pick markers.
Comment on attachment 8565
patches cherry-picked from master

Looks OK.
Pushed to autobuild-v4-0-test.
Pushed to v4-0-test. Closing out bug report. Thanks!