We compare sAMAccountName as a case-insensitive string. Windows folds a whole lot of characters together, seemingly stripping away diacritical marks until it finds ASCII. For example, 'björn', 'bjorn', 'bjǫrṅ', and 'b̨́̄̐j᪲ǫ̈́Ⓡn̨̈́' are the same person (but not 'bjoern', that's someone else entirely). This is mentioned in a few places (e.g. https://web.archive.org/web/20140310004414/http://support.microsoft.com/kb/839515) but I can't find anywhere that spells out the equivalence rules. Based on tests, this is *not* using Unicode NFKD/NFKC normalisation -- perhaps unsurprising given it dates from the time of Windows codepages. This is significant given we are trying to ensure the cross-uniqueness of sAMAccountName and userPrincipalName (bug 14564). There is essentially no way we can know whether we have any names that don't collide by our rules, but would on a Windows DC that joined the network. I have only looked at this at the LDAP level, not via Kerberos.
Along similar lines, the userPrincipalName "x＠EXAMPLE.COM" is interpreted as "x@EXAMPLE.COM", as if "＠" (U+FF20, FULLWIDTH COMMERCIAL AT) was "@" (ASCII @).
This also applies to servicePrincipalName. For example, in Windows "cifs/example.com" is the same as "cīfs/ëxamplē.cøm". Also, "¯\_(ツ)_/¯" is the same as "¯\_(つ)_/¯" and "¯\_(㋡)_/¯", and all are valid SPNs when it comes to adding them over LDAP. I am noting this here because I can't see how this is good in mixed networks, with regard to name confusion. Doing what Windows does would be the wrong fix, I think.
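To illustrate why NFKD alone doesn't explain the examples above, here is a rough Python sketch. It is not what Windows does; approx_fold is a hypothetical helper that NFKD-decomposes, drops combining marks, and casefolds. It catches the easy cases (ö, fullwidth @) but not "ø" or the katakana/hiragana pair, which Windows reportedly treats as equal.

```python
import unicodedata


def approx_fold(name: str) -> str:
    """Hypothetical approximation: NFKD, strip combining marks, casefold.

    This is NOT the Windows comparison; it only shows where a plain
    Unicode-normalisation approach diverges from the observed behaviour.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining characters (non-zero canonical combining class).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Cases this approximation folds together:
print(approx_fold("bj\u00f6rn") == approx_fold("bjorn"))                  # ö vs o
print(approx_fold("x\uff20EXAMPLE.COM") == approx_fold("x@example.com"))  # fullwidth @

# Cases it does not fold, although Windows is reported to:
print(approx_fold("c\u00f8m") == approx_fold("com"))    # ø has no decomposition
print(approx_fold("\u30c4") == approx_fold("\u3064"))   # katakana ツ vs hiragana つ
```

The first two comparisons come out equal and the last two do not, which is consistent with the observation that whatever Windows is doing goes beyond NFKD/NFKC.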
Right now MS-KILE says:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-kile/efe39b86-4dda-43ce-aacf-390fcb2cbd43

3.1.5.7 Internationalization and Case Sensitivity

The Kerberos V5 protocol specifies rules for encoding and processing names, both for character set and case ([RFC4120] section 6). Name comparisons, whether for users or domains, MUST NOT be case sensitive in KILE. KILE MUST use UTF-8 encoding of these names [RFC2279]. Normalization MUST NOT be performed and surrogates MUST NOT be supported. Names SHOULD<27> match.

Previously, MS-KILE 3.1.5.7 said:

Name comparisons, whether for users or domains, MUST NOT be case sensitive in KILE. KILE MUST use UTF-8 encoding of these names [RFC2279]. Normalization MUST NOT be performed and surrogates MUST NOT be supported. To match names, the GetWindowsSortKey algorithm ([MS-UCODEREF] section 3.1.5.2.4) with the following flags NORM_IGNORECASE, NORM_IGNOREKANATYPE, NORM_IGNORENONSPACE, and NORM_IGNOREWIDTH SHOULD be used then the CompareSortKey algorithm ([MS-UCODEREF] section 3.1.5.2.2) SHOULD be used to compare the names. Note that this applies only to names; passwords (and the transformation of a password to a key) are governed by the actual key generation specification ([RFC4120], [RFC4757], and [RFC3962]).
Removing embargo. This was embargoed to avoid drawing attention to the various SPN issues addressed in Nov 2021.