Bug 14847 - Windows has a different idea of sAMAccountName equality
Summary: Windows has a different idea of sAMAccountName equality
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
Reported: 2021-09-28 21:56 UTC by Douglas Bagnall
Modified: 2022-06-13 23:14 UTC
CC List: 2 users

See Also:


Description Douglas Bagnall 2021-09-28 21:56:23 UTC
We compare sAMAccountName as a case-insensitive string.

Windows folds a whole lot of characters together, seemingly stripping away diacritical marks until it finds ASCII. For example, 'björn', 'bjorn', 'bjǫrṅ', and 'b̨́̄̐j᪲ǫ̈́Ⓡn̨̈́' are the same person (but not 'bjoern', that's someone else entirely).

This is mentioned in a few places (e.g. https://web.archive.org/web/20140310004414/http://support.microsoft.com/kb/839515), but I can't find any documentation that actually spells out the equivalence rules.

Based on tests, this is *not* Unicode NFKD/NFKC normalisation; that is perhaps unsurprising, given that the behaviour dates from the era of Windows code pages.
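A minimal Python sketch of the difference, for illustration only. `case_insensitive_key()` approximates the current behaviour with `str.casefold()` (Samba's own case folding may differ in details), and `stripped_key()` is a hypothetical diacritic-stripping fold: NFKD decomposition, then dropping non-spacing marks (category `Mn`), then case folding. Since tests indicate Windows does not use NFKD/NFKC, this is an assumed approximation, not Windows' rule; it merely agrees with the sAMAccountName examples above.

```python
import unicodedata


def case_insensitive_key(name: str) -> str:
    # Roughly the current behaviour: names differ only if they differ
    # after case folding.
    return name.casefold()


def stripped_key(name: str) -> str:
    # Hypothetical approximation of the Windows folding, NOT its real
    # algorithm: decompose compatibility characters (NFKD), drop
    # non-spacing combining marks, then ignore case.
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return base.casefold()


variants = [
    "bjorn",
    "bj\u00f6rn",                                     # björn
    "bj\u01ebr\u1e45",                                # bjǫrṅ
    "b\u0328\u0301j\u1ab2\u01eb\u0308\u24c7n\u0328",  # decorated, with Ⓡ
    "bjoern",                                         # a different person
]
for name in variants:
    print(repr(name),
          case_insensitive_key(name) == "bjorn",
          stripped_key(name) == "bjorn")
```

Under this approximation every variant except 'bjoern' folds to 'bjorn', while the case-insensitive key matches only 'bjorn' itself. It also shows where NFKD runs out: 'ø' (U+00F8), as in the SPN example in Comment 2, has no Unicode decomposition, so `stripped_key("cøm")` stays distinct from "com".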

This is significant because we are trying to ensure the cross-uniqueness of sAMAccountName and userPrincipalName (bug 14564). There is essentially no way for us to know whether we have names that don't collide under our rules but would collide on a Windows DC that joined the network.
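As a rough aid, a script could at least flag *candidate* hidden collisions: names that are distinct under case-insensitive comparison but share a key under some stronger fold. The sketch below assumes a hypothetical fold (NFKD, drop non-spacing marks, case-fold) as a stand-in for the Windows rules; because the real rules are undocumented, an empty result would not prove that a Windows DC agrees. `fold_key()` and `hidden_collisions()` are illustrative names, not Samba functions.

```python
import unicodedata
from collections import defaultdict


def fold_key(name: str) -> str:
    # Assumed, rough approximation of Windows folding (not the real rules):
    # NFKD, drop non-spacing marks, then case-fold.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed
                   if unicodedata.category(c) != "Mn").casefold()


def hidden_collisions(names):
    """Group names that are distinct when compared case-insensitively
    but share a key under fold_key()."""
    groups = defaultdict(set)
    for name in names:
        groups[fold_key(name)].add(name.casefold())
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


accounts = ["bj\u00f6rn", "BJORN", "bjoern", "alice", "\u00c1lice"]
print(hidden_collisions(accounts))
```

Here 'björn'/'BJORN' and 'alice'/'Álice' are reported, while 'bjoern' is not, matching the examples in the description.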

I have only looked at this at the LDAP level, not via Kerberos.
Comment 1 Douglas Bagnall 2021-09-29 04:29:43 UTC
Along similar lines, the userPrincipalName "x＠EXAMPLE.COM" is interpreted as "x@EXAMPLE.COM", as if "＠" (U+FF20, FULLWIDTH COMMERCIAL AT) were "@" (ASCII @).
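To illustrate why this matters for uniqueness checks: a hypothetical check that looks only for the ASCII "@" would find no realm separator in such a UPN at all, while Windows treats it as x@EXAMPLE.COM. A minimal Python sketch, using NFKC as an assumed stand-in for Windows' width folding (NFKC does map fullwidth forms to their ASCII counterparts, but it is not claimed to be Windows' rule):

```python
import unicodedata

upn = "x\uff20EXAMPLE.COM"  # uses U+FF20 FULLWIDTH COMMERCIAL AT

# A naive search for the ASCII "@" finds no realm separator.
print("@" in upn)  # False

# NFKC maps the fullwidth form to ASCII "@", roughly mirroring the
# width-insensitive interpretation described above.
folded = unicodedata.normalize("NFKC", upn)
user, _, realm = folded.rpartition("@")
print(user, realm)  # x EXAMPLE.COM
```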
Comment 2 Douglas Bagnall 2021-10-19 23:56:11 UTC
This also applies to servicePrincipalName.

For example, in Windows:

"cifs/example.com" is the same as "cīfs/ëxamplē.cøm"

Also, "¯\_(ツ)_/¯" is the same as "¯\_(つ)_/¯" and "¯\_(㋡)_/¯", and all are valid SPNs when added over LDAP.
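Kana folding is one place where NFKD alone clearly falls short: it maps ㋡ (U+32E1 CIRCLED KATAKANA TU) to ツ (U+30C4) but leaves the hiragana つ (U+3064) distinct. A hypothetical Python sketch that adds a hiragana-to-katakana shift, loosely mimicking the NORM_IGNOREKANATYPE flag named in the older MS-KILE text quoted in Comment 3; `kana_fold()` is an illustrative name, not a Windows or Samba API:

```python
import unicodedata


def kana_fold(s: str) -> str:
    # Compatibility decomposition handles enclosed forms such as ㋡.
    s = unicodedata.normalize("NFKD", s)
    # Hiragana U+3041..U+3096 lie exactly 0x60 below the corresponding
    # katakana U+30A1..U+30F6, so shifting them folds the kana type.
    return "".join(chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c
                   for c in s)


shrugs = [
    "\u00af\\_(\u30c4)_/\u00af",  # ¯\_(ツ)_/¯
    "\u00af\\_(\u3064)_/\u00af",  # ¯\_(つ)_/¯
    "\u00af\\_(\u32e1)_/\u00af",  # ¯\_(㋡)_/¯
]

nfkd = [unicodedata.normalize("NFKD", s) for s in shrugs]
print(nfkd[0] == nfkd[2], nfkd[0] == nfkd[1])  # True False
print(len({kana_fold(s) for s in shrugs}))     # 1
```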


I am noting this here because I can't see how this behaviour is good for mixed networks, given the scope for name confusion.

Doing what Windows does would be the wrong fix, I think.
Comment 3 Alexander Bokovoy 2021-10-20 11:16:19 UTC
Right now MS-KILE says: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-kile/efe39b86-4dda-43ce-aacf-390fcb2cbd43

3.1.5.7 Internationalization and Case Sensitivity
The Kerberos V5 protocol specifies rules for encoding and processing names, both for character set and case ([RFC4120] section 6).

Name comparisons, whether for users or domains, MUST NOT be case sensitive in KILE. KILE MUST use UTF-8 encoding of these names [RFC2279]. Normalization MUST NOT be performed and surrogates MUST NOT be supported. Names SHOULD<27> match.
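Read literally, the current text implies that 'björn' and 'bjorn' should *not* match in KILE, unlike the behaviour observed over LDAP. A minimal sketch of one possible reading, with assumptions: `str.casefold()` stands in for "not case sensitive", and "surrogates MUST NOT be supported" is taken to mean rejecting characters beyond U+FFFF (which need UTF-16 surrogate pairs) as well as surrogate code points. `kile_names_match()` is a hypothetical name:

```python
def kile_names_match(a: str, b: str) -> bool:
    """Hypothetical strict reading of the current MS-KILE 3.1.5.7 text."""
    for name in (a, b):
        # Assumption: names needing UTF-16 surrogates (beyond U+FFFF), or
        # containing surrogate code points, are unsupported; treat them
        # as non-matching.
        if any(ord(c) > 0xFFFF or 0xD800 <= ord(c) <= 0xDFFF for c in name):
            return False
    # Case-insensitive, with no Unicode normalization performed.
    return a.casefold() == b.casefold()


print(kile_names_match("BJORN", "bjorn"))            # True
print(kile_names_match("bj\u00f6rn", "bjorn"))       # False
print(kile_names_match("bj\u00f6rn", "bjo\u0308rn")) # False: no normalization
```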


Previously, MS-KILE 3.1.5.7 said:

Name comparisons, whether for users or domains, MUST NOT be case sensitive in KILE. KILE MUST use UTF-8 encoding of these names [RFC2279]. Normalization MUST NOT be performed and surrogates MUST NOT be supported. To match names, the GetWindowsSortKey algorithm ([MS-UCODEREF] section 3.1.5.2.4) with the following flags NORM_IGNORECASE, NORM_IGNOREKANATYPE, NORM_IGNORENONSPACE, and NORM_IGNOREWIDTH SHOULD be used then the CompareSortKey algorithm ([MS-UCODEREF] section 3.1.5.2.2) SHOULD be used to compare the names. Note that this applies only to names; passwords (and the transformation of a password to a key) are governed by the actual key generation specification ([RFC4120], [RFC4757], and [RFC3962]).
Comment 4 Andrew Bartlett 2022-06-13 23:13:17 UTC
Removing embargo; this was embargoed to avoid drawing attention to the various SPN issues addressed in November 2021.