The Samba-Bugzilla – Bug 12523
classicupgrade uses previous "unix charset" to write in AD LDAP
Last modified: 2017-03-29 16:28:16 UTC
When there is a PDC with "unix charset = iso8859-1" and you use classicupgrade to migrate the data to AD, it writes the data into the AD database in that same encoding. For example, a user description attribute "Täst" ends up as:
classicupgrade should convert strings from "unix charset" to utf8.
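To illustrate why the unconverted value is a problem, here is a minimal Python sketch of the byte-level difference between the two encodings of "Täst". The helper name is_valid_utf8 is hypothetical and not part of Samba; the byte values simply follow from the two encodings.

```python
# Illustration only: how "Täst" looks in each encoding.
text = "T\u00e4st"  # "Täst"

latin1_bytes = text.encode("iso8859-1")  # one byte (0xE4) for "ä"
utf8_bytes = text.encode("utf-8")        # two bytes (0xC3 0xA4) for "ä"


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The iso8859-1 bytes are not a valid UTF-8 sequence, which is what
# an LDAP/AD database expects its string attributes to contain.
print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```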
This is specifically an issue with data imported from the tdbsam passdb backend. ldapsam already stored the strings UTF-8 encoded, no matter what unix charset was.
Even worse, those iso8859-1 encoded attributes cannot be modified, or even deleted, when the attribute is indexed.
A current workaround for deleting those attributes is to take Samba offline, set unix charset on the AD server to iso8859-1, fix the attribute offline with ldbedit, change unix charset back, and start Samba again.
(And that workaround actually reveals another bug: even with unix charset != UTF-8, Samba AD should always make sure that the encoding in the LDAP server is UTF-8. It would probably be a good idea to disallow Samba startup with unix charset != UTF-8.)
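A startup guard along those lines could look roughly like the following Python sketch. This is not existing Samba code: the function names are hypothetical, and Python's codec registry is used here only as a convenient way to treat spellings such as "UTF8" and "utf-8" as the same charset.

```python
import codecs


def is_utf8_charset(name: str) -> bool:
    """Hypothetical check: does this charset name refer to UTF-8?"""
    try:
        # codecs.lookup() normalises aliases ("UTF8", "utf_8", ...).
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown charset names are treated as "not UTF-8".
        return False


def check_unix_charset(name: str) -> None:
    """Hypothetical startup guard for an AD DC configuration."""
    if not is_utf8_charset(name):
        raise ValueError(
            "unix charset = %s is not supported on an AD DC; use UTF-8" % name
        )
```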
Yes, we need to honour the original unix charset for the duration of the command, and then write values using push_utf8_talloc() like pdb_ldap et al do.
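As a rough sketch of that idea (not the actual classicupgrade code), the conversion amounts to decoding each value with the source charset and re-encoding it as UTF-8 before it is written. The record layout and the function names below are assumptions made for illustration; the real fix would go through Samba's own charset routines such as push_utf8_talloc().

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode a byte string from source_charset to UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


def convert_record(record: dict, source_charset: str) -> dict:
    """Hypothetical helper: convert every attribute value of one
    imported account record (attribute name -> raw bytes)."""
    return {
        attr: to_utf8(value, source_charset)
        for attr, value in record.items()
    }


# Example: a value as it would be stored under unix charset = iso8859-1.
old = {"description": b"T\xe4st"}
new = convert_record(old, "iso8859-1")
print(new)
```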
That will handle the basics, but further drama is likely as the AD DC code is, as you point out, totally incompatible with a unix charset that is not UTF8.