Bug 12523 - classicupgrade uses previous "unix charset" to write in AD LDAP
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.5.3
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks: 11179
Reported: 2017-01-17 13:26 UTC by Björn Jacke
Modified: 2017-03-29 16:28 UTC

See Also:


Description Björn Jacke 2017-01-17 13:26:50 UTC
When a PDC has "unix charset = iso8859-1" and classicupgrade is used to migrate the data to AD, the data is written into the AD database in that same encoding. For example, a user description attribute "Täst" ends up as:

description:: VORzdA==

classicupgrade should convert strings from "unix charset" to utf8.
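
To illustrate the symptom, here is a minimal Python sketch (not the classicupgrade code itself). It shows why the value above is the ISO-8859-1 encoding of "Täst" and what the UTF-8 value would look like after conversion. The helper name to_utf8 is hypothetical and only used for this illustration.

```python
import base64


def to_utf8(raw: bytes, unix_charset: str) -> bytes:
    """Re-encode bytes stored in the old "unix charset" as UTF-8.

    Hypothetical helper for illustration; not part of Samba.
    """
    return raw.decode(unix_charset).encode("utf-8")


# Bytes as a PDC with "unix charset = iso8859-1" would have stored them.
raw = "Täst".encode("iso-8859-1")
as_imported = base64.b64encode(raw).decode("ascii")
print(as_imported)  # VORzdA==  (the value seen in the AD database)

# What the value would be after conversion to UTF-8.
converted = base64.b64encode(to_utf8(raw, "iso-8859-1")).decode("ascii")
print(converted)  # VMOkc3Q=
```

The single byte 0xE4 for "ä" becomes the two bytes 0xC3 0xA4 in UTF-8, which is why the base64 strings differ.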
Comment 1 Björn Jacke 2017-01-17 18:55:48 UTC
This is specifically an issue with a data import from the tdbsam passdb backend. ldapsam already had the strings UTF-8 encoded, no matter what unix charset was.
Comment 2 Björn Jacke 2017-01-23 13:32:21 UTC
Even worse, those iso8859-1 encoded attributes cannot be modified, not even deleted, when the attribute is an indexed attribute.

The current workaround to delete those attributes is to take Samba offline, set unix charset on the AD server to iso8859-1, fix the attribute offline with ldbedit, change unix charset back, and start Samba again.

(And that workaround actually reveals another bug: even with unix charset != UTF-8, Samba AD should always make sure that the encoding in the LDAP server is UTF-8. It would probably be a good idea to disallow Samba startup with unix charset != UTF-8.)
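
To find which values were imported in the old charset before fixing them, one possible approach is to scan an LDIF dump for base64 values that are not valid UTF-8. The following is only a sketch under assumptions: the function name find_non_utf8, the sample DN, and the list of attributes to check are all hypothetical. It ignores LDIF line folding, and the check must be limited to text attributes, because binary attributes are also base64 encoded and would be reported as well.

```python
import base64


def find_non_utf8(ldif_text, text_attrs):
    """Return (attribute, raw bytes) pairs whose base64 value is not valid UTF-8.

    Hypothetical helper: only "attr:: value" lines for the attribute names
    in text_attrs are checked; folded LDIF lines are not handled.
    """
    wanted = {a.lower() for a in text_attrs}
    hits = []
    for line in ldif_text.splitlines():
        attr, sep, value = line.partition(":: ")
        if not sep or attr.lower() not in wanted:
            continue
        raw = base64.b64decode(value.strip())
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            hits.append((attr, raw))
    return hits


# Made-up sample record: one ISO-8859-1 value and one UTF-8 value.
sample = (
    "dn: CN=test,CN=Users,DC=example,DC=com\n"
    "description:: VORzdA==\n"
    "displayName:: VMOkc3Q=\n"
)
result = find_non_utf8(sample, ["description", "displayName"])
print(result)  # [('description', b'T\xe4st')]
```

Note that this heuristic can miss values: some byte sequences are valid in both encodings, so a clean result does not prove that every value is correct UTF-8 text.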
Comment 3 Andrew Bartlett 2017-01-30 11:05:45 UTC
Yes, we need to honour the original unix charset for the duration of the command, and then write values using push_utf8_talloc() like pdb_ldap et al do.

That will handle the basics, but further drama is likely as the AD DC code is, as you point out, totally incompatible with a unix charset that is not UTF8.

Andrew Bartlett