Bug 11179 - Samba-tool classicupgrade problem importing russian "И" from LDAP
Status: NEEDINFO
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Tools
Version: 4.1.17
Hardware: x64 FreeBSD
Importance: P5 normal
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on: 12523
Blocks:
Reported: 2015-03-24 14:18 UTC by Dron
Modified: 2022-08-11 02:32 UTC (History)

See Also:


Description Dron 2015-03-24 14:18:49 UTC
Hello.
FreeBSD 10.1 amd64
Samba 4.1.17

While trying to upgrade a Samba3+LDAP domain to a Samba4 AD domain, I got this error:

...
...
Sorting rpmd with attid exception 3 rDN=CN DN=CN=bilyak.i,CN=Users,DC=sdr,DC=tld
convert_string_talloc: Conversion error: Illegal multibyte sequence(▒рина Сергеевна)
Conversion error: Illegal multibyte sequence(▒рина Сергеевна)
Failed to modify account record CN=bilyak.i,CN=Users,DC=sdr,DC=tld to set user attributes: objectclass_attrs: attribute 'displayName' on entry 'CN=bilyak.i,CN=Users,DC=sdr,DC=tld' contains at least one invalid value!

There is a problem converting the Russian capital letter "И". It occurs in every field where this character appears.
Comment 1 Dron 2015-11-30 11:14:58 UTC
Hello!
Are there any updates on this issue?
Comment 2 Andrew Bartlett 2015-11-30 20:59:17 UTC
You need to confirm that 'unix charset = utf8' is set and that your displayName is UTF-8 in the existing LDAP server.
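
One way to check the second condition is to look at the raw attribute bytes rather than at how a terminal renders them. A minimal sketch in Python, assuming the raw displayName value has already been fetched from LDAP as bytes; the helper name `first_invalid_utf8_offset` is hypothetical and not part of Samba or any LDAP tool, and the sample name is inferred from the garbled log line in the description:

```python
def first_invalid_utf8_offset(raw: bytes):
    """Return the offset of the first byte that breaks strict UTF-8 decoding, or None."""
    try:
        raw.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Sample values for illustration only (not taken from the reporter's directory).
utf8_value = "Ирина Сергеевна".encode("utf-8")
legacy_value = "Ирина Сергеевна".encode("cp1251")  # same text in a single-byte code page

print(first_invalid_utf8_offset(utf8_value))    # None means the value is valid UTF-8
print(first_invalid_utf8_offset(legacy_value))  # an integer offset means it is not
```

A value that returns None here is well-formed UTF-8 at the byte level, which is a stronger statement than "it displays correctly in my terminal".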
Comment 3 Dron 2015-11-30 21:06:25 UTC
Hello Andrew!
I have unix charset = utf-8 in my smb.conf.
The server locale is:
# locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_ALL=

The records in LDAP are also in UTF-8.
Comment 4 Dron 2017-03-29 11:26:22 UTC
Just tried the migration on FreeBSD 11 p8 with Samba 4.5.7 from ports.
The issue still persists - http://i.imgur.com/AkZUDlO.png
The workaround is to temporarily change the Russian "И" in all records to something else.
Comment 5 Björn Jacke 2017-03-29 16:28:16 UTC
Whether the "locale" is UTF-8 is not relevant here, and neither is the UTF-8 encoding in the previous LDAP server. Only the "unix charset" setting of smb.conf that was used during classicupgrade matters, and I'm quite sure that this was not UTF-8.
Comment 6 Douglas Bagnall 2022-08-11 02:32:53 UTC
$ echo -n '▒рина Сергеевна' | hd
00000000  e2 96 92 d1 80 d0 b8 d0  bd d0 b0 20 d0 a1 d0 b5  |........... ....|
00000010  d1 80 d0 b3 d0 b5 d0 b5  d0 b2 d0 bd d0 b0        |..............|
0000001e

The first three bytes *are* valid utf-8, but they are the bytes for '▒', not 'И'.  Cf. https://www.mclean.net.nz/ucf/?c=U+2592

$ echo -n 'И' | hd
00000000  d0 98                                             |..|

Presumably '▒' is substituting for the invalid bytes, but the mystery is how a single character got to be invalid while the rest is clearly utf-8.
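
The byte-level reading above can be reproduced without a terminal in the loop. The Python sketch below confirms the two encodings and then tests one possible explanation that nobody in this report has confirmed: if the UTF-8 bytes were at some point interpreted as a single-byte Cyrillic code page, only a byte that is unassigned in that code page would fail to convert. The choice of cp1251 is an assumption made purely for illustration, since the report does not establish which 'unix charset' was in effect, and the helper `undecodable_bytes` is hypothetical.

```python
import unicodedata

# The placeholder seen in the error message, and the letter it stands in for.
placeholder = bytes.fromhex("e29692").decode("utf-8")
print(f"U+{ord(placeholder):04X}", unicodedata.name(placeholder))  # U+2592 MEDIUM SHADE
print("И".encode("utf-8").hex(" "))  # d0 98


def undecodable_bytes(text: str, codec: str) -> set:
    """Return the bytes of text's UTF-8 form that have no mapping in a single-byte codec."""
    bad = set()
    for b in set(text.encode("utf-8")):
        try:
            bytes([b]).decode(codec)
        except UnicodeDecodeError:
            bad.add(b)
    return bad


# ASSUMPTION: cp1251 stands in for whatever non-UTF-8 charset might have been involved.
name = "Ирина Сергеевна"  # value inferred from the garbled log line
print({hex(b) for b in undecodable_bytes(name, "cp1251")})
```

Under that assumption, 0x98 (the second byte of 'И') is the only byte of the name that cp1251 leaves unassigned, which would be consistent with a failure limited to the capital 'И'. It does not show that this is what happened during the upgrade.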