Bug 1572 - japanese UTF8 chars and CIFS
Summary: japanese UTF8 chars and CIFS
Alias: None
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
Version: 2.6
Hardware: All Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Steve French
QA Contact: Samba QA Contact
Depends on:
Reported: 2004-07-29 21:17 UTC by Clemens Schwaighofer
Modified: 2005-11-14 09:41 UTC

See Also:

tar archive with empty japanese filename (10.00 KB, application/x-tar)
2004-07-30 15:37 UTC, Björn Jacke

Description Clemens Schwaighofer 2004-07-29 21:17:43 UTC
As recommended, I now use CIFS instead of SMBFS for mounting shares. But this
didn't fix my problem of not seeing UTF-8 filenames (Japanese kanji in this case).

The only difference is that instead of "cutting off the file" or "not showing it"
like SMBFS did, CIFS just shows "?".

In the log file I see this:
[2004/07/30 13:09:33, 0] lib/iconv.c:utf8_pull(506)
  short utf8 char

But I get exactly the same message when mounted via SMBFS.

Server: Debian/Testing 2.6.4 with 3.0.5 samba
Client: Gentoo/stable 2.6.5-gentoo-1 3.0.5 samba

I can't test this from my Debian test box, because Debian doesn't ship
mount.cifs. I have to compile it by hand first.
Comment 1 Björn Jacke 2004-07-30 02:35:58 UTC
I can also see problems with Japanese filenames, though simple umlauts don't
cause any problems. Creating files works fine; "ls"-ing them does not. When Japanese
filenames should be displayed, they are only partly displayed, along with a lot of
other garbage. When I "ls" not the directory where the file resides but the full path
including the file, it is displayed correctly.
I have samba 3.0.5 and the SUSE 2.6.5 kernel of SLES9.
Comment 2 Clemens Schwaighofer 2004-07-30 03:53:20 UTC
I have not even tried to create files with Japanese names via CIFS; I just tried to
view pre-created files.
Comment 3 Steve French 2004-07-30 09:30:23 UTC
What is the cifs vfs version (see the fs/cifs/CHANGES file or modinfo on cifs.ko)?
Are the UTF-8 NLS kernel modules loaded on the client? The translation from Unicode
(16-bit Unicode) to the client's code page is done by kernel functions that depend on the
optional build of certain NLS kernel modules. Samba on the server depends on a
different mapping table for mapping from Unicode to UTF-8.
Are you overriding iocharset in the mount options on the client?
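The translation step described above can be sketched in Python. This is only an illustration, not the kernel's NLS code: it assumes the server sends the name as UTF-16LE bytes and that characters the client code page cannot represent are replaced with "?", which would match the "?" reported in the description. The function name `to_client_codepage` is hypothetical.

```python
def to_client_codepage(wire_name: bytes, codepage: str) -> bytes:
    """Hypothetical sketch: translate a UTF-16LE filename received from the
    server into the client's code page. Characters the code page cannot
    represent are replaced with b"?" (assumed behaviour, not kernel code)."""
    name = wire_name.decode("utf-16-le")
    # errors="replace" makes str.encode substitute b"?" for unmappable characters
    return name.encode(codepage, errors="replace")


wire = "ようこそ".encode("utf-16-le")
print(to_client_codepage(wire, "iso-8859-1"))  # b'????'
print(to_client_codepage(wire, "utf-8").decode("utf-8"))
print(to_client_codepage("Grüße".encode("utf-16-le"), "iso-8859-1"))
```

With a Latin-1 code page the umlauts survive but every kana becomes "?"; with a UTF-8 code page the name round-trips intact.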
Comment 4 Björn Jacke 2004-07-30 15:36:03 UTC
Samba works correctly here: with "unix charset=utf8", Windows clients have no
problem with Japanese filenames.

bjacke@pell:~> /sbin/modinfo cifs
version:        1.18 8EA897319BE63BB99E39A8E

You can test it yourself: just "touch ようこそ".

Unfortunately this Bugzilla is still not running in UTF-8, so you won't see
anything useful after "touch" ;-)
I'll attach a tar file containing a UTF-8 encoded Japanese filename. Try to put it
via CIFS on a Windows or Samba server.
Comment 5 Björn Jacke 2004-07-30 15:37:14 UTC
Created attachment 591
tar archive with empty japanese filename
Comment 6 Björn Jacke 2004-07-30 15:43:40 UTC
And I did not mention: yes, the utf8 NLS module is loaded and it's the default NLS
here. It works fine with German umlauts, which are also multibyte, but the
Japanese files fail, and only in directory listings; opening seems to work, and
creating works too.
Comment 7 Clemens Schwaighofer 2004-07-30 16:59:21 UTC
Same here: UTF-8 support is compiled into the kernel, and so is CIFS, so I'll probably
get the same results as him.
Overriding iocharset does not work with mount.cifs; at least, no such option is
mentioned in the man page.
Comment 8 Steve French 2004-07-31 17:45:44 UTC
Note that iocharset is a supported cifs mount option (see fs/cifs/README for
more details; the code that implements it is in fs/cifs/connect.c).
Comment 9 Clemens Schwaighofer 2004-08-03 01:24:09 UTC
Two things:

First, if I mount with the option "iocharset=utf8", then ls either core dumps or
goes into "D" state.

Second, if I use Konqueror with "smb://user@server/folder/", I can see all the
UTF-8 Japanese files there perfectly.
Comment 10 Björn Jacke 2004-08-03 01:46:41 UTC
You are mixing things up. Konqueror does not use CIFS; it uses Samba's libsmb.
Steve, did you try extracting that tar onto a CIFS-mounted share?
Comment 11 Clemens Schwaighofer 2004-08-03 06:54:52 UTC
Yeah, sorry, I just wanted to point out that it works in Konqueror.

And if you run that ls twice, it hangs and brings down cifsd; in the end, I had
to reboot my box :)

So don't use iocharset with mount.cifs.
Comment 12 Björn Jacke 2004-09-21 14:56:29 UTC
To keep this bug up to date, here is some information from the cifs mailing list:

The cifs module fails with filenames whose UTF-8 representation is longer than
the UTF-16 representation of the same filename. That is typically the case for
Japanese or Chinese filenames. This may be fixed in kernel 2.6.9.
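The size mismatch is easy to check: German umlauts take 2 bytes in both UTF-8 and UTF-16, while kana and kanji from the Basic Multilingual Plane take 3 bytes in UTF-8 but only 2 in UTF-16. The helper below is a hypothetical illustration in Python, not cifs code.

```python
def encoded_lengths(name: str) -> tuple[int, int]:
    """Return (UTF-8 byte length, UTF-16LE byte length) of a filename."""
    return len(name.encode("utf-8")), len(name.encode("utf-16-le"))


for name in ["Grüße", "ようこそ", "日本語.txt"]:
    utf8_len, utf16_len = encoded_lengths(name)
    status = "UTF-8 longer" if utf8_len > utf16_len else "fits"
    print(f"{name}: utf-8={utf8_len} utf-16={utf16_len} ({status})")
# Grüße: utf-8=7 utf-16=10 (fits)
# ようこそ: utf-8=12 utf-16=8 (UTF-8 longer)
# 日本語.txt: utf-8=13 utf-16=14 (fits)
```

The last name shows why the problem is typical rather than universal: a short ASCII extension can keep a mixed Japanese name just under its UTF-16 length.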
Comment 13 Steve French 2005-03-07 21:08:46 UTC
readdir is fixed in 2.6.10 and later for strings whose UTF-8 characters average
more than 2 bytes each (patches for some earlier kernels are also available on the
project page).
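The "more than 2 bytes on average" condition is exactly when a UTF-8 result outgrows a buffer the same size as the UTF-16 input. The Python sketch below assumes, hypothetically, that the old failure came from sizing the output that way; it is not the actual patch. A safe bound is 3 UTF-8 bytes per 2-byte UTF-16 code unit, since a surrogate pair (two code units) becomes only 4 UTF-8 bytes. Both function names are made up for illustration.

```python
def utf8_buffer_size(utf16_bytes: int) -> int:
    """Worst-case UTF-8 size for a UTF-16 string of utf16_bytes bytes.
    A BMP code unit expands to at most 3 UTF-8 bytes; a surrogate pair
    (2 units) becomes 4 bytes, so 3 bytes per unit is an upper bound."""
    code_units = utf16_bytes // 2
    return 3 * code_units


def convert_into_buffer(wire_name: bytes, buf_size: int) -> bytes:
    """Hypothetical converter that refuses to overflow a fixed-size buffer."""
    out = wire_name.decode("utf-16-le").encode("utf-8")
    if len(out) > buf_size:
        raise OverflowError(f"needs {len(out)} bytes, buffer has {buf_size}")
    return out


wire = "ようこそ".encode("utf-16-le")  # 8 bytes as UTF-16
try:
    convert_into_buffer(wire, len(wire))  # buffer sized like the UTF-16 input
except OverflowError as err:
    print("undersized:", err)  # undersized: needs 12 bytes, buffer has 8
print(convert_into_buffer(wire, utf8_buffer_size(len(wire))).decode("utf-8"))
```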