Bug 962 - MB ']' (0x5D) problem in share name definition
Summary: MB ']' (0x5D) problem in share name definition
Status: CLOSED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: Extended Characters
Version: 3.0.3
Hardware: Other All
Importance: P3 normal
Target Milestone: none
Assignee: Alexander Bokovoy
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-01-12 23:15 UTC by Shiro Yamada
Modified: 2005-08-24 10:23 UTC

Attachments
A patch for loading smb.conf twice, once in unix charset and once in UTF-8 (6.59 KB, patch)
2004-01-29 21:26 UTC, Shiro Yamada
How a share name with char 0x955d is broken in share list. (8.38 KB, image/x-png)
2005-02-19 23:43 UTC, TAKAHASHI Motonobu
Committed patch. (2.60 KB, patch)
2005-02-24 13:58 UTC, Jeremy Allison
smb.conf for testing (93 bytes, application/octet-stream)
2005-02-25 11:12 UTC, TAKAHASHI Motonobu

Description Shiro Yamada 2004-01-12 23:15:05 UTC
When Samba parses a share name, it scans sequentially for the closing bracket
that terminates it. The closing bracket is ASCII code 0x5D.

However, in some MB character sets such as Japanese CP932 and Chinese GB18030,
there are characters that contain the byte 0x5D as their second or
fourth byte. Because the parser reads the share name byte by byte, it cannot
recognise such a 0x5D byte as part of a MB char and wrongly treats it
as the end of the share name definition.

Hence, upon loading smb.conf, if a MB char with '0x5D' is used within a share
name definition, its name would be terminated in the middle of definition.
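
To make the failure concrete, here is a minimal Python sketch of a byte-by-byte scan. It is only an illustration of the behaviour described above, not the code in param/param.c; the function name naive_section_end is made up for this example.

```python
def naive_section_end(line: bytes) -> int:
    """Return the index of the first 0x5D byte, ignoring character boundaries."""
    return line.find(b"]")


# A section header whose share name is the single CP932 character 0x95 0x5D.
header = b"[\x95\x5d]"

# The share name decodes to exactly one character in CP932 ...
assert len(b"\x95\x5d".decode("cp932")) == 1

# ... but the naive scan stops on the trail byte (index 2) instead of the
# real closing bracket (index 3), so the name is cut in the middle.
print(naive_section_end(header))
```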
Comment 1 TAKAHASHI Motonobu 2004-01-13 05:12:07 UTC
This bug is a duplicate of BUG#462
Comment 2 TAKAHASHI Motonobu 2004-01-13 05:14:49 UTC
Sorry, my mistake.
Please ignore my previous comment.
Comment 3 Shiro Yamada 2004-01-29 21:26:48 UTC
Created attachment 373
A patch for loading smb.conf twice, once in unix charset and once in UTF-8

This patch (partially) solves BUG #962 and #957 by loading smb.conf twice.
In the first phase the parser looks for unix charset, and in the second phase
it converts the characters into UTF-8, parses them, and converts them back to
the original unix charset before passing them on to other functions.

It works only if unix charset is correctly identified in the first phase. For
example, if smb.conf is defined as

[global]
	server string = <0xXX><0x5c>
	unix charset = CP932

where <0xXX><0x5c> is a MB character whose second byte <0x5c> corresponds to '\'
in ASCII, the parser fails to load unix charset in the first phase (see BUG #957).

I know this is not a neat solution, but there is nothing much we can do unless
we change the way of handling smb.conf.
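
The two-phase idea can be sketched roughly as follows in Python. This is not the attached patch: the helper names are hypothetical, Python's Unicode str stands in for the UTF-8 intermediate form, and the first phase is a plain regular-expression search, which is a simplification of how Samba actually reads parameters (so it does not reproduce the BUG #957 failure).

```python
import re


def detect_unix_charset(raw: bytes, default: str = "utf-8") -> str:
    """Phase 1: find 'unix charset = NAME' while treating the file as raw bytes."""
    match = re.search(rb"^\s*unix charset\s*=\s*(\S+)", raw,
                      re.MULTILINE | re.IGNORECASE)
    return match.group(1).decode("ascii") if match else default


def section_names(raw: bytes) -> list:
    """Phase 2: decode to Unicode, parse, then convert names back to unix charset."""
    charset = detect_unix_charset(raw)
    text = raw.decode(charset)
    # After decoding, ']' can only ever be a real bracket character.
    names = re.findall(r"^\[([^\]]*)\]", text, re.MULTILINE)
    return [name.encode(charset) for name in names]


conf = b"[global]\n\tunix charset = CP932\n\n[\x95\x5d]\n\tcomment = 0x955d\n"
print(section_names(conf))
```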
Comment 4 Gerald (Jerry) Carter (dead mail address) 2004-03-18 06:45:31 UTC
Are we (in the latest Samba 3.0 CVS) in better shape for
this now?
Comment 5 Shiro Yamada 2004-03-23 22:16:12 UTC
No, unless you've done something to param/param.c, it is impossible to fix
this bug. Same principle applies to Bug #957.
Comment 6 Gerald (Jerry) Carter (dead mail address) 2005-02-08 21:48:19 UTC
can't see this getting fixed in Samba 3.
Comment 7 TAKAHASHI Motonobu 2005-02-19 22:57:48 UTC
>can't see this getting fixed in Samba 3.

Its severity is very high in Japan. Unless this bug is fixed, in Japan we can
hardly say Samba is i18n'ed and Japanese-ready.

I think that requiring smb.conf to always be written in UTF-8, regardless of
unix charset, would be a good resolution. This change would also fix BUG#1069 and
BUG#496.

P.S.
I think the encoding of all files, such as smb.conf, tdb files, etc., should
be fixed (UTF-8 is probably best) and should not depend on unix charset.

Comment 8 Jeremy Allison 2005-02-19 23:16:04 UTC
No, we can't arbitrarily force utf8 for smb.conf. This will
break a lot of smb.confs.
I thought we'd always specified that the "unix charset" and
"dos codepage" entries *must* come first for an smb.conf to be read
correctly in the native codepage. I even remember writing some
docs to that effect....
If we tell users that the "unix charset" entry must come first
in the smb.conf - does this fix the problem in Japan ? If so,
then this is a documentation issue. As I recall this was the
way it was supposed to work (unix charset must come first if
you need mb characters in smb.conf).
Jeremy.
Comment 9 TAKAHASHI Motonobu 2005-02-19 23:43:49 UTC
Created attachment 974
How a share name with char 0x955d is broken in share list.

(In reply to comment #8)
> No, we can't arbitrarily force utf8 for smb.conf. This will
> break a lot of smb.confs.

Hmmm...,

> I thought we'd always specified that the "unix charset" and
> "dos codepage" entries *must* come first for an smb.conf to be read
> correctly in the native codepage.

Of course, yes.
But this problem occurs even if we write {unix,dos} charset first in smb.conf.
I attached a sample image to show how the shares are displayed. The smb.conf is
written like:

-----
[global]
  dos charset = CP932
  unix charset = CP932

...

[<95><5b>]
  comment = 0x955b

[<95><5c>]
  comment = 0x955c

[<95><5d>]
  comment = 0x955d

[<95><5e>]
  comment = 0x955e
-----

> does this fix the problem in Japan ?

Unfortunately no.
Comment 10 Jeremy Allison 2005-02-20 00:06:44 UTC
Ok so the bug looks to be no mb processing when looking for ']'
characters within smb.conf stanza processing. This is a (relatively) simple
fix within param/param.c - as ']' is the only special character looked
for (ok, maybe some spaces as well). All we need do is correctly
change param/param.c to process the current mb unix character set
and ensure all the docs say that "unix charset" must come first.
Do you concur ? This is a much easier fix than loading twice or
converting to utf8. I'll look at this for 3.0.12.
Jeremy.
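
A multibyte-aware scan of the kind proposed here might look like the following Python sketch. It is an illustration under assumptions, not the change made to param/param.c: the function name find_section_end is invented for this example, and Python's incremental decoder stands in for whatever MB handling Samba uses.

```python
import codecs


def find_section_end(line: bytes, unix_charset: str) -> int:
    """Return the byte index of the first ']' that is a whole character."""
    decoder = codecs.getincrementaldecoder(unix_charset)()
    for index, byte in enumerate(line):
        # The decoder returns '' while it is in the middle of a MB character,
        # so a 0x5D trail byte is never reported as ']'.
        char = decoder.decode(bytes([byte]))
        if char == "]":
            return index
    return -1


print(find_section_end(b"[\x95\x5d]", "cp932"))
```

Because the scan depends on knowing the charset before any MB share name is read, this approach is what makes the "unix charset must come first" rule necessary.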
Comment 11 TAKAHASHI Motonobu 2005-02-20 07:38:09 UTC
(In reply to comment #10)
> All we need do is correctly
> change param/param.c to process the current mb unix character set
> and ensure all the docs say that "unix charset" must come first.
> Do you concur ?

Yes, I think so.

> I'll look at this for 3.0.12.
> Jeremy.

OK, thanks.

Comment 12 Jeremy Allison 2005-02-24 13:58:53 UTC
Created attachment 978
Committed patch.

Ok, this is the fix I've committed. Please test with Japanese character sets.
Jeremy.
Comment 13 Jeremy Allison 2005-02-24 14:04:18 UTC
I think this is now fixed in SVN.
Remember to set :

unix charset = "XXXX"

as the first entry in your [global] section in the smb.conf
if you want to use MB sharenames in that character set.

Jeremy.
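
As a rough way to check this rule on an existing file, the following Python sketch tests whether "unix charset" is the first parameter after [global]. It is a hypothetical helper, not part of Samba or testparm, and it handles only simple files (no includes, no line continuations).

```python
def unix_charset_comes_first(raw: bytes) -> bool:
    """True if the first parameter in [global] is 'unix charset'."""
    in_global = False
    for line in raw.splitlines():
        stripped = line.strip()
        # Skip blank lines and smb.conf comments ('#' or ';').
        if not stripped or stripped[:1] in (b"#", b";"):
            continue
        if stripped.lower() == b"[global]":
            in_global = True
            continue
        if in_global:
            key = stripped.split(b"=", 1)[0]
            # Normalise case and internal whitespace of the parameter name.
            return b" ".join(key.lower().split()) == b"unix charset"
    return False


good = b"[global]\n\tunix charset = CP932\n\tdos charset = CP932\n"
bad = b"[global]\n\tdos charset = CP932\n\tunix charset = CP932\n"
print(unix_charset_comes_first(good), unix_charset_comes_first(bad))
```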
Comment 14 TAKAHASHI Motonobu 2005-02-25 11:12:22 UTC
Created attachment 987
smb.conf for testing

(In reply to comment #13)
> I think this is now fixed in SVN.
Umm..., it seems to not be fixed yet.
I checked on Debian GNU/Linux 3.0 on x86.
Endian issue?

Attachment is my testing smb.conf
Comment 15 Jeremy Allison 2005-02-25 16:44:19 UTC
Ok, I've checked on Fedora Core 3 and the problem seems to be that
the C library doesn't recognise a locale of CP932. The new code in Samba
correctly recognises the 0x955d character as one character with this smb.conf.
I checked by putting a breakpoint on FindSectionEnd in param/param.c: the first
breakpoint is triggered by [global], the second by [0x955d]. When I look at
where the code thinks the end of the section is, it correctly finds the second
']' character in the ascii stream, meaning it knows the 0x955d is one character.
I can't display it using smbclient or by looking at the smb.conf in gedit, as the
C library on Linux complains with "Locale not supported by C library" when I set
it to "cp932".
How are you testing this ?
Jeremy.

Comment 16 Jeremy Allison 2005-02-25 16:51:35 UTC
The new code in param/param.c is definitely parsing the 0x955d as one character.
I've tried to set the code page on a Win2k3 box to 932 here at connectathon but
it's failing with "invalid code page" (I'm guessing as it's not a Japanese
version of Windows). I'm going to need some help debugging this if it's not
displaying right on a Windows client. I know the sharename is being set
correctly to 0x955d in Samba when it parses the share.
Jeremy.
Comment 17 Jeremy Allison 2005-02-25 17:29:17 UTC
Ok - I checked on the wire between smbclient and the latest svn code with this
smb.conf - we *are* returning 0x955d as the sharename. Also :-) I was surprised
(but pleased :-) to see that smbclient *WAS* displaying the correct Japanese
SJIS character 0x955d (I looked it up on the web). Then I realised that
smbclient converts from DOS charset to UNIX charset when reading from the RAP
call, then from UNIX charset (cp932 in this case) to *display* charset (utf8 on
Fedora Core 3) when printing the string ! So it was correct.
I'm still convinced this bug is fixed.
Jeremy.
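
The conversion chain described in this comment can be sketched in Python as below. The function name to_display is hypothetical and the sketch only mirrors the charset steps named above (DOS charset, then UNIX charset, then display charset); it does not reflect smbclient's actual code.

```python
def to_display(wire_bytes: bytes, dos_charset: str,
               unix_charset: str, display_charset: str) -> bytes:
    """Convert a share name from the wire to the bytes printed on the terminal."""
    text = wire_bytes.decode(dos_charset)        # as read from the RAP call
    unix_bytes = text.encode(unix_charset)       # internal unix charset form
    return unix_bytes.decode(unix_charset).encode(display_charset)


shown = to_display(b"\x95\x5d", "cp932", "cp932", "utf-8")

# The two CP932 bytes become a three-byte UTF-8 sequence with no 0x5D in it,
# which is why a UTF-8 terminal shows the right character.
print(len(shown), b"]" in shown)
```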

Comment 18 TAKAHASHI Motonobu 2005-02-25 19:00:41 UTC
(In reply to comment #17)
> I'm still convinced this bug is fixed.
> Jeremy.

I'm sorry, this is my mistake. I forgot to run "make install" and the SVN version of
Samba was not installed.

Now I have checked the correct SVN version and found that this bug is fixed. Sorry again.

>I've tried to set the code page on a Win2k3 box to 932 here at connectathon but
>it's failing with "invalid code page" (I'm guessing as it's not a Japanese
>version of Windows). 

If you have MSDN, you can set codepage 932 (or other code pages) by installing MUI
(Multilingual User Interface) on an English version of Windows.

> Also :-) I was surprised
> (but pleased :-) to see that smbclient *WAS* displaying the correct Japanese
> SJIS character 0x955d (I looked it up on the web). Then I realised that
> smbclient converts from DOS charset to UNIX charset when reading from the RAP
> call, then from UNIX charset (cp932 in this case) to *display* charset (utf8 on
> Fedora Core 3) when printing the string ! So it was correct.

Thanks to dprintf() and the i18n features of Samba 3.0, the client commands
display Japanese characters correctly, as far as I have examined :-)

Thanks.
Comment 19 Gerald (Jerry) Carter (dead mail address) 2005-08-24 10:23:52 UTC
sorry for the same, cleaning up the database to prevent unnecessary reopens of bugs.