Bug 1521 - SFN process must be based on dos charset, not UCS2
Status: REOPENED
Product: Samba 3.0
Classification: Unclassified
Component: Extended Characters
Version: 3.0.32
Hardware: All All
Importance: P3 normal
Target Milestone: none
Assigned To: Jeremy Allison
QA Contact: Samba QA Contact
Duplicates: 5857
Depends on:
Blocks:
Reported: 2004-07-13 01:59 UTC by SATOH Fumiyasu
Modified: 2011-02-27 09:12 UTC (History)


Attachments
patch for mangle_hash.c (mangle_hash2.c must be fix later...) (18.25 KB, patch)
2004-09-01 22:20 UTC, SATOH Fumiyasu
an example for "mangling method = hash" under Japanese (77.73 KB, image/tiff)
2010-10-10 13:20 UTC, TAKAHASHI Motonobu
an problematic filename(UTF-8) (226 bytes, application/gzip)
2010-10-10 20:30 UTC, TAKAHASHI Motonobu

Description SATOH Fumiyasu 2004-07-13 01:59:37 UTC
In source/smbd/mangle_hash.c, the SFN (short file name, 8.3 format)
processing is based on UCS2. This is wrong: the processing must be
based on the dos charset.
Comment 1 Gerald (Jerry) Carter 2004-09-01 09:09:56 UTC
another extended char set report
Comment 2 Jeremy Allison 2004-09-01 10:52:20 UTC
Can you give more details, please? I need to know under what circumstances this
causes a problem to understand the bug better.

Thanks,

Jeremy.
Comment 3 SATOH Fumiyasu 2004-09-01 22:20:40 UTC
Created attachment 636
patch for mangle_hash.c (mangle_hash2.c must be fix later...)

The following filename (8 x '<A>' + '.ext', where <A> means the Japanese
HIRAGANA letter 'A') is NOT a legal short file name (SFN):

    <A><A><A><A><A><A><A><A>.ext

This filename must be mangled for SFN, but Samba does not mangle it.

filename                        must be Samba does
                                mangled mangle
--------                        ------- ----------
<A>.ext                         no      no
<A><A><A><A>.ext                no      no
<A><A><A><A><A>.ext             yes     no (bad!)
<A><A><A><A>x.ext               yes     no (bad!)
x<A><A><A><A>.ext               yes     no (bad!)
<A><A><A><A><A><A><A><A>.ext    yes     no (bad!)
<A><A><A><A><A><A><A><A><A>.ext yes     no (bad!)
<A><A><A><A><A><A><A><A>x.ext   yes     yes
x<A><A><A><A><A><A><A><A>.ext   yes     yes

In SFN, "8.3" means

    8 bytes + '.' + 3 bytes in the dos charset,

NOT

    8 characters + '.' + 3 characters in some other charset.

The Japanese HIRAGANA letter 'A' has the following
attributes:

    encoding    byte length    character
    (charset)   (byte code)    length
    ---------   -----------    ---------
    CP932       2 (82 A0)      1
    UTF-8       3 (E3 81 82)   1
    UCS-2       2 (42 30)      1
    others      -              1
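
The byte lengths in the table above can be checked with a short Python sketch. This is only an illustration, not Samba code: the name `encoded_length` is hypothetical, and Python's `utf-16-le` codec is assumed as a stand-in for UCS-2 (the two agree for characters in the Basic Multilingual Plane, such as this one).

```python
# Hiragana letter 'A' (U+3042), written as <A> in this report.
HIRAGANA_A = "\u3042"


def encoded_length(text, encoding):
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


# utf-16-le is an assumed stand-in for UCS-2 (little-endian byte order).
for encoding in ("cp932", "utf-8", "utf-16-le"):
    raw = HIRAGANA_A.encode(encoding)
    print(f"{encoding:10} {encoded_length(HIRAGANA_A, encoding)} bytes ({raw.hex(' ').upper()})")
```

In every encoding the character count stays at 1, while the byte count depends on the charset. That is why a length check done on UCS-2 characters can disagree with one done on dos charset bytes.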

'<A><A><A><A>.ext' in the dos charset contains an 8-byte basename,
'.' and a 3-byte extension. This must not be mangled.
'<A><A><A><A><A>.ext' in the dos charset contains a 10-byte basename,
'.' and a 3-byte extension. This must be mangled.
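
The byte-based length rule described above could be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the logic in mangle_hash.c: the function name `exceeds_8_3` is hypothetical, CP932 is assumed as the dos charset, and only the length part of the 8.3 rule is checked (illegal characters, case and other SFN constraints are ignored).

```python
A = "\u3042"  # Hiragana letter 'A', written as <A> in this report


def exceeds_8_3(filename, dos_charset="cp932"):
    """Return True if the name is too long for 8.3 when measured in
    bytes of the dos charset (hypothetical helper, length check only)."""
    base, dot, ext = filename.rpartition(".")
    if not dot:
        # No '.' in the name: the whole name is the basename.
        base, ext = filename, ""
    return len(base.encode(dos_charset)) > 8 or len(ext.encode(dos_charset)) > 3


for name in (A + ".ext", A * 4 + ".ext", A * 5 + ".ext", A * 4 + "x.ext"):
    print(name, "must be mangled" if exceeds_8_3(name) else "fits 8.3")
```

Counting characters instead of bytes would accept the five-character name, since 5 is not greater than 8, whereas its CP932 basename is 10 bytes long.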
Comment 4 SATOH Fumiyasu 2004-09-01 22:25:48 UTC
Correction:

filename                        must be Samba does
                                mangle  mangle
--------                        ------- ----------
<A>.ext                         no      no
<A><A><A><A>.ext                no      no
<A><A><A><A><A>.ext             yes     no (bad!)
<A><A><A><A>x.ext               yes     no (bad!)
x<A><A><A><A>.ext               yes     no (bad!)
<A><A><A><A><A><A><A><A>.ext    yes     no (bad!)
<A><A><A><A><A><A><A><A><A>.ext yes     yes <-here!
<A><A><A><A><A><A><A><A>x.ext   yes     yes
x<A><A><A><A><A><A><A><A>.ext   yes     yes

Sorry.
Comment 5 Gerald (Jerry) Carter 2005-02-07 07:29:40 UTC
resetting version
Comment 6 Gerald (Jerry) Carter 2006-04-20 08:03:30 UTC
severity should be determined by the developers and not the reporter.
Comment 7 TAKAHASHI Motonobu 2008-11-11 08:44:36 UTC
Still not fixed in Samba 3.0.24.

Comment 8 SATOH Fumiyasu 2008-11-11 09:22:22 UTC
This bug exists in Samba 3.0.32 and 3.2.4 too.
Comment 9 Jeremy Allison 2008-11-19 19:49:41 UTC
*** Bug 5857 has been marked as a duplicate of this bug. ***
Comment 10 Jeremy Allison 2008-11-20 13:20:51 UTC
I have an idea on how to correctly fix this for 3.2.x and above. Unfortunately this won't get done for the next 3.2 release (or the first 3.3 release). But I do intend to fix this correctly.
Jeremy.
Comment 11 Stefan Metzmacher 2010-04-26 03:37:26 UTC
Jeremy, please reopen if you still want to fix this
Comment 12 Stefan Metzmacher 2010-04-26 04:28:18 UTC
Jeremy, can you research if this is still a bug in 3.5?
Comment 13 TAKAHASHI Motonobu 2010-10-10 13:16:45 UTC
(In reply to comment #12)
> Jeremy, can you research if this is still a bug in 3.5?
> 
Yes, I attached an image for Japanese file names.
Comment 14 TAKAHASHI Motonobu 2010-10-10 13:20:41 UTC
Created attachment 6004
an example for "mangling method = hash" under Japanese

The English filename 'damedame' is truncated correctly with "mangling method = hash".
Japanese filenames are not: only the first 4 characters (8 bytes) are shown as their short file name.
Comment 15 Björn Jacke 2010-10-10 16:01:17 UTC
Can you please attach a tar archive containing empty files with those problematic filenames? Please also document which encodings the corresponding files have.
Comment 16 TAKAHASHI Motonobu 2010-10-10 20:30:07 UTC
Created attachment 6005
an problematic filename(UTF-8)

(In reply to comment #15)
> Can you please attach a tar archive containing empty files with those
> problematic filenames? Please also document which encodings the corresponding
> files have.

OK, I attached a sample UTF-8 filename.

Please note that this is not the only problematic filename. All multibyte characters cause the same problem.
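
For anyone who wants to build a similar test archive, here is a minimal Python sketch. It does not reproduce the attachment itself: the function name `build_test_archive` and the sample name (five HIRAGANA 'A' letters plus '.ext', taken from comment 3) are assumptions chosen for illustration. The PAX tar format is used so that the names are stored as UTF-8.

```python
import io
import tarfile


def build_test_archive(names):
    """Return the bytes of a gzip-compressed tar archive holding one
    empty file per name, with names stored as UTF-8 (PAX format)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz",
                      format=tarfile.PAX_FORMAT, encoding="utf-8") as archive:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 0  # empty file, no content stream needed
            archive.addfile(info)
    return buffer.getvalue()


# Hypothetical sample: five HIRAGANA 'A' letters + '.ext' (10 bytes in CP932).
sample_names = ["\u3042" * 5 + ".ext"]
data = build_test_archive(sample_names)
print(len(data), "bytes of tar.gz data")
```

Writing `data` to a file and unpacking it on a share whose unix charset is UTF-8 should give a file that can be checked against the short name Samba reports.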

My smb.conf:

[global]
  dos charset = CP932
  unix charset = UTF-8

[homes]
  writeable = yes
  browseable = no

I tested with a self-compiled Samba 3.5.6 on ext3 under Debian Lenny.
Comment 17 TAKAHASHI Motonobu 2010-10-10 20:33:11 UTC
(In reply to comment #16)

Sorry, the smb.conf shown in comment #16 is wrong; please use this:

[global]
  dos charset = CP932
  unix charset = UTF-8
  mangling method = hash

[homes]
  writeable = yes
  browseable = no