In source/smbd/mangle_hash.c, short file name (SFN, 8.3-format filename) processing is based on UCS-2. This is wrong, because this processing must be based on the dos charset.
another extended char set report
Can you give more details, please? I need to know under what circumstances this causes a problem to understand the bug better. Thanks, Jeremy.
Created attachment 636: patch for mangle_hash.c (mangle_hash2.c must be fixed later...)

The following filename (8 x '<A>' + '.ext', where <A> is the Japanese HIRAGANA letter 'A') is NOT a legal short file name (SFN):

  <A><A><A><A><A><A><A><A>.ext

This filename must be mangled for SFN, but Samba does not mangle it.

                                    must be    Samba does
  filename                          mangled    mangle
  --------                          -------    ----------
  <A>.ext                           no         no
  <A><A><A><A>.ext                  no         no
  <A><A><A><A><A>.ext               yes        no (bad!)
  <A><A><A><A>x.ext                 yes        no (bad!)
  x<A><A><A><A>.ext                 yes        no (bad!)
  <A><A><A><A><A><A><A><A>.ext      yes        no (bad!)
  <A><A><A><A><A><A><A><A><A>.ext   yes        no (bad!)
  <A><A><A><A><A><A><A><A>x.ext     yes        yes
  x<A><A><A><A><A><A><A><A>.ext     yes        yes

In an SFN, "8.3" means 8 bytes + '.' + 3 bytes in the dos charset, NOT 8 characters + '.' + 3 characters in XXX charset.

The Japanese HIRAGANA letter 'A' has the following attributes:

  encoding     byte length       character
  (charset)    (byte codes)      length
  ---------    -----------       ---------
  CP932        2 (82 A0)         1
  UTF-8        3 (E3 81 82)      1
  UCS-2        2 (42 30)         1
  others       -                 1

'<A><A><A><A>.ext' in the dos charset has an 8-byte basename, '.' and a 3-byte extension. It must not be mangled.
'<A><A><A><A><A>.ext' in the dos charset has a 10-byte basename, '.' and a 3-byte extension. It must be mangled.
Correction:

                                    must be    Samba does
  filename                          mangled    mangle
  --------                          -------    ----------
  <A>.ext                           no         no
  <A><A><A><A>.ext                  no         no
  <A><A><A><A><A>.ext               yes        no (bad!)
  <A><A><A><A>x.ext                 yes        no (bad!)
  x<A><A><A><A>.ext                 yes        no (bad!)
  <A><A><A><A><A><A><A><A>.ext      yes        no (bad!)
  <A><A><A><A><A><A><A><A><A>.ext   yes        yes        <-here!
  <A><A><A><A><A><A><A><A>x.ext     yes        yes
  x<A><A><A><A><A><A><A><A>.ext     yes        yes

Sorry.
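The byte-versus-character distinction in the 8.3 rule can be illustrated with a small standalone C sketch. It is not Samba code: the names fits_8_3, cp932_width, utf8_decode and HIRAGANA_A are hypothetical, and the CP932 width is approximated (ASCII and half-width katakana count as 1 byte, every other character as 2) instead of doing a real conversion. Only the length limits are checked, not illegal characters or case. Measuring in DOS-charset bytes yields the "must be mangled" column; counting characters, which is what a UCS-2-based check effectively does, reproduces the "Samba does mangle" column of the corrected table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo constant: HIRAGANA LETTER A (U+3042) in UTF-8. */
#define HIRAGANA_A "\xE3\x81\x82"

/* Decode one UTF-8 sequence at s into *cp and return its length in bytes.
 * Assumes well-formed UTF-8 (no validation, for brevity). */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12)
            | ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((unsigned long)(s[0] & 0x07) << 18)
        | ((unsigned long)(s[1] & 0x3F) << 12)
        | ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Approximate CP932 (dos charset) width of one code point: ASCII and
 * half-width katakana are 1 byte, everything else is assumed to be 2. */
static size_t cp932_width(unsigned long cp)
{
    return (cp < 0x80 || (cp >= 0xFF61 && cp <= 0xFF9F)) ? 1 : 2;
}

/* Does a UTF-8 name fit the 8.3 length limits?
 * count_bytes == true:  lengths in dos-charset bytes (the correct rule).
 * count_bytes == false: lengths in characters (the observed behaviour). */
static bool fits_8_3(const char *name, bool count_bytes)
{
    assert(name != NULL);
    const unsigned char *s = (const unsigned char *)name;
    /* '.' (0x2E) never occurs inside a multibyte UTF-8 sequence,
     * so strrchr finds the real extension separator. */
    const unsigned char *dot = (const unsigned char *)strrchr(name, '.');
    size_t base = 0, ext = 0;
    bool in_ext = false;

    while (*s != '\0') {
        if (s == dot) {          /* switch from basename to extension */
            in_ext = true;
            s++;
            continue;
        }
        unsigned long cp;
        size_t n = utf8_decode(s, &cp);
        size_t w = count_bytes ? cp932_width(cp) : 1;
        if (in_ext)
            ext += w;
        else
            base += w;
        s += n;
    }
    return base >= 1 && base <= 8 && ext <= 3;
}
```

For example, fits_8_3(HIRAGANA_A HIRAGANA_A HIRAGANA_A HIRAGANA_A HIRAGANA_A ".ext", true) is false (10-byte basename) while the character-counting call returns true, which is exactly a "yes / no (bad!)" row. A real fix would convert the name into the configured dos charset (for example with iconv(3)) and count the resulting bytes rather than approximate them.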
resetting version
severity should be determined by the developers and not the reporter.
Still not fixed in Samba 3.0.24.
This bug exists in Samba 3.0.32 and 3.2.4 too.
*** Bug 5857 has been marked as a duplicate of this bug. ***
I have an idea on how to correctly fix this for 3.2.x and above. Unfortunately this won't get done for the next 3.2 release (or the first 3.3 release). But I do intend to fix this correctly. Jeremy.
Jeremy, please reopen if you still want to fix this
Jeremy, can you research if this is still a bug in 3.5?
(In reply to comment #12)
> Jeremy, can you research if this is still a bug in 3.5?

Yes. I have attached an image with Japanese file names.
Created attachment 6004: an example of "mangling method = hash" with Japanese filenames

The English filename 'damedame' is truncated correctly with "mangling method = hash". Japanese filenames are not: only the first 4 characters (8 bytes) are shown as their short file name.
Can you please attach a tar archive containing empty files with those problematic filenames? Please also document which encodings the corresponding files have.
Created attachment 6005: a problematic filename (UTF-8)

(In reply to comment #15)
> Can you please attach a tar archive containing empty files with those
> problematic filenames? Please also document which encodings the corresponding
> files have.

OK, I attached a sample UTF-8 filename. Please note that this is not the only problematic filename: all multibyte characters cause the same problem.

My smb.conf:

  [global]
  dos charset = CP932
  unix charset = UTF-8

  [homes]
  writeable = yes
  browseable = no

I tested with a self-compiled Samba 3.5.6 on EXT3 under Debian Lenny.
(In reply to comment #16)
Sorry, the smb.conf shown in comment #16 is wrong; please use this one:

  [global]
  dos charset = CP932
  unix charset = UTF-8
  mangling method = hash

  [homes]
  writeable = yes
  browseable = no
Fumiyasu, Motonobu: Can you say if this is still a problem?