Bug 593 - %U macro doesn't take non-ascii chars
Summary: %U macro doesn't take non-ascii chars
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: Extended Characters
Version: 3.0.20
Hardware: All All
Importance: P3 normal
Target Milestone: none
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
Depends on:
Blocks: 2345
Reported: 2003-10-10 00:22 UTC by Shiro Yamada
Modified: 2007-08-28 11:52 UTC
Description Shiro Yamada 2003-10-10 00:22:17 UTC
When the value of the %U macro contains non-ASCII characters, Samba replaces
them with a run of '_' characters.

For instance, if I specify a user with a Japanese name in the username.map file
and set the comment on a [data] share to:

-- smb.conf -----------------------------------
	username map = /etc/samba/username.map

	path = /opt/data
	comment = "User name is: %U"

and then I use rpcclient to display the comment on [data], I get
something like:

$ rpcclient _SERVER_ -U _NAME_IN_JAP_%_PASSWD_ -c 'netshareenum'
netname: data
        remark: User name is: ____
        path:   C:\opt\data

where _NAME_IN_JAP_ is the username I specified in username.map.

I've also observed that %m has the same problem. Samba creates a faulty log
file name when it is accessed by a client whose NetBIOS name contains
non-ASCII characters.

        -- smb.conf ----------------------------
        log file = /var/log/samba/log.%m

        outcome: /var/log/samba/log.______

This problem may be applicable to other macros.
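
As a rough illustration of the symptom (not Samba's actual code), a sanitizer that works byte by byte and keeps only an ASCII allow-list would produce exactly this kind of output. The allow-list and the function name `sanitize_bytewise` are assumptions made for this sketch, as is the choice of EUC-JP as the server-side charset.

```python
import string

# Hypothetical allow-list; the real set of characters Samba keeps is not
# stated in this report.
SAFE_BYTES = set((string.ascii_letters + string.digits + ".-_").encode("ascii"))


def sanitize_bytewise(raw: bytes) -> str:
    """Replace every byte that is not in the ASCII allow-list with '_'."""
    return "".join(chr(b) if b in SAFE_BYTES else "_" for b in raw)


# Two kanji in EUC-JP are four bytes, all with the high bit set,
# so each byte is replaced individually.
name = "山田".encode("euc_jp")
print(sanitize_bytewise(name))      # ____
print(sanitize_bytewise(b"shiro"))  # shiro
```

Each multibyte character turns into as many underscores as it has bytes, which matches the "____" remark and the "log.______" file name above.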
Comment 1 Andrew Bartlett 2003-10-22 16:28:01 UTC
This is by design, for security reasons.  Remember the ../../ exploit in earlier
Samba versions?  We don't want a repeat, and the %U macro holds the name the user
specified, which is not always a valid name on the system.
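
To make the concern concrete, here is a small sketch of the general class of problem. It is not a reconstruction of the historical exploit; the `expand` and `sanitize` helpers and the hostile name are invented for illustration, and only the `log file` template comes from this report.

```python
import posixpath

LOG_TEMPLATE = "/var/log/samba/log.%m"  # from the smb.conf excerpt in this bug


def expand(template: str, machine: str) -> str:
    """Substitute %m and normalise the resulting path."""
    return posixpath.normpath(template.replace("%m", machine))


def sanitize(name: str) -> str:
    """Hypothetical filter: keep ASCII letters, digits, '-' and '_' only."""
    return "".join(
        c if (c.isascii() and c.isalnum()) or c in "-_" else "_" for c in name
    )


hostile = "x/../../../../etc/passwd"  # client-supplied name, made up here
print(expand(LOG_TEMPLATE, hostile))            # /etc/passwd
print(expand(LOG_TEMPLATE, sanitize(hostile)))  # stays under /var/log/samba
```

Without filtering, a client-chosen name can steer the expanded path out of the log directory; with the filter, the separators are gone and the path stays put.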
Comment 2 Shiro Yamada 2003-10-24 00:46:06 UTC
Yes, I am aware that there are security risks in the expansion of macros.
However, the current implementation throws away all the harmless multibyte
characters along with the illegal ASCII characters, leaving only the valid
ASCII characters.

I am currently working out whether any multibyte characters are potentially
dangerous. If they turn out to be safe, there is no need to prohibit multibyte
characters; if some turn out to be dangerous, appropriate action should be
taken for those specific characters.

This problem is encoding dependent, and you may argue that it should be dealt
with in iconv(), but please give me some time to do more research in this area.
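
One way to picture the alternative being asked for here is a filter that decodes the name first and then judges whole characters rather than single bytes. This is only a sketch under stated assumptions: the deny-list, the function name `sanitize_charwise`, and the idea of passing the charset explicitly are all hypothetical and not taken from Samba.

```python
import unicodedata

# Hypothetical deny-list of characters that are risky in paths or macros.
FORBIDDEN = set('/\\:*?"<>|%')


def sanitize_charwise(raw: bytes, charset: str) -> str:
    """Decode first, then replace only dangerous characters with '_'."""
    text = raw.decode(charset, errors="replace")  # bad bytes become U+FFFD
    out = []
    for ch in text:
        is_control = unicodedata.category(ch).startswith("C")
        if ch in FORBIDDEN or is_control or ch == "\ufffd":
            out.append("_")
        else:
            out.append(ch)
    cleaned = "".join(out)
    # Never return an empty name or a bare directory reference.
    return "_" if cleaned in {"", ".", ".."} else cleaned


print(sanitize_charwise("山田".encode("cp932"), "cp932"))  # 山田
print(sanitize_charwise(b"../etc", "cp932"))                # .._etc
```

Because the decision is made per decoded character, harmless multibyte names survive while path separators are still replaced.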
Comment 3 Alexander Bokovoy 2003-12-09 01:22:53 UTC
Have you had a chance to perform the promised research?
Comment 4 Shiro Yamada 2003-12-10 21:21:21 UTC
Here they are.
We've focused on the CJK encodings only, but I guess they are the most
problematic of all. As you suggested, some of these characters do contain
potentially dangerous ASCII byte values within themselves.

In CP932 (Japanese), the second byte of a character may be an ASCII value in
the range 0x40-0x7E. The same values also appear in the second byte of Big5
(Taiwanese).
  40-7E = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

In GB18030 (Chinese), the second byte may fall in the range 0x40-0x7E.
The fourth byte of a GB18030 character may also fall in the range 0x30-0x39.
  40-7E = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  30-39 = 0123456789

In UHC (Korean), the second byte may fall in the ranges 0x41-0x5A and 0x61-0x7A.
  41-5A = ABCDEFGHIJKLMNOPQRSTUVWXYZ
  61-7A = abcdefghijklmnopqrstuvwxyz

Reference: ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
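
The overlap described above can be checked with Python's built-in codecs. The sketch below is only an illustration; the helper name `ascii_bytes_inside_multibyte` and the sample characters are chosen for this example and are not part of the original research.

```python
def ascii_bytes_inside_multibyte(text, encoding):
    """Return (char, encoded_bytes) for each multibyte character whose
    encoding contains a byte in the ASCII range (below 0x80)."""
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            hits.append((ch, encoded))
    return hits


# In CP932 the second byte of both characters is 0x5C, the ASCII backslash.
print(ascii_bytes_inside_multibyte("ソ表", "cp932"))

# A byte-wise search therefore "finds" a backslash that is not really there.
print(b"\\" in "ソ".encode("cp932"))  # True

# In EUC-JP every byte of these characters has the high bit set.
print(ascii_bytes_inside_multibyte("ソ表", "euc_jp"))  # []
```

This is why a byte-oriented filter cannot simply let high-bit bytes through in these encodings: the trailing byte of a legitimate character can be indistinguishable from a dangerous ASCII character.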
Comment 5 Gerald (Jerry) Carter (dead mail address) 2005-02-08 21:27:38 UTC
ab, any progress on this one?
Comment 6 Gerald (Jerry) Carter (dead mail address) 2005-02-15 09:34:37 UTC
Updating version to 3.0.11 and assigning to Jeremy.  He seems to
think this must work.
Comment 7 Gerald (Jerry) Carter (dead mail address) 2005-02-15 09:35:16 UTC
*** Bug 2345 has been marked as a duplicate of this bug. ***
Comment 8 Pascal Terjan 2005-02-15 10:15:09 UTC
(In reply to comment #7)
> *** Bug 2345 has been marked as a duplicate of this bug. ***

I agree this is strongly related. However, my bug was not asking to stop the
replacement with __; it was about the fact that things do not work after the
replacement (i.e. the replacement does not seem to occur everywhere).
Comment 9 Gerald (Jerry) Carter (dead mail address) 2005-06-09 05:40:53 UTC
Got another report of this on the Samba mailing list.
Comment 10 Gerald (Jerry) Carter (dead mail address) 2006-04-08 07:43:39 UTC
Cleaning up versions.  There was no 3.0.15, so leaving it in Bugzilla
is causing some confusion.  Moving these under 3.0.20.
Originally filed against 3.0.15preX.
Comment 11 Gerald (Jerry) Carter (dead mail address) 2007-08-28 11:52:29 UTC
This is currently by design, I think.