When non-ASCII characters are used in the %U macro, Samba converts them into a run of '_' characters. For instance, suppose I specify a user with a Japanese name in the username.map file and set the comment on a [data] share as follows:

-- smb.conf -----------------------------------
[global]
    :
    username map = /etc/samba/username.map

[data]
    path = /opt/data
    comment = "User name is: %U"
-----------------------------------------------

If I then use rpcclient to display the comment on [data], I get something like:

$ rpcclient _SERVER_ -U _NAME_IN_JAP_%_PASSWD_ -c 'netshareenum'
netname: data
remark: User name is: ____
path: C:\opt\data
password:

where _NAME_IN_JAP_ is the username I specified in username.map.

I've also observed that %m has the same problem: Samba creates a faulty log file name when it is accessed by a client with a non-ASCII NetBIOS name.

-- smb.conf ----------------------------
log file = /var/log/samba/log.%m
----------------------------------------

Outcome: /var/log/samba/log.______

This problem may apply to other macros as well.
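A minimal sketch of the kind of byte-wise filtering that would produce the output above, assuming the sanitizer keeps only ASCII letters, digits and a few punctuation characters and replaces every other byte with '_'. The function name `sanitize_macro_value` and the safe-character set are hypothetical and not taken from Samba's source. Because each byte of a multibyte character is replaced separately, a two-character CP932 name such as 日本 (bytes 0x93 0xFA 0x96 0x7B) comes out as four underscores.

```c
#include <string.h>

/* Hypothetical sanitizer, NOT Samba's actual code: keeps ASCII letters,
 * digits and an assumed set of safe punctuation; every other byte,
 * including each byte of a multibyte character, becomes '_'. */
static void sanitize_macro_value(char *dest, const char *src, size_t destlen)
{
    const char *other_safe = "-.$";   /* assumed safe punctuation */
    size_t i;

    if (destlen == 0)
        return;
    /* Copy at most destlen - 1 bytes, leaving room for the terminator. */
    for (i = 0; src[i] != '\0' && i + 1 < destlen; i++) {
        unsigned char c = (unsigned char)src[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || strchr(other_safe, c) != NULL)
            dest[i] = (char)c;
        else
            dest[i] = '_';   /* one '_' per rejected byte */
    }
    dest[i] = '\0';
}
```

Calling `sanitize_macro_value(out, "\x93\xfa\x96\x7b", sizeof out)` leaves `____` in `out`, and `../etc` becomes `.._etc`. Note that the last byte of 日本 in CP932, 0x7B, is the ASCII '{'.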
This is by design, for security reasons. Remember the ../../ exploit in earlier Samba versions? We don't want a repeat, and the %U macro expands to the name the user supplied, which is not always a valid name on the system.
Yes, I am aware that there are security risks in macro expansion. However, the current implementation throws away all the harmless multibyte characters along with the illegal ASCII characters, leaving only the valid ASCII ones. I am currently investigating whether multibyte characters can contain potentially dangerous bytes. If they turn out to be safe, there is no need to prohibit multibyte characters; if some turn out to be dangerous, appropriate action should be taken for those. This problem is encoding-dependent, and you may argue that it should be dealt with in iconv(), but please give me some time to do more research in this area.
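One way to picture the proposal, assuming a UTF-8 unix charset: in UTF-8 every byte of a multibyte sequence is 0x80 or above, so no ASCII character can hide inside one. A sanitizer could then pass complete UTF-8 sequences through unchanged while filtering ASCII exactly as before. This is a sketch under that assumption only; `sanitize_utf8`, `utf8_seq_len`, `is_safe_ascii` and the safe-character set are hypothetical, and the validity check is simplified (it does not reject overlong or surrogate forms).

```c
#include <string.h>

/* Assumed set of safe ASCII characters (hypothetical). */
static int is_safe_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '$';
}

/* Length of the UTF-8 sequence starting at s, or 0 if malformed.
 * Simplified: checks lead and continuation byte ranges only. Stops at
 * the first bad byte, so it never reads past a terminating NUL. */
static size_t utf8_seq_len(const unsigned char *s)
{
    size_t len, i;

    if (s[0] >= 0xC2 && s[0] <= 0xDF)      len = 2;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) len = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) len = 4;
    else return 0;
    for (i = 1; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return len;
}

/* Hypothetical UTF-8-aware sanitizer: unsafe ASCII and malformed bytes
 * become '_'; complete multibyte sequences are copied unchanged. */
static void sanitize_utf8(char *dest, const char *src, size_t destlen)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t o = 0;

    if (destlen == 0)
        return;
    while (*s != '\0' && o + 1 < destlen) {
        if (*s < 0x80) {
            dest[o++] = is_safe_ascii(*s) ? (char)*s : '_';
            s++;
        } else {
            size_t n = utf8_seq_len(s);
            if (n == 0) {                 /* malformed byte */
                dest[o++] = '_';
                s++;
            } else if (o + n < destlen) { /* whole sequence fits */
                memcpy(dest + o, s, n);
                o += n;
                s += n;
            } else {
                break;                    /* never emit a truncated sequence */
            }
        }
    }
    dest[o] = '\0';
}
```

With this sketch, 日本 in UTF-8 (E6 97 A5 E6 9C AC) survives intact, while `../etc` still becomes `.._etc`. The argument does not carry over to encodings whose trail bytes overlap ASCII.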
Have you had a chance to perform promised research?
Here they are. We focused on the CJK encodings only, but I guess they are the most problematic ones of all. As you suggested, some of these characters do contain dangerous ASCII codes within themselves.

CP932 (Japanese) may contain ASCII codes in its second byte, in the range 0x40-0x7E. The same range also appears in the second byte of Big5 (Taiwanese).

  40-7E = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

The second byte of GB18030 (Chinese) may contain the range 0x40-0x7E, and the fourth byte of GB18030 may contain the range 0x30-0x39.

  40-7E = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  30-39 = 0123456789

UHC (Korean) may contain 0x41-0x5A and 0x61-0x7A in its second byte.

  41-5A = ABCDEFGHIJKLMNOPQRSTUVWXYZ
  61-7A = abcdefghijklmnopqrstuvwxyz

Reference: ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
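To illustrate why these second-byte ranges matter: 0x5C ('\') lies inside 0x40-0x7E, and CP932 characters such as 表 (0x95 0x5C) end in it. A byte-wise scan sees a backslash there, whereas a scan that knows the CP932 lead-byte ranges (0x81-0x9F and 0xE0-0xFC) skips the trail byte. This sketch counts backslashes both ways; the function names are hypothetical and it handles CP932 only.

```c
#include <stddef.h>

/* CP932 lead bytes: 0x81-0x9F and 0xE0-0xFC. */
static int cp932_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Naive byte-wise count: every 0x5C byte looks like a backslash. */
static size_t count_backslash_naive(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == '\\')
            n++;
    return n;
}

/* CP932-aware count: a lead byte and its trail byte form one character,
 * so a 0x5C trail byte is not counted. */
static size_t count_backslash_cp932(const char *src)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s != '\0') {
        if (cp932_is_lead(*s) && s[1] != '\0') {
            s += 2;              /* skip lead + trail byte */
        } else {
            if (*s == '\\')
                n++;
            s++;                 /* ASCII or single-byte katakana */
        }
    }
    return n;
}
```

For `"\x95\x5c"` (表) the naive count is 1 and the CP932-aware count is 0. The same reasoning applies to '|', '[' and the other punctuation in 0x40-0x7E, and to the Big5, GB18030 and UHC ranges listed above.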
ab, any progress on this one?
Updating to 3.0.11 and assigning to Jeremy. He seems to think this must work.
*** Bug 2345 has been marked as a duplicate of this bug. ***
(In reply to comment #7)
> *** Bug 2345 has been marked as a duplicate of this bug. ***

I agree this is strongly related; however, my bug was not asking to stop replacing characters with '_', but about the fact that after the replacement it does not work (i.e. the replacement does not seem to occur everywhere).
Got another report of this on the Samba mailing list: http://lists.samba.org/archive/samba/2005-June/106675.html
Cleaning up versions. There was no 3.0.15, so leaving it in Bugzilla is causing some confusion. Moving these under 3.0.20. Originally filed against 3.0.15preX.
This is currently by design, I think.