Bug 6829 - smbclient does not show special characters properly
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.4
Classification: Unclassified
Component: Client Tools
Version: 3.4.2
Hardware: Other Linux
Importance: P3 regression
Target Milestone: ---
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
Duplicates: 6852
Reported: 2009-10-20 04:55 UTC by Karolin Seeger
Modified: 2009-10-27 11:11 UTC

Attachments
git-am format patch for 3.4.3. (2.14 KB, patch)
2009-10-22 17:16 UTC, Jeremy Allison
no flags
git-am format patch for 3.4.3. (2.08 KB, patch)
2009-10-22 17:33 UTC, Jeremy Allison
jmcd: review+
git-am format patch for 3.3.10. (1.76 KB, patch)
2009-10-22 17:37 UTC, Jeremy Allison
jmcd: review+

Description Karolin Seeger 2009-10-20 04:55:14 UTC
When using special characters in share comments, smbclient does not show these properly. Commit 485c0baef broke this (tracked down with git bisect).

How to reproduce:

[tmp]
path = /tmp
comment = é ê è

user@host:~> smbclient -L localhost -N
Anonymous login successful
Domain=[SAMBA] OS=[Unix] Server=[Samba 3.4.2]

        Sharename       Type      Comment
        ---------       ----      -------
        IPC$            IPC       IPC Service (Samba 3.4.2)
        tmp             Disk      i j h

All 3.4 versions are affected; Samba 3.3 versions are not.
Comment 1 Karolin Seeger 2009-10-20 05:27:09 UTC
Sorry, it was not commit 485c0baef that introduced this issue.
Comment 2 Jeremy Allison 2009-10-20 19:17:17 UTC
I think this is a major regression we must fix before 3.4.3 ships.
I'll investigate.
Jeremy.
Comment 3 Karolin Seeger 2009-10-21 03:06:42 UTC
I reran the git bisect, and again the result was that 485c0baef seems to be the culprit. Judging by the commit message, though, it seems unlikely that this commit breaks it. What do you think?
Comment 4 Björn Jacke 2009-10-21 07:51:28 UTC
This is a display-only issue in smbclient and not a regression from previous 3.4 releases, so it should not hold back 3.4.3, which contains a number of really important fixes.
Comment 5 Jeremy Allison 2009-10-21 11:24:51 UTC
My worry isn't about smbclient. It's about which other non-ASCII character set conversions are also broken (possibly in smbd). IMHO we need to investigate this.
Jeremy.
Comment 6 Jeremy Allison 2009-10-21 13:23:43 UTC
Karolin, I'm trying to reproduce this with my latest 3.4.3 build and can't.

I have set:

export LANG=en_US.utf8

and my GNOME terminal set to: "View" -> "Set Character Encoding" -> UTF8

with no "unix charset" setting in my smb.conf (which defaults to utf8).

I get:

bin/smbclient -L localhost -N
Anonymous login successful
Domain=[WINTEST-SAMBA] OS=[Unix] Server=[Samba 3.4.3-GIT-78ba2e1-test]

	Sharename       Type      Comment
	---------       ----      -------
	chartest        Disk      é ê è


Can you give me a better idea of how to reproduce this, please? I also can't find the git revision 485c0baef in the git logs for v3-4-test.

Jeremy.
Comment 7 Jeremy Allison 2009-10-21 13:24:44 UTC
I also correctly see the comment from a W2K3R2 client, so it isn't affecting the over-the-wire transport. If there's a bug it must be in smbclient.

Jeremy.
Comment 8 Karolin Seeger 2009-10-22 01:55:13 UTC
It's 6be4bf17de in v3-4-test (485c0baef in v3-4-stable).

Downgrading to any 3.3 version immediately fixes the issue without any other changes.

host:~ # smbd -V
Version 3.4.2

host:~ # locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=de_DE.UTF-8

Terminal encoding UTF-8

host:~ # cat /etc/samba/smb.conf
[global]
        workgroup = samba
        security  = user

[tmp]
        path = /tmp
        comment = é ê è

host:~ # testparm -sv | grep charset
Load smb config files from /etc/samba/smb.conf
Processing section "[tmp]"
Loaded services file OK.
Server role: ROLE_STANDALONE
        dos charset = ASCII
        unix charset = UTF8
        display charset = LOCALE

host:~ # smbclient -L localhost -N
Anonymous login successful
Domain=[SAMBA] OS=[Unix] Server=[Samba 3.4.2]

        Sharename       Type      Comment
        ---------       ----      -------
        tmp             Disk      i j h
        IPC$            IPC       IPC Service (Samba 3.4.2)
Anonymous login successful

Will retry with current v3-4-test now.
Comment 9 Karolin Seeger 2009-10-22 01:58:14 UTC
The client side is affected, not the server side. Running a 3.3 smbclient against the 3.4.2 server works fine. 
Comment 10 Karolin Seeger 2009-10-22 03:53:38 UTC
The current v3-4-test and master branches behave the same way as 3.4.2.

Can confirm that Windows clients display the characters properly.
Comment 11 Jeremy Allison 2009-10-22 11:45:04 UTC
OK, I finally reproduced the problem with exactly your settings. I will get a fix for this ASAP. Thanks!
Jeremy.
Comment 12 Jeremy Allison 2009-10-22 12:08:39 UTC
OK, the problem seems to be this setting:

dos charset = ASCII

I have to explicitly set this in my smb.conf to get "dos charset = ASCII" in the testparm output. If I remove it, I get:

dos charset = CP850

instead, at which point the output from smbclient is displayed correctly.

This makes sense, as the listing of share enums from smbclient is done over an old call that uses the DOS charset as the transport, and when iconv converts from utf8 to ASCII it will drop the 0x80 bit.

Leading to:

      Unicode/Latin-1   ASCII (value & ~0x80)
---------------------------------------------
é ==  0xE9              0x69 == i

ê ==  0xEA              0x6A == j

è ==  0xE8              0x68 == h

The strange thing is that Windows clients work correctly even when testparm claims the DOS charset is ASCII. My Windows client in cmd.exe console mode uses code page CP437 (US), which has the same values for é ê è as CP850 (Western European), so that's why it works here with Windows.
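
The arithmetic in the table can be checked with a short Python sketch. It is an illustration only, not Samba code: it assumes the lossy conversion behaves like clearing bit 0x80 of each character's Unicode code point, which is what reproduces "i j h". The helper names strip_high_bit and dos_bytes are hypothetical. Note that 0xE9, 0xEA and 0xE8 are the Unicode (and ISO-8859-1) values of these characters; Python's cp850 and cp437 codecs both encode them as the bytes 0x82, 0x88 and 0x8A.

```python
SAMPLE = "é ê è"  # the share comment from the report


def strip_high_bit(text: str) -> str:
    """Keep only the low 7 bits of every code point (assumed model of the loss)."""
    return "".join(chr(ord(ch) & 0x7F) for ch in text)


def dos_bytes(text: str, codepage: str) -> bytes:
    """Encode text in a DOS code page using Python's built-in codecs."""
    return text.encode(codepage)


def describe(text: str) -> list:
    """Return (char, code point, CP850 byte, 7-bit result) for each non-space char."""
    rows = []
    for ch in text:
        if ch == " ":
            continue
        rows.append((ch, hex(ord(ch)), hex(dos_bytes(ch, "cp850")[0]), strip_high_bit(ch)))
    return rows
```

Calling strip_high_bit(SAMPLE) gives the string seen in the broken listing, while describe(SAMPLE) puts the code point, the CP850 byte and the 7-bit result side by side for each character.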

What system are you testing this on? It seems strange that your default DOS charset is ASCII with that smb.conf file. If I use *exactly* that smb.conf file with my recent build of v3-4-test, I get:

bin/testparm -sv | grep charset

Load smb config files from /usr/local/samba3.4//lib/smb.conf
Processing section "[tmp]"
Loaded services file OK.
Server role: ROLE_STANDALONE
	dos charset = CP850
	unix charset = UTF-8
	display charset = LOCALE


Jeremy.
Comment 13 Jeremy Allison 2009-10-22 12:16:06 UTC
Yep, looking in param/loadparm.c I find:

init_globals()

        /* Use codepage 850 as a default for the dos character set */
        string_set(&Globals.dos_charset, DEFAULT_DOS_CHARSET);

and in include/config.h

include/config.h:#define DEFAULT_DOS_CHARSET "CP850"

Ah... include/config.h: where is this coming from? It must be generated by configure.

Jeremy.

Comment 14 Jeremy Allison 2009-10-22 12:20:21 UTC
Ok, this is it. Looks like it's not finding a working iconv on your system.

From configure.in

        # At this point, we have a libiconv candidate. We know that
        # we have the right headers and libraries, but we don't know
        # whether it does the conversions we want. We can't test this
        # because we are cross-compiling. This is not necessarily a big
        # deal, since we can't guarantee that the results we get now will
        # match the results we get at runtime anyway.
        if test x"$samba_cv_HAVE_NATIVE_ICONV" = x"cross" ; then
            default_dos_charset="CP850"
            default_display_charset="ASCII"
            default_unix_charset="UTF-8"
            samba_cv_HAVE_NATIVE_ICONV=yes
            AC_MSG_WARN(assuming the libiconv in $iconv_current_LDFLAGS can convert)
            AC_MSG_WARN([$default_dos_charset, $default_display_charset and $default_unix_charset to UCS-16LE])
        fi

        if test x"$samba_cv_HAVE_NATIVE_ICONV" = x"yes" ; then

            CPPFLAGS=$save_CPPFLAGS
            LDFLAGS=$save_LDFLAGS
            LIBS=$save_LIBS

            if test x"$iconv_current_LIBS" != x; then
                LIBS="$LIBS $iconv_current_LIBS"
            fi

            # Add the flags we need to CPPFLAGS and LDFLAGS
            CPPFLAGS="$CPPFLAGS $iconv_current_CPPFLAGS"
            LDFLAGS="$LDFLAGS $iconv_current_LDFLAGS"

            # Turn the #defines into string literals
            default_dos_charset="\"$default_dos_charset\""
            default_display_charset="\"$default_display_charset\""
            default_unix_charset="\"$default_unix_charset\""

            AC_DEFINE(HAVE_NATIVE_ICONV,1,[Whether to use native iconv])
            AC_DEFINE_UNQUOTED(DEFAULT_DOS_CHARSET,$default_dos_charset,[Default dos charset name])
            AC_DEFINE_UNQUOTED(DEFAULT_DISPLAY_CHARSET,$default_display_charset,[Default display charset name])
            AC_DEFINE_UNQUOTED(DEFAULT_UNIX_CHARSET,$default_unix_charset,[Default unix charset name])

            break
        fi

    # We didn't find a working iconv, so keep going
    fi

    #  We only need to clean these up here for the next pass through the loop
    CPPFLAGS=$save_CPPFLAGS
    LDFLAGS=$save_LDFLAGS
    LIBS=$save_LIBS
    export LDFLAGS LIBS CPPFLAGS
done
unset libext


if test x"$ICONV_FOUND" = x"no" -o x"$samba_cv_HAVE_NATIVE_ICONV" != x"yes" ; then
    AC_MSG_WARN([Sufficient support for iconv function was not found.
    Install libiconv from http://freshmeat.net/projects/libiconv/ for better charset compatibility!])
    AC_DEFINE_UNQUOTED(DEFAULT_DOS_CHARSET,"ASCII",[Default dos charset name])
    AC_DEFINE_UNQUOTED(DEFAULT_DISPLAY_CHARSET,"ASCII",[Default display charset name])
    AC_DEFINE_UNQUOTED(DEFAULT_UNIX_CHARSET,"UTF8",[Default unix charset name])
fi

Note that you only get DEFAULT_DOS_CHARSET == "ASCII" under the condition

if test x"$ICONV_FOUND" = x"no" -o x"$samba_cv_HAVE_NATIVE_ICONV" != x"yes"

which means ICONV_FOUND is "no", or samba_cv_HAVE_NATIVE_ICONV is anything other than "yes".

Looks like iconv configure is broken on the system you're building on.
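
As a rough illustration of that condition, here is a small Python model of the final "if test" in the excerpt. It is only a sketch under stated assumptions: the function names are hypothetical, and the two string arguments stand for the shell variables ICONV_FOUND and samba_cv_HAVE_NATIVE_ICONV as they are when that test runs. It does not model the earlier detection loop.

```python
from typing import Dict, Optional


def uses_ascii_fallback(iconv_found: str, have_native_iconv: str) -> bool:
    """Mirror: test x"$ICONV_FOUND" = x"no" -o x"$samba_cv_HAVE_NATIVE_ICONV" != x"yes"."""
    return iconv_found == "no" or have_native_iconv != "yes"


def fallback_defaults(iconv_found: str, have_native_iconv: str) -> Optional[Dict[str, str]]:
    """Return the defines set by the fallback branch, or None if it is not taken."""
    if not uses_ascii_fallback(iconv_found, have_native_iconv):
        return None
    return {
        "DEFAULT_DOS_CHARSET": "ASCII",
        "DEFAULT_DISPLAY_CHARSET": "ASCII",
        "DEFAULT_UNIX_CHARSET": "UTF8",
    }
```

The only input pair that avoids the ASCII defaults is one where iconv was found and samba_cv_HAVE_NATIVE_ICONV is exactly "yes"; an empty or unset value also lands in the fallback.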

As you're the one building our binaries, I still think this is a blocker :-), but you need to look at your build environment very carefully.

Jeremy.
Comment 15 Jeremy Allison 2009-10-22 17:16:10 UTC
Created attachment 4876
git-am format patch for 3.4.3.

All successful calls to cli_session_setup() *must* be followed by
calls to cli_init_creds() to stash the credentials we successfully
connected with. There were 2 codepaths where this was missing. This
caused smbclient to be unable to open the \srvsvc pipe to do an RPC
netserverenum, and cause it to fall back to a RAP netserverenum,
which uses DOS codepage conversion rather than the full UCS2 of
RPC, so the returned characters were not correct (unless the DOS
codepage was set correctly). Phew. That was fun to track down :-).

Jim will review and confirm the bug fix tomorrow.

Jeremy.
Comment 16 Jeremy Allison 2009-10-22 17:33:41 UTC
Created attachment 4877
git-am format patch for 3.4.3.

Replace previous attachment.

All successful calls to cli_session_setup() *must* be followed by
calls to cli_init_creds() to stash the credentials we successfully
connected with. There were 2 codepaths where this was missing. This
caused smbclient to be unable to open the \srvsvc pipe to do an RPC
netserverenum, and cause it to fall back to a RAP netserverenum,
which uses DOS codepage conversion rather than the full UCS2 of
RPC, so the returned characters were not correct (unless the DOS
codepage was set correctly). Phew. That was fun to track down :-).

Contains logic simplification change for the libsmb_server.c
part of the patch also.
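
To make the chain of events in the patch description concrete, here is a toy Python model of the two enumeration paths. None of this is Samba's code or API: list_share_comment, rpc_comment and rap_comment are hypothetical names, the creds_stashed flag stands in for whether cli_init_creds() was called after cli_session_setup(), and the ASCII branch assumes the lossy conversion keeps only the low 7 bits of each code point.

```python
def rpc_comment(comment: str) -> str:
    """RPC path: the string travels as UTF-16LE (UCS2), so nothing is lost."""
    wire = comment.encode("utf-16-le")
    return wire.decode("utf-16-le")


def rap_comment(comment: str, dos_charset: str) -> str:
    """RAP path: the string travels in the configured DOS charset."""
    if dos_charset.upper() == "ASCII":
        # Assumption: model the lossy conversion as keeping the low 7 bits.
        wire = bytes(ord(ch) & 0x7F for ch in comment)
        return wire.decode("ascii")
    wire = comment.encode(dos_charset)
    return wire.decode(dos_charset)


def list_share_comment(comment: str, dos_charset: str, creds_stashed: bool) -> str:
    """Use RPC when credentials were stashed, otherwise fall back to RAP."""
    if creds_stashed:
        return rpc_comment(comment)
    # Without stashed credentials the pipe open is modelled as failing.
    return rap_comment(comment, dos_charset)
```

In this model the comment "é ê è" only comes back as "i j h" when both conditions from the thread hold: the credentials were not stashed and the DOS charset is ASCII. With "cp850" as the DOS charset the RAP fallback still round-trips these three characters, which matches the observation that the bug was hidden on builds with a working iconv.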

Jim will review and confirm the bug fix tomorrow.

Jeremy.
Comment 17 Jeremy Allison 2009-10-22 17:37:35 UTC
Created attachment 4878
git-am format patch for 3.3.10.

Same patch for 3.3.10.
Jeremy.
Comment 18 Karolin Seeger 2009-10-23 02:29:42 UTC
Thanks a lot, Jeremy! :-)

The 3.4 patch fixes the issue on my box! :-)
v3-3-test is working fine without the patch. Is it needed anyway?
Comment 19 Jeremy Allison 2009-10-23 12:45:03 UTC
Yes, it needs it (IMHO). It only works right now because the username/domainname are fstrings in 3.3 and so will never be null, but they should still be set correctly on sessionsetup, or this may cause other bugs.
Jeremy.
Comment 20 Karolin Seeger 2009-10-26 03:00:43 UTC
Pushed to v3-3-test and v3-4-test.
Closing out bug report.

Thanks!
Comment 21 Volker Lendecke 2009-10-27 11:11:06 UTC
*** Bug 6852 has been marked as a duplicate of this bug. ***