Bug 3414 - libsmbclient returns charset from server instead UTF-8
Product: Samba 3.0
Classification: Unclassified
Component: libsmbclient
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: none
Assigned To: Derrell Lipman
QA Contact: Samba QA Contact
Reported: 2006-01-16 13:11 UTC by Martin Koller
Modified: 2006-01-22 08:24 UTC (History)

Description Martin Koller 2006-01-16 13:11:38 UTC
I have a local Samba server configured with unix charset = ISO8859-1, and a file /temp/x/öoil (note: the filename starts with an o-umlaut).
The filename is correctly encoded in ISO8859-1 on the filesystem.
When I use the following simple test program to read the directory entries, libsmbclient returns the filename encoded in ISO8859-1. I would expect names to always be returned in UTF-8, since I cannot know which charset the remote server uses. Stephan Kulow from the KDE project also said the following:
"that would be very wrong from libsmbclient as the smb URL RFI defines 

Note that I checked with ethereal what the server transfers over the wire, and it looks correct: f6 00 6f 00 69 00 6c 00 (f6 00 is ö, the o-umlaut)
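
To make the byte values concrete: the captured bytes read as 16-bit little-endian code units, so f6 00 is U+00F6. In UTF-8 that character is the two bytes c3 b6, whereas ISO8859-1 stores it as the single byte f6 (decimal 246). The following is only an illustrative sketch of that conversion. The function names are hypothetical and not part of libsmbclient, and it handles code points below U+10000 only.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point below U+10000 as UTF-8; returns bytes written.
   Surrogate pairs are deliberately not handled in this sketch. */
static size_t encode_utf8(uint16_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Convert little-endian 16-bit code units to NUL-terminated UTF-8.
   out must hold at least (len / 2) * 3 + 1 bytes.
   Returns the UTF-8 length, not counting the terminating NUL. */
size_t utf16le_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        uint16_t cp = (uint16_t)(in[i] | (in[i + 1] << 8));
        n += encode_utf8(cp, out + n);
    }
    out[n] = '\0';
    return n;
}
```

Fed the eight bytes from the capture, utf16le_to_utf8() produces the five bytes c3 b6 6f 69 6c, so a UTF-8 result would start with byte 195 rather than 246.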

The output of the test program is:
. :: 46
.. :: 46
öoil :: 246

#include <stdio.h>
#include <string.h>
#include <libsmbclient.h>

/* Authentication callback: supplies fixed credentials for this test. */
void auth_smbc_get_data(const char *server, const char *share,
                        char *workgroup, int wgmaxlen,
                        char *username, int unmaxlen,
                        char *password, int pwmaxlen)
{
  strcpy(username, "root");
  strcpy(password, "");
}

int main(void)
{
  /* smbc_init() returns a negative value on failure. */
  if ( smbc_init(auth_smbc_get_data, 0) < 0 ) return -1;

  int dir = smbc_opendir("smb://localhost/temp/x");
  if ( dir < 0 ) return -1;

  /* Print each entry name together with the value of its first byte. */
  struct smbc_dirent *entry;
  while ( (entry = smbc_readdir(dir)) != NULL )
    printf("%s :: %d\n", entry->name, (unsigned char)entry->name[0]);

  smbc_closedir(dir);
  return 0;
}
Comment 1 Derrell Lipman 2006-01-21 20:09:13 UTC
libsmbclient does not itself do any character set conversions.  This means that the conversion is occurring farther down the stack.  I'm researching where this conversion is taking place.

In the meantime, please try the following.  libsmbclient applications read up to three configuration files.  First they look for ${HOME}/.smb/smb.conf.  If that's not found, they read the global smb.conf file (often /etc/smb.conf).  If the global smb.conf file is found, they then also read ${HOME}/.smb/smb.conf.append, which lets you make local changes to the global configuration.  Given this, please try creating ${HOME}/.smb/smb.conf and adding a "unix charset" entry that specifies the character set you'd like names converted to on the client side.  I don't know that this will have any effect, but if it works, it's a work-around for you while we decide what to do as far as changes to libsmbclient.
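
As an illustration of the suggestion above, a minimal ${HOME}/.smb/smb.conf could look like the fragment below. This is a sketch, assuming UTF-8 is the encoding the application wants and that no other client-side settings are needed. Since the global smb.conf is not read when this file exists, any other settings the client relies on would have to be repeated here.

```
[global]
    unix charset = UTF-8
```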
Comment 2 Martin Koller 2006-01-22 03:18:47 UTC
Wow - I didn't know such a file existed.
And indeed, I already had one (from more than 2 years ago ...) which contained "unix charset = ISO8859-1".
Changing this to, e.g., UTF-8 makes libsmbclient deliver the names in that encoding.

So I'm not sure whether this case is simply a configuration error, or - as Stephan Kulow pointed out - a bug, since libsmbclient should always return UTF-8 encoded strings.
Comment 3 Derrell Lipman 2006-01-22 08:24:51 UTC
I can definitely see applications making use of the ability to set the local encoding, so I'm not inclined to change it.  I can, however, see an application wanting to override whatever the user happened to have left sitting around in their configuration file, as was the case here.  I'm going to close this bug out, but I may look into how best to expose a function for the application to call after the configuration files have been read, to set certain attributes the way the application always wants them.
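
Independent of any such function, an application that expects UTF-8 can at least detect a case like the one in this report by checking whether a returned name is well-formed UTF-8. The helper below is a hypothetical sketch, not part of libsmbclient. It is only a heuristic: a name in another charset that happens to contain only ASCII bytes will also pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the NUL-terminated string s is well-formed UTF-8.
   Rejects overlong forms, surrogates, and code points above U+10FFFF. */
bool is_valid_utf8(const char *s)
{
    /* Smallest code point that legitimately needs 1, 2, 3, or 4 bytes. */
    static const unsigned int min_cp[] = {0, 0x80, 0x800, 0x10000};
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned int cp;
        int extra;  /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return false;  /* stray continuation byte or invalid lead byte */
        p++;

        /* A NUL here fails the continuation test, so we never read past it. */
        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min_cp[extra]) return false;               /* overlong form */
        if (cp > 0x10FFFF) return false;                    /* out of range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;     /* surrogate */
    }
    return true;
}
```

With the name from this report, the ISO8859-1 bytes f6 6f 69 6c are rejected, while the UTF-8 bytes c3 b6 6f 69 6c are accepted.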