Bug 5162 - using iconv with pre7 chops last special character in filenames
Summary: using iconv with pre7 chops last special character in filenames
Status: CLOSED WORKSFORME
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.0.0
Hardware: x86 Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-26 16:24 UTC by Eyal
Modified: 2008-07-26 10:22 UTC
0 users

See Also:


Attachments
patch for finding iconv_open under cygwin (1.46 KB, patch)
2007-12-29 01:26 UTC, Andy Howell
Hebrew testcase (212 bytes, application/x-gzip)
2008-01-01 08:49 UTC, Eyal

Description Eyal 2007-12-26 16:24:37 UTC
I'm syncing two directories:
src on WinXP (x86)
dst on Ubuntu 7.10 64-bit

My src has Hebrew filenames.
While syncing, filenames are correctly converted to UTF-8 on the dst (otherwise they are unreadable on the Linux machine); however, if a filename ends with a Hebrew character, that character is chopped.
To illustrate the transformation:
let a-z signify English letters
and A-Z signify Hebrew letters
src filename -> dst filename
abc          -> abc
ABc          -> ABc
ABC          -> AB
A            -> <empty>

I'm using pre release 7 with --enable-iconv=. on both machines.
On Cygwin I had to hardcode #define HAVE_ICONV_OPEN 1 into config.h to get it to compile correctly.

libiconv-1.11

Note also that if I use --log-file on the client, the log file shows the correctly transformed filenames, yet they are created chopped on the dst filesystem.
Note also that simply copying the files from the XP machine using a Samba mount works perfectly.

This error applies to both files and directories.
If a directory is created with a mangled name on the dst, it is not populated with any files, only with its subdirs.
Comment 1 Andy Howell 2007-12-29 01:26:49 UTC
Created attachment 3081
patch for finding iconv_open under cygwin

This fixes the configure script not finding iconv_open under Cygwin. It seems to work OK on Fedora 7 as well. The AC_CHECK_FUNCS test for iconv_open fails because the libiconv header redefines iconv_open to be libiconv_open. AC_CHECK_FUNCS doesn't include iconv.h, so it fails to find the function. I'm not adept at autoconf; there may be a better way to do this.
Comment 2 Wayne Davison 2007-12-29 12:02:42 UTC
Thanks for the configure patch.  I added a slightly changed version to the latest dev version (which you can download via git, rsync or the latest nightly tar file).

To help me debug the problem, can you create a tar file with some filenames that don't convert correctly?  And also specify a name for the source character set (so that I can use an --iconv=SOURCE,utf8 spec for the test) and the command-line you used.
Comment 3 Eyal 2008-01-01 08:49:10 UTC
Created attachment 3084
Hebrew testcase

I'm not sure how to determine the locale charsets.
I guess Windows uses ISO-8859-8 and Linux uses UTF-8, but I'm not certain about it.
cheers
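
As a rough aid for the charset question above, the sketch below asks the current locale which charset it reports and resolves charset aliases through Python's codec registry. It is only an illustration: it is not how rsync detects the charset, the helper names locale_charset and canonical_name are made up for this example, and the value printed depends entirely on the machine it runs on.

```python
import codecs
import locale


def locale_charset() -> str:
    """Return the charset name reported by the current locale."""
    # Adopt the locale from the environment (LANG, LC_ALL, ...).
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unsupported locale setting: keep the default "C" locale
    return locale.getpreferredencoding(False)


def canonical_name(charset: str) -> str:
    """Map a charset alias to Python's canonical codec name."""
    return codecs.lookup(charset).name


if __name__ == "__main__":
    print("locale charset:", locale_charset())
    # In Python's codec registry, HEBREW is an alias for ISO-8859-8.
    print(canonical_name("HEBREW") == canonical_name("ISO-8859-8"))
```

Running it on each machine gives one data point about what the locale claims; the name may still differ from what the iconv library on that machine expects.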
Comment 4 Eyal 2008-01-01 09:45:28 UTC
I just realized that explicitly specifying --iconv=HEBREW,UTF-8 solves the issue.
So I guess the error is in the locale autodetection under Cygwin and Windows. Using --iconv=.,UTF-8 still chops letters.
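
For reference, here is a minimal sketch of what a complete conversion of a name ending in a Hebrew letter should produce. It assumes the source charset is ISO-8859-8 (the reporter's guess for the Windows side, which may not be what Cygwin actually reports) and uses Python's codecs rather than rsync's iconv code; the three-letter name and the helper convert_name are hypothetical.

```python
def convert_name(raw: bytes, src: str = "iso8859_8", dst: str = "utf-8") -> bytes:
    """Re-encode filename bytes from the source charset to the destination charset."""
    return raw.decode(src).encode(dst)


# Hypothetical filename: Hebrew alef, bet, gimel (U+05D0..U+05D2),
# standing in for the "ABC" case from the description.
name = "\u05d0\u05d1\u05d2"
src_bytes = name.encode("iso8859_8")  # one byte per letter
dst_bytes = convert_name(src_bytes)   # two bytes per letter in UTF-8

if __name__ == "__main__":
    print(len(src_bytes), "bytes in ISO-8859-8")
    print(len(dst_bytes), "bytes in UTF-8")
    # A complete conversion decodes back to the same three letters.
    print(dst_bytes.decode("utf-8") == name)
```

Comparing the byte length of a name created on the dst with the length expected here is one way to tell whether the final letter was dropped during conversion or somewhere later.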


Comment 5 Wayne Davison 2008-01-01 11:22:04 UTC
Thanks for the test file.

If you supply two --verbose options (e.g. -vv) to the rsync command, you'll see the charset info for the client and the server output like this:

client charset: HEBREW
server charset: UTF-8

That will show you what the default_charset() function determined for the transfer when using ".".
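
If the -vv output is captured to a file, the two charset lines can be picked out with a small script. This is a sketch only: the line format is copied from the example above and real output may differ between rsync versions, and the function name parse_charsets is an invention for this illustration.

```python
import re

# Matches lines shaped like the example: "client charset: HEBREW".
CHARSET_LINE = re.compile(r"^(client|server) charset: (\S+)$")


def parse_charsets(output: str) -> dict:
    """Collect the client and server charset names from captured -vv output."""
    found = {}
    for line in output.splitlines():
        match = CHARSET_LINE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


# Sample text taken from the comment above, plus one unrelated line.
sample = """building file list ...
client charset: HEBREW
server charset: UTF-8
"""

if __name__ == "__main__":
    print(parse_charsets(sample))
```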

Since "HEBREW" worked for you (and indeed, worked fine for me too), this looks like it may be an iconv library bug, but I'd like to double-check that.
Comment 6 Wayne Davison 2008-01-12 11:56:27 UTC
No response to last query, so closing (user has things working).