Bug 3299 - rsync: now replaces non-ASCII character with numerical values
Status: CLOSED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 2.6.6
Hardware: Other Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL: http://bugs.debian.org/cgi-bin/bugrep...
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-04 15:56 UTC by debian
Modified: 2006-03-12 02:56 UTC

Description debian 2005-12-04 15:56:52 UTC
Package: rsync
Version: 2.6.6-1
Followup-For: Bug #307242


I've again used two identical Sarge systems, both using UTF-8.

When using rsync (over ssh) to sync from one system to the other, or
just to list the remote contents, non-ASCII characters are replaced with
octal escapes of their bytes, like '\303\245', e.g.:

  user@system1:~$ rsync system2:~/test_åäö_test
    drwxr-xr-x  72 2005/09/25 01:39:30 test_\303\245\303\244\303\266_test

The changelog states:
--------------------------------------------------------------------------
rsync (2.6.5-1) unstable; urgency=low 
   * Now should handle locale-specific characters better in logging output
     (i.e. the correct chars should be displayed, not '?').
--------------------------------------------------------------------------

This statement is obviously not correct. The '?' has just been replaced
with a numerical value instead. (Almost as useless.)

Is this something that's being worked on upstream? Is there a workaround?
All scripts and programs that depend on the output have been almost
useless for several months now.
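
For scripts that parse the listing in the meantime, one possible stopgap is
to decode the escapes again. The sketch below is not part of rsync:
decode_octal_escapes is a hypothetical helper, and it assumes that each
escape is a backslash followed by exactly three octal digits (as in the
output above) and that the underlying bytes are UTF-8.

```python
import re

# Matches a backslash followed by three octal digits in the range 000-377.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def decode_octal_escapes(name: str, encoding: str = "utf-8") -> str:
    """Turn escapes such as \\303\\245 back into the characters they encode.

    Hypothetical helper: a name that really contains a backslash followed
    by three octal digits cannot be told apart from an escape.
    """
    raw = _OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 8)]),  # octal text -> single byte
        name.encode(encoding),
    )
    # Bytes that still do not form valid text are shown as U+FFFD.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(decode_octal_escapes(r"test_\303\245\303\244\303\266_test"))
    # prints: test_åäö_test
```

Here \303\245 is the byte pair 0xC3 0xA5, which is the UTF-8 encoding of 'å'.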

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-2-686-smp
Locale: LANG=sv_SE.UTF-8, LC_CTYPE=sv_SE.UTF-8 (charmap=UTF-8)

Versions of packages rsync depends on:
ii  libc6                       2.3.2.ds1-22 GNU C Library: Shared libraries an
ii  libpopt0                    1.7-5        lib for parsing cmdline parameters

-- no debconf information
Comment 1 Wayne Davison 2006-01-16 21:04:17 UTC
The changelog statement you cited is correct for locales that use single-byte encodings (UTF-8, being a multibyte encoding, is not one of them).  For instance, rsync outputs all the extended characters from ISO-8859-1 without any mangling.

I've been considering how best to add multibyte support to rsync, and I think that I can leverage the way iconv() works to have it tell me if characters are valid in the current locale.  A patch that does this (along with adding filename conversion support) is here:

http://opencoder.net/iconv.diff

This is still a young patch, so be careful if you decide to give it a try.

The patch applies to the latest CVS source.  See the diff for build and usage instructions.
Comment 2 Wayne Davison 2006-02-06 11:08:40 UTC
The CVS version now handles multibyte locales as long as the local system has iconv() (rsync uses an identity conversion to determine whether the characters in a name are valid in the current character set).
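
The idea of validating a name against the current character set and escaping only what is invalid can be sketched roughly as follows. This is an illustration, not rsync's code: it uses Python's codec machinery as a stand-in for an iconv() identity conversion, and display_name and the "octal_escape" error handler are hypothetical names.

```python
import codecs


def _octal_escape(err):
    """Codec error handler: replace undecodable bytes with \\ooo escapes."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Resume decoding right after the bytes that were rejected.
    return "".join("\\%03o" % b for b in bad), err.end


codecs.register_error("octal_escape", _octal_escape)


def display_name(raw: bytes, charset: str = "utf-8") -> str:
    """Decode a raw file name, escaping only bytes invalid in `charset`.

    Stand-in for the identity-conversion check: bytes that decode cleanly
    pass through, everything else becomes an octal escape.
    """
    return raw.decode(charset, errors="octal_escape")


if __name__ == "__main__":
    name = "test_åäö_test".encode("utf-8")
    print(display_name(name, "utf-8"))   # valid UTF-8, shown unchanged
    print(display_name(name, "ascii"))   # every high-bit byte is escaped
```

With the "ascii" charset the second call produces test_\303\245\303\244\303\266_test, the same form as the listing in the original report.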
Comment 3 Wayne Davison 2006-02-07 05:33:28 UTC
I should also mention that there is now an option, --8-bit-output (-8), that tells rsync to pass all high-bit characters through unescaped (instead of escaping only the ones that are invalid in the current locale).