Package: rsync
Version: 2.6.6-1
Followup-For: Bug #307242

I've again used two identical Sarge systems, both using UTF-8. When using rsync (over ssh) to sync (or list the contents) from one system to the other, non-ASCII characters get replaced with numerical values like '\303\245', e.g.:

user@system1:~$ rsync system2:~/test_åäö_test
drwxr-xr-x 72 2005/09/25 01:39:30 test_\303\245\303\244\303\266_test

The changelog states:

--------------------------------------------------------------------------
rsync (2.6.5-1) unstable; urgency=low

  * Now should handle locale-specific characters better in logging output
    (i.e. the correct chars should be displayed, not '?').
--------------------------------------------------------------------------

This statement is obviously not correct. The '?' has just been replaced with a numerical value instead. (Almost as useless.)

Is this something that's being worked on upstream? Is there a workaround? All scripts and programs that depend on the output have been almost useless for several months now.

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-2-686-smp
Locale: LANG=sv_SE.UTF-8, LC_CTYPE=sv_SE.UTF-8 (charmap=UTF-8)

Versions of packages rsync depends on:
ii libc6 2.3.2.ds1-22 GNU C Library: Shared libraries an
ii libpopt0 1.7-5 lib for parsing cmdline parameters

-- no debconf information
The changelog statement you cited is correct for locales that use single-byte character sets; UTF-8 is a multibyte encoding, so it is not covered. For instance, in an ISO-8859-1 locale rsync outputs all the extended characters without any mangling.

I've been considering how best to add multibyte support to rsync, and I think I can leverage the way iconv() works to have it tell me whether characters are valid in the current locale. A patch that does this (along with adding filename-conversion support) is here:

http://opencoder.net/iconv.diff

This is still a young patch, so be careful if you decide to give it a try. It applies to the latest CVS source; see the diff for build and usage instructions.
The CVS version now handles multibyte locales, as long as the local system has iconv(): rsync uses an identity conversion (from the current character set to itself) to determine whether the characters in a name are valid in that character set.
I should also mention that there is now an option, --8-bit-output (-8), that tells rsync to pass through all high-bit characters unescaped instead of trying to escape only the ones that are invalid in the current locale.