Bug 11656 - Escaping broken with --files-from
Summary: Escaping broken with --files-from
Status: RESOLVED WORKSFORME
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.1
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2015-12-30 22:25 UTC by Gennady Uraltsev
Modified: 2017-10-08 16:48 UTC

Description Gennady Uraltsev 2015-12-30 22:25:12 UTC
The escaping mechanism in --files-from is broken when a file name contains a newline. The problem is that a filename 'foo\nbar' gets written as
foo\#012bar by --out-format="%n", but gets transformed into foo\#134#012bar when passing through the --files-from directive.

On my system 
LC_CTYPE=en_US.UTF-8
LANG=en_US.UTF-8

More generally, it would be great to have consistent escaping in the output of --out-format, for example via an option similar to

ls --quoting-style=<style>

This would also help with names that contain spaces. Lots of people have problems with that (I will post links to Server Fault questions when I find them again...)

EXAMPLE:

Make directories 'src' and 'dst' and in 'src' create the file 'foo\nbar'
$ mkdir src; mkdir dst; touch src/"$(echo -e 'foo\nbar')"

Suppose that I want to create a file list that copies the file 'foo\nbar' from src to dst via the command
$ rsync -v --files-from=filelist src/ dst

There seems to be no way of doing so. In particular, one would expect the list generated by --out-format="%n" to be usable as that file list:

$ rsync -n --out-format='%n' src/* dst
foo\#012bar

But the following happens:

$ rsync -n --out-format='%n' src/* dst | rsync -v --files-from=- src/ dst
building file list ... 
rsync: link_stat "/home/guraltsev/test/src/foo\#134#012bar" failed: No such file or directory (2)
done

sent 16 bytes  received 12 bytes  56.00 bytes/sec
total size is 0  speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.1]
Comment 1 Kevin Korb 2015-12-30 22:29:29 UTC
This is what --from0 is for.
Comment 2 Gennady Uraltsev 2015-12-30 22:41:47 UTC
Actually this doesn't help.

$ mkdir src; mkdir dst; touch src/"$(echo -e 'foo\nbar')"

$ rsync -n --out-format='%n' src/* dst/| tr '\n' '\0' | rsync -v --from0 --files-from=- src/ dst

still fails completely. The problem is that the escaped string 
foo\#012bar gets all mangled up. So I think that --from0 doesn't exactly solve the problem. 

The only way to solve the problem is

$ rsync -n --out-format='%n' src/* dst/| tr '\n' '\0'| sed 's/\\#012/\n/' | rsync -v --from0 --files-from=- src/ dst

but maybe we could agree that this is needlessly complicated. A more consistent way of dealing with this would be great especially bearing in mind that this failure is not documented... 

Finally it is still completely unexpected that 
foo\#012bar
gets changed into 
foo\#134#012bar

Anyway, the problem with space-delimited strings is as follows. Imagine I want an external program to reliably parse the log file generated with --out-format='%n %h %M %C'. Without consistent and documented escaping there seems to be no way to do this.
Comment 3 Kevin Korb 2015-12-30 22:45:10 UTC
I am not sure what exactly the point of using an rsync -n to feed an rsync --files-from would be.  The --files-from option is really designed to be fed from find which has a -print0 option which will format things correctly for --from0.
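As a rough illustration of the NUL-separated approach, the sketch below builds a list in the form --from0 expects without going through rsync's escaped output at all. The function name nul_separated_file_list is hypothetical, and the sketch only lists regular files directly inside the source directory; it is not a replacement for find.

```python
import os


def nul_separated_file_list(src_dir: str) -> bytes:
    """Return the names of regular files directly inside src_dir.

    Each name is terminated by a NUL byte, so names containing
    newlines or spaces need no escaping. (Hypothetical helper,
    non-recursive on purpose to keep the sketch short.)
    """
    src = os.fsencode(src_dir)  # work in bytes: file names are byte strings
    names = sorted(
        entry.name
        for entry in os.scandir(src)
        if entry.is_file(follow_symlinks=False)
    )
    return b"".join(name + b"\0" for name in names)
```

The result could be written to a file opened in binary mode and then passed along the lines of `rsync -v --from0 --files-from=filelist src/ dst`, with the names taken relative to src/.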
Comment 4 Gennady Uraltsev 2015-12-30 22:50:54 UTC
Well, imagine a poor man's replacement for batch files. We want to generate a list of operations, maybe edit it by hand (a batch file is binary...) and then feed it back to rsync. Or maybe do a dry run, look at it, and then just selectively remove some files. There are many use cases.

Apart from that I argue that the inconsistencies in interpreting escape sequences merit fixing. Ok, maybe it is not top priority but there still is no reason why foo\#012bar becomes foo\#134#012bar.
Comment 5 Kevin Korb 2015-12-30 22:57:07 UTC
I would say that if your goal is to make an editable list to be run through rsync later, you would be a lot better off with an --itemize-changes list and a script to reformat it after editing.  I don't know about you, but I would hate to have to edit a null-terminated text file, and I would hate to have to go look up why a file is in the list without the --itemize-changes output.

Anyway, I think I am done commenting here and will leave this for Wayne to decide if this is really a bug or a use case problem.
Comment 6 Gennady Uraltsev 2015-12-30 23:00:45 UTC
I hope I am not upsetting anyone. Maybe I wasn't clear:
--itemize-changes is half the problem. Maybe I should post another bug. 
In the situation I described 

$ rsync -n --itemize-changes -a src/* dst/
gives:
>f..t...... foo\#012bar

with the new line being escaped in a weird way that cannot be effectively fed back into any kind of other program, not even rsync itself!
Comment 7 Gennady Uraltsev 2015-12-30 23:04:40 UTC
Furthermore, consider this test case:
in addition to what we did before, create a file whose literal name is
aaa\#012bbb by doing

touch 'src/aaa\#012bbb'

then 

$ rsync -n --itemize-changes -a src/* dst/ 
>f+++++++++ aaa\#134#012bbb
>f..t...... foo\#012bar

where the escaping of aaa\#012bbb in the --itemize-changes output is completely absurd!
Comment 8 Kevin Korb 2015-12-30 23:08:12 UTC
I was not offended.  I was just trying to establish your use case and offer possible alternative methods of accomplishing it while not actually being an rsync dev.

Wayne is really the only person who can say "Yep, that's a bug" or "Nope, that is how I want it to work."
Comment 9 Gennady Uraltsev 2015-12-30 23:44:55 UTC
I looked through the source code, and it seems that whatever is going wrong happens in the function

static void filtered_fwrite in log.c

in particular the line
#134
fprintf(f, "\\#%03o", *(uchar*)s); 

is suspicious. The problem is that #134 is the octal code for the backslash character.
Comment 10 Wayne Davison 2017-10-08 16:48:50 UTC
The consistent and documented escaping is that some characters get output with a backslash+hash+3-digit octal number. This includes control chars, some backslashes, and (without -8) high-bit chars.  From the man page:

"The escape idiom that started in 2.6.7 is to output a literal backslash (\) and a hash (#), followed by exactly 3 octal digits.  For example, a newline would output as "\#012".  A literal backslash that is in a filename is not escaped unless it is followed by a hash and 3 digits (0-9)."
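To make the quoted rule concrete, here is a small Python model of it. This is only a sketch based on the man page wording, not rsync's actual code: the function name escape_name is hypothetical, every byte below 0x20 is treated as a control character, and the locale-dependent handling of high-bit characters is left out entirely.

```python
def escape_name(name: bytes) -> bytes:
    """Model of the documented escape idiom (simplified, not rsync's code)."""
    out = bytearray()
    for i, byte in enumerate(name):
        following = name[i + 1:i + 5]
        # A literal backslash is escaped only when it is followed by
        # a hash and exactly three digits (0-9), i.e. when it could be
        # mistaken for an escape sequence.
        backslash_looks_escaped = (
            byte == 0x5C
            and len(following) == 4
            and following[:1] == b"#"
            and following[1:].isdigit()
        )
        if byte < 0x20 or backslash_looks_escaped:
            out += b"\\#%03o" % byte  # backslash, hash, 3 octal digits
        else:
            out.append(byte)
    return bytes(out)
```

Under this model, a name containing a real newline, b"foo\nbar", comes out as foo\#012bar, while a file literally named aaa\#012bbb comes out as aaa\#134#012bbb, because 134 is the octal code of the backslash. That matches the output reported earlier in this bug, and keeps the two names distinguishable.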

One easy way to unescape is thus to filter names through something like this:

    perl -pe 's/\\#(\d\d\d)/chr(oct($1))/eg'

(...after any necessary parsing of the output to find the names or twiddle newlines into nulls).  You'll note that this only matches exactly 3 digits, as rsync will leave something like "\#5" alone in the output, since it cannot be confused with an actual escaped char.
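For anyone post-processing the output from a script rather than a shell pipeline, the same unescaping can be sketched in Python. The function name unescape_name is hypothetical. Unlike the Perl one-liner, this version works on bytes and only accepts octal digits (0-7) with a value that fits in one byte; that narrowing is an assumption of the sketch, chosen so that anything else is left untouched.

```python
import re

# Backslash, hash, then exactly three octal digits.
_ESCAPE = re.compile(rb"\\#([0-7]{3})")


def unescape_name(escaped: bytes) -> bytes:
    """Turn \\#ooo sequences back into the bytes they stand for."""

    def replace(match):
        value = int(match.group(1), 8)
        if value > 0xFF:
            # Not a valid single byte; leave the text as it was.
            return match.group(0)
        return bytes([value])

    return _ESCAPE.sub(replace, escaped)
```

Because the substitution does not rescan text it has already replaced, aaa\#134#012bbb decodes to the literal name aaa\#012bbb rather than to a name containing a newline.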