The Samba-Bugzilla – Bug 11656
Escaping broken with --files-from
Last modified: 2017-10-08 16:48:50 UTC
The escaping mechanism in --files-from is broken when a file name contains a newline. The problem is that the filename 'foo\nbar' gets written as
foo\#012bar by --out-format="%n" but gets transformed into foo\#134#012bar when passed back in through the --files-from option.
On my system
More generally, it would be great to have consistent escaping in the output of --out-format, for example with an option of
also to deal with names containing spaces. Lots of people have problems with that (I will post links to Server Fault questions when I find them again...)
Make directories 'src' and 'dst' and in 'src' create the file 'foo\nbar'
$ mkdir src; mkdir dst; touch src/"$(echo -e 'foo\nbar')"
Suppose that I want to create a list of files that would copy the file 'foo\nbar' from src to dst via the command
$ rsync -v --files-from=filelist src/ dst
There seems to be no way of doing so. In particular, one would expect the file list generated with --out-format="%n" to name the file correctly:
$ rsync -n --out-format='%n' src/* dst
But the following happens
$ rsync -n --out-format='%n' src/* dst | rsync -v --files-from=- src/ dst
building file list ...
rsync: link_stat "/home/guraltsev/test/src/foo\#134#012bar" failed: No such file or directory (2)
sent 16 bytes received 12 bytes 56.00 bytes/sec
total size is 0 speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.1]
This is what --from0 is for.
Actually this doesn't help.
$ mkdir src; mkdir dst; touch src/"$(echo -e 'foo\nbar')"
$ rsync -n --out-format='%n' src/* dst/| tr '\n' '\0' | rsync -v --from0 --files-from=- src/ dst
still fails completely. The problem is that the escaped string
foo\#012bar gets mangled, so I think that --from0 doesn't really solve the problem.
The only way to solve the problem is
$ rsync -n --out-format='%n' src/* dst/| tr '\n' '\0'| sed 's/\\#012/\n/' | rsync -v --from0 --files-from=- src/ dst
but maybe we could agree that this is needlessly complicated. A more consistent way of dealing with this would be great especially bearing in mind that this failure is not documented...
Finally, it is still completely unexpected that foo\#012bar
gets changed into foo\#134#012bar.
Anyway, the problem with space-delimited strings is more as follows. Imagine I want to parse the log file generated with --out-format='%n %h %M %C' in a reliable way with some external program. Without consistent and documented escaping there seems to be no way to do this.
I am not sure what exactly the point of using an rsync -n to feed an rsync --files-from would be. The --files-from option is really designed to be fed from find which has a -print0 option which will format things correctly for --from0.
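For anyone who wants the find -print0 style of list without shelling out to find, here is a rough sketch in Python. The function name build_from0_list is made up for illustration, and it assumes the list should contain regular files only, as paths relative to the source directory, each terminated by a NUL byte.

```python
import os


def build_from0_list(src_dir: str) -> bytes:
    """Return a NUL-terminated list of file paths relative to src_dir.

    Hypothetical helper: it mimics the shape of `find -print0` output,
    working on bytes so that newlines in names pass through untouched.
    """
    src = os.fsencode(src_dir)
    entries = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            full = os.path.join(root, name)
            # Paths are stored relative to the source directory.
            entries.append(os.path.relpath(full, src))
    # Sorting only makes the result deterministic.
    return b"".join(entry + b"\0" for entry in sorted(entries))
```

The resulting bytes could be written to a file and passed to rsync with --from0 --files-from=FILE. No escaping is involved, because the names never go through rsync's output formatting.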
Well, imagine a poor man's replacement for batch files. We want to generate a list of operations, maybe edit it by hand (a batch file is binary...) and then feed it back to rsync. Or maybe do a dry run, look at it, and then just selectively remove some files. There are many use cases.
Apart from that I argue that the inconsistencies in interpreting escape sequences merit fixing. Ok, maybe it is not top priority but there still is no reason why foo\#012bar becomes foo\#134#012bar.
I would say that if your goal is to make an editable list to be run through rsync later, you would be a lot better off with an --itemize-changes list and a script to reformat it after editing. I don't know about you, but I would hate to have to edit a null-terminated text file, and I would hate to have to go look up why a file is in the list without the --itemize-changes output.
Anyway, I think I am done commenting here and will leave this for Wayne to decide if this is really a bug or a use case problem.
I hope I am not upsetting anyone. Maybe I wasn't clear:
--itemize-changes is half the problem. Maybe I should post another bug.
In the situation I described
$ rsync -n --itemize-changes -a src/* dst/ gives:
with the newline being escaped in a weird way that cannot be effectively fed back into any other program, not even rsync itself!
Furthermore consider this test case:
In addition to what we did before, create a file whose literal name is
aaa\#012bbb by doing
$ rsync -n --itemize-changes -a src/* dst/
where the escaping of aaa\#012bbb in the --itemize-changes output is completely absurd!
I was not offended. I was just trying to establish your use case and offer possible alternative methods of accomplishing it while not actually being an rsync dev.
Wayne is really the only person who can say "Yep, that's a bug" or "Nope, that is how I want it to work."
I looked through the source code, and it seems that things go wrong in the function
static void filtered_fwrite in log.c
in particular the line
fprintf(f, "\\#%03o", *(uchar*)s);
is suspicious. The problem is that 134 is the octal code for the backslash character, so the extra \#134 is an escaped backslash.
The consistent and documented escaping is that some characters get output as a backslash + hash + 3-digit octal number. This includes control chars, some backslashes, and (without -8) high-bit chars. From the man page:
"The escape idiom that started in 2.6.7 is to output a literal backslash (\) and a hash (#), followed by exactly 3 octal digits. For example, a newline would output as "\#012". A literal backslash that is in a filename is not escaped unless it is followed by a hash and 3 digits (0-9)."
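To make the quoted rule concrete, here is a small Python model of it. This is a sketch of the documented behaviour, not rsync's actual code: the name rsync_escape and the eight_bit_output parameter are invented for illustration, and the exact set of bytes treated as "control" or "high-bit" is an assumption (the real logic may depend on the locale and may exempt some characters).

```python
import re


def rsync_escape(name: bytes, eight_bit_output: bool = False) -> bytes:
    """Model of the documented escape idiom (an approximation, not rsync's code)."""
    out = bytearray()
    for i, byte in enumerate(name):
        # Assumption: "control chars" means bytes below 0x20 plus DEL.
        is_control = byte < 0x20 or byte == 0x7F
        # Assumption: high-bit bytes are escaped unless 8-bit output is requested.
        is_high_bit = byte >= 0x80 and not eight_bit_output
        # A literal backslash is escaped only when it is followed by
        # a hash and 3 digits (0-9), as the man page states.
        looks_like_escape = (
            byte == 0x5C and re.match(rb"\\#\d{3}", name[i:]) is not None
        )
        if is_control or is_high_bit or looks_like_escape:
            out += b"\\#%03o" % byte
        else:
            out.append(byte)
    return bytes(out)
```

Under this model, rsync_escape(b"foo\nbar") gives foo\#012bar, and escaping that result a second time gives foo\#134#012bar. That would be consistent with what the report shows when already-escaped output is fed back in as a literal file name.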
One easy way to unescape is thus to filter names through something like this:
perl -pe 's/\\#(\d\d\d)/chr(oct($1))/eg'
(...after any necessary parsing of the output to find the names or twiddle newlines into nulls). You'll note that this only matches exactly 3 digits, as rsync will leave something like "\#5" alone in the output, since it cannot be confused with an actual escaped char.
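The same unescaping step, combined with the newline-to-NUL conversion, might look like this in Python. It is only a sketch: rsync_unescape and to_from0_list are hypothetical names, the input is assumed to be the raw bytes of --out-format='%n' output with one name per line, and unlike the perl one-liner the pattern accepts only octal values that fit in one byte (000 to 377).

```python
import re

# Backslash, hash, then exactly three octal digits no larger than 377.
_ESCAPE = re.compile(rb"\\#([0-3][0-7]{2})")


def rsync_unescape(name: bytes) -> bytes:
    """Turn each \\#ooo sequence back into the byte it stands for."""
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), name)


def to_from0_list(out_format_output: bytes) -> bytes:
    """Convert newline-separated escaped names into a NUL-terminated list."""
    # Split on newlines first: a newline inside a name is still escaped here.
    names = [line for line in out_format_output.split(b"\n") if line]
    return b"".join(rsync_unescape(n) + b"\0" for n in names)
```

Splitting into lines before unescaping matters, because a newline that belongs to a file name only reappears after the split. The result could then be handed to rsync with --from0 --files-from=-.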