Bug 10372 - rsync 3.1.0 error in protocol data stream while rsync 3.0.9 runs through
Summary: rsync 3.1.0 error in protocol data stream while rsync 3.0.9 runs through
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Duplicates: 10332 10497
Depends on:
Blocks:
 
Reported: 2014-01-12 16:27 UTC by Harvey
Modified: 2020-07-26 08:18 UTC
Description Harvey 2014-01-12 16:27:39 UTC
While transferring a big file over the network to my backup server (13G virtualbox disk image) with the following command:

rsync --delete-during --inplace --progress -avzE 

rsync 3.1.0 fails with:

rsync error: error in rsync protocol data stream (code 12)

The exact same command and file run through with no problems when using rsync 3.0.9 as the sender. Receiver is rsync 3.1.0 in both cases.

Other rsync commands in the same run (this is part of a backup script) complete without problems, so it seems only the big file is affected. There is enough space on the receiver-side disk (66G), so that is not the culprit. The network is a 1GB line, so that should not be a problem either. Neither dmesg nor journalctl shows any relevant entries.
Comment 1 Wayne Davison 2014-01-19 22:24:23 UTC
You might try "--msgs2stderr --debug=all5" to get some debug output indicating what might be failing.  See also the web site's debug page for info on how to get more debug info (e.g. stracing, capturing the protocol's data streams for later analysis).
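A minimal sketch of how the suggested flags could be captured for later inspection. The helper name rsync_debug, the RSYNC override variable, and the log path are assumptions for illustration; only --msgs2stderr and --debug=all5 come from the comment above.

```shell
# Hypothetical helper: run rsync with the suggested debug flags and keep
# stderr in a log file. RSYNC lets a different binary be substituted.
rsync_debug() {
    log=$1
    shift
    "${RSYNC:-rsync}" --msgs2stderr --debug=all5 "$@" 2>"$log"
    status=$?
    # Show the end of the log, where a failure is most likely reported.
    tail -n 20 "$log" >&2
    return "$status"
}
```

It might be called as: rsync_debug /tmp/rsync-debug.log -avz --inplace SRC DEST (SRC and DEST are placeholders).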
Comment 2 Harvey 2014-01-20 09:18:58 UTC
I added --msgs2stderr --debug=all5 to my command line and here is the result...

After a lot of chunk[xxx] entries this is the end of the output:

chunk[245] of size 115440 at 28282800 offset=28282800
[sender] perform_io(4, consume&input)
[sender] got msg=2, len=54
[sender] perform_io(54, consume&input)
chunk[246] of size 115440 at 28398240 offset=28398240
[sender] perform_io(4, consume&input)
[sender] got msg=2, len=54
[sender] perform_io(54, consume&input)
chunk[247] of size 115440 at 28513680 offset=28513680
[sender] _exit_cleanup(code=10, file=io.c, line=837): entered
rsync error: error in rsync protocol data stream (code 12) at io.c(837) [sender=3.1.0]
[sender] _exit_cleanup(code=10, file=io.c, line=837): about to call exit(12)

Does this help in any way?
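For a long debug log, a small filter can show how far the sender got before exiting. This is only a sketch: it assumes the "chunk[N] of size ... offset=..." line format shown above, and the function name last_chunk_offset is made up for illustration.

```shell
# Reads a debug log on stdin and prints the offset= value from the last
# "chunk[...]" line. The line format is assumed from the output above.
last_chunk_offset() {
    grep '^chunk\[' | tail -n 1 | sed 's/.*offset=\([0-9]*\).*/\1/'
}
```

For the excerpt above this would print 28513680, which suggests the sender stopped early in the 13G file rather than near the end.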
Comment 3 rudy.metzger 2014-03-14 12:58:28 UTC
I can confirm this bug. I am using

rsync -rltDvz --stats -e ssh "$BACULA_LOCAL/" "$RSYNCTO:$BACULA_REMOTE" \
      1>>$LOGFILE 2>>$ERRFILE

which results in this error

rsync: [sender] write error: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at io.c(837) [sender=3.1.0]

I am using Fedora 20 and the exact rsync version is:

Name        : rsync
Version     : 3.1.0
Release     : 2.fc20
Architecture: x86_64
Install Date: Thu 13 Mar 2014 14:03:52 CET
Group       : Applications/Internet
Size        : 736048
License     : GPLv3+
Signature   : RSA/SHA256, Thu 09 Jan 2014 17:50:06 CET, Key ID 2eb161fa246110c1
Source RPM  : rsync-3.1.0-2.fc20.src.rpm
Build Date  : Sun 20 Oct 2013 19:57:42 CEST
Build Host  : buildvm-14.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://rsync.samba.org/
Summary     : A program for synchronizing files over a network

Additional info: It does indeed seem to happen only with large files, and only when compression is enabled (-z). It does not happen with rsync 3.0.9. Could it have something to do with the inclusion of zlib in rsync 3.1.0?
Comment 4 Harvey 2014-03-14 17:33:15 UTC
I can confirm that the error only occurs when compression is enabled. Doing the same file transfer without -z on the command line runs without errors.
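The with/without -z comparison can be scripted roughly as follows. This is a sketch under assumptions: the function name compare_compression, the RSYNC override, and the three path arguments are placeholders, and separate destinations are used so the second run still has data to transfer.

```shell
# Sketch: run the same transfer once without and once with -z, then
# report both exit codes. All names and paths are illustrative.
compare_compression() {
    src=$1
    dest_plain=$2
    dest_compressed=$3

    "${RSYNC:-rsync}" -a --inplace "$src" "$dest_plain"
    plain=$?
    "${RSYNC:-rsync}" -az --inplace "$src" "$dest_compressed"
    compressed=$?

    echo "without -z: $plain, with -z: $compressed"
}
```

An exit code of 12 on the second run only would match the behaviour described in this report.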
Comment 5 Wayne Davison 2014-03-14 23:25:27 UTC
Please check if the version you're testing links against zlib or includes the zlib that ships with rsync.  I'd imagine that it is trying to use the system's zlib, and the code that tries to use that in a compatible manner may not be working quite right.  I'm betting that the included zlib would work where the external zlib fails.
Comment 6 Wayne Davison 2014-03-14 23:26:51 UTC
*** Bug 10497 has been marked as a duplicate of this bug. ***
Comment 7 rudy.metzger 2014-03-15 08:22:31 UTC
(In reply to comment #5)
> Please check if the version you're testing links against zlib or includes the
> zlib that ships with rsync.  I'd imagine that it is trying to use the system's
> zlib, and the code that tries to use that in a compatible manner may not be
> working quite right.  I'm betting that the included zlib would work where the
> external zlib fails.

I do not know how I can test this, but the output of ldd shows this:

[root@myhost ~]# ldd /bin/rsync
	linux-vdso.so.1 =>  (0x00007ffffa9fe000)
	libacl.so.1 => /lib64/libacl.so.1 (0x0000003f50000000)
	libz.so.1 => /lib64/libz.so.1 (0x0000003f36600000)
	libpopt.so.0 => /lib64/libpopt.so.0 (0x0000003f4a000000)
	libc.so.6 => /lib64/libc.so.6 (0x0000003f35600000)
	libattr.so.1 => /lib64/libattr.so.1 (0x0000003f4a400000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003f35200000)
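The ldd output above already answers the question: a libz.so.1 entry means the binary is dynamically linked against the system zlib. A small sketch of that check, with links_system_zlib as a made-up helper name:

```shell
# Reads `ldd` output on stdin; succeeds if a shared libz is listed.
# A missing entry suggests the bundled zlib, though a statically linked
# system zlib would look the same, so treat the result as a hint.
links_system_zlib() {
    grep -q 'libz\.so'
}
```

It could be used as: ldd /bin/rsync | links_system_zlib && echo "uses system zlib"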

As said, I am using the version provided by Fedora 20:
rsync-3.1.0-2.fc20.x86_64

The package can be found here:
http://mirror.i3d.net/pub/fedora/linux/updates/20/x86_64/rsync-3.1.0-2.fc20.x86_64.rpm

The source RPM is at:
http://mirror.1000mbps.com/fedora/linux/updates/20/SRPMS/rsync-3.1.0-2.fc20.src.rpm
Comment 8 Harvey 2014-03-15 12:03:16 UTC
I don't know how to test that either. I am on Arch Linux; the PKGBUILD is here:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/rsync

The configure statement is as follows:

./configure --prefix=/usr \
		--with-included-popt=no \
		--with-included-zlib=no \
		--disable-debug

See the zlib part?
Comment 9 Harvey 2014-03-15 12:05:23 UTC
Additional comment: Arch Linux uses zlib 1.2.8
https://www.archlinux.org/packages/core/x86_64/zlib/
Comment 10 Orion Poplawski 2014-03-18 15:01:33 UTC
I get the same failure using the provided zlib in 3.1.1pre1.
Comment 11 Orion Poplawski 2014-03-18 15:51:41 UTC
I take that back, this is caused by using the system zlib in Fedora 20.  Still poking around.
Comment 12 Orion Poplawski 2014-03-18 16:15:18 UTC
So it appears that Fedora did not switch to using the system zlib fully until 3.1.0 in Fedora 20.  Prior to that it had a kind of hybrid situation: it compiled using the system zlib.h header (without the Z_INSERT_ONLY define), but still compiled in the bundled zlib/*.c code.  With the release of 3.1.0 and its --with-included-zlib option, the Fedora build now uses the system zlib throughout, but apparently that breaks things, so it looks like that support does not work yet.
Comment 13 Michal Luscon (mail bounces back) 2014-04-15 15:11:51 UTC
I can confirm that this issue is caused by compiling rsync with the system-provided zlib in F20. The Arch Linux package doesn't list zlib in its dependencies, and therefore I suppose the package has been compiled with the included zlib, regardless of the --with-included-zlib=no option.
Comment 14 Wayne Davison 2014-04-19 19:27:42 UTC
I have just added a new-style compression to rsync that is compatible with an external zlib.  This is currently requested by repeating the --compress option (-zz), but will eventually become the default.  If a distribution decides to configure rsync using --with-included-zlib=no then they will only get support for the new-style compression.  In such an rsync, attempting to use -z will give a warning and run w/o compression (though a server-side rsync must reject old-style compression if it doesn't support it).

Hopefully Fedora will be comfortable going back to the included zlib in rsync for a while longer in order to give people time to get 3.1.1 more widely supported before disabling support for the old-style compression.

In either case at least rsync will not fail in strange ways.
Comment 15 Wayne Davison 2020-07-26 08:18:13 UTC
*** Bug 10332 has been marked as a duplicate of this bug. ***