14315 – rsync hangs when many errors

Summary: rsync hangs when many errors

Status:	RESOLVED FIXED

Alias:	None

Product:	rsync
Classification:	Unclassified
Component:	core
Version:	3.1.3
Hardware:	All All

Importance:	P5 normal
Target Milestone:	---
Assignee:	Wayne Davison
QA Contact:	Rsync QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2020-03-05 22:31 UTC by Mark Vitale
Modified:	2020-06-08 15:34 UTC
CC List:	0 users

See Also:

Attachments
test program to aid in reproducing the issue (659 bytes, text/x-csrc) 2020-03-05 22:31 UTC, Mark Vitale

Description Mark Vitale 2020-03-05 22:31:33 UTC

Created attachment 15843
test program to aid in reproducing the issue

When performing a local rsync of a large directory (over 10,000 files), rsync hangs if a large number of errors occur on the target (destination) directory.

I am a support engineer for OpenAFS (openafs.org), and this issue was originally reported by a customer as a possible OpenAFS problem. The customer observed a hang when rsyncing a large directory into AFS. I was able to reproduce the problem and show that the hang is triggered when the chown() calls that rsync issues to restore the group of the destination files fail, because of an AFS security feature that prohibits the owner of a file from changing its group ownership. The resulting flood of errors caused all three rsync processes (sender, generator, and receiver) to stall.

With the help of a colleague, we devised a way to reproduce this hang without an AFS filesystem. To recreate it, we need a large number of errors while rsyncing from a normal ext4 filesystem. Our procedure simulates these errors with a small Linux seccomp program that blocks chown() and the related syscalls that chgrp also relies on.
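The attached seccomp-chown.c is not reproduced here. To give a rough idea of how such an error generator can work, here is a minimal sketch (not the attachment) that uses the raw prctl(2)/BPF seccomp interface instead of libseccomp, so it needs only kernel headers. It makes chown(), lchown(), fchown() and fchownat() (plus the *32 variants on 32-bit ABIs) fail with EPERM and allows every other syscall. Seccomp filters are inherited across fork() and execve(), which is why rsync and all of its child processes see the failures when started from the filtered shell. The names install_chown_blocker() and run_shell_without_chown() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* If the loaded syscall number equals `nr`, fail the call with EPERM;
   otherwise skip the return and test the next syscall number. */
#define DENY_WITH_EPERM(nr)                                   \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1),          \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA))

/* Hypothetical helper: install a filter that makes the chown family fail
   with EPERM. Returns 0 on success, -1 (errno set) on failure. */
int install_chown_blocker(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number into the accumulator. Simplification:
           a hardened filter would first check seccomp_data.arch. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#ifdef __NR_chown
        DENY_WITH_EPERM(__NR_chown),
#endif
#ifdef __NR_lchown
        DENY_WITH_EPERM(__NR_lchown),
#endif
#ifdef __NR_fchown
        DENY_WITH_EPERM(__NR_fchown),
#endif
#ifdef __NR_fchownat
        DENY_WITH_EPERM(__NR_fchownat),
#endif
#ifdef __NR_chown32   /* 32-bit ABIs such as i386 */
        DENY_WITH_EPERM(__NR_chown32),
#endif
#ifdef __NR_lchown32
        DENY_WITH_EPERM(__NR_lchown32),
#endif
#ifdef __NR_fchown32
        DENY_WITH_EPERM(__NR_fchown32),
#endif
        /* Every other syscall is allowed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Required before an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}

/* Install the filter, then replace this process with the user's shell.
   The filter survives execve() and is inherited by every child. */
int run_shell_without_chown(void)
{
    if (install_chown_blocker() != 0) {
        perror("installing seccomp filter");
        return 1;
    }
    const char *shell = getenv("SHELL");
    if (shell == NULL || *shell == '\0')
        shell = "/bin/sh";
    printf("Starting %s with chown-family syscalls blocked.\n", shell);
    fflush(stdout);            /* stdio buffers are discarded by execl() */
    execl(shell, shell, (char *)NULL);
    perror("execl");           /* only reached if execl() failed */
    return 1;
}
```

A main() that just returns run_shell_without_chown() would behave much like the sec-kill-chown shell in step 5 below.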

1. Log in to a Linux account that belongs to at least 2 groups:
$ id
uid=1000(mvitale) gid=1000(mvitale) groups=1000(mvitale),10(wheel)

2. Build a program to simulate chown/chgrp errors:
$ sudo yum install libseccomp libseccomp-devel
$ cc seccomp-chown.c -o sec-kill-chown -lseccomp

The source code for seccomp-chown.c is attached to this ticket.

3. Create a large source directory with over 10,000 files, for example:
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 

These files will all have the group ownership of the user's current group.
Any sufficiently large directory should work; it doesn't have to be a git repo.

4. Switch to the alternate group (this starts a new shell):
$ newgrp wheel
$ id
uid=1000(mvitale) gid=10(wheel) groups=10(wheel),1000(mvitale)

5. Enable the error generator (this also starts a new shell):
$ ./sec-kill-chown
Running shell. chown() and friends are now unavailable.

6. Create a target directory and run rsync to reproduce the hang:
$ mkdir target
$ cd target
$ rsync -av --delete --log-file=/tmp/rlog.$$ /home/mvitale/linux ./

This should hang after a few seconds.


7. When finished, exit the two shells (seccomp and newgrp):
$ exit
$ exit
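Steps 3 and 4 matter because files created in the target normally get the new primary group (wheel), while the source files keep the original group. With -a, which implies -g (preserve group), rsync therefore has to change the group of every destination file, and under the seccomp filter each of those changes fails, so N files yield N errors. (An unprivileged owner may only change a file's group to a group it belongs to, which is presumably why step 1 requires membership in two groups.) The sketch below is a simplified, hypothetical model of that per-file work, not rsync's code; restore_group() and restore_groups() are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not rsync code): make one destination file's group
   match the source's group. Returns 0 if the group already matches or was
   changed, -1 on failure (after printing one error line). */
int restore_group(const char *path, gid_t want_gid)
{
    struct stat st;
    if (lstat(path, &st) != 0) {
        fprintf(stderr, "lstat %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (st.st_gid == want_gid)
        return 0;                        /* nothing to do, no syscall issued */
    /* (uid_t)-1 leaves the owner unchanged; only the group is modified. */
    if (lchown(path, (uid_t)-1, want_gid) != 0) {
        fprintf(stderr, "set group on %s: %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}

/* Apply restore_group() to every path and count failures. When every
   lchown() is rejected, the number of errors equals the number of files. */
size_t restore_groups(const char *const *paths, size_t n, gid_t want_gid)
{
    assert(paths != NULL || n == 0);
    size_t failures = 0;
    for (size_t i = 0; i < n; i++)
        if (restore_group(paths[i], want_gid) != 0)
            failures++;
    return failures;
}
```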


Using git bisect, I isolated the commit that introduced this hang:

d8587b4 Change the msg pipe to use a real multiplexed IO mode for the data that goes from the receiver to the generator.	
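For readers unfamiliar with the technique: git bisect is essentially a binary search over history, so each build-and-test step roughly halves the set of suspect commits, and about log2(N) tests suffice for N candidates. The sketch below models that on a linear list of commits; real git bisect also handles histories with merges, which this ignores. first_bad_commit() and bad_from() are hypothetical names, and the predicate stands in for "build rsync at this commit and run the reproducer".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Caller-supplied test: does commit `idx` exhibit the bug? */
typedef bool (*is_bad_fn)(size_t idx, void *ctx);

/* Simplified bisect over a linear history of n commits, indexed oldest (0)
   to newest (n - 1). Assumes commit 0 is good, commit n - 1 is bad, and
   that once the bug appears it stays. Returns the first bad index and
   stores the number of predicate calls in *tests (if non-NULL). */
size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx, size_t *tests)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */
    size_t calls = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        calls++;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    if (tests != NULL)
        *tests = calls;
    return bad;
}

/* Example predicate for a synthetic history: ctx points at the index of
   the (hypothetical) first bad commit. */
static bool bad_from(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```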

The following versions show the problem: master, 3.1.3, 3.1.2, and 3.1.0.
Releases 3.0.9 and older do not exhibit the problem.

Each of the following workarounds was successful for my customer and in my testing:
- use an older version of rsync (3.0.9 or older)
- specify the rsync option --msgs2stderr
- perform the rsync under a userid with the same group as the source files

Thanks for your consideration, and please let me know if there's anything else I can provide to help.

Regards,
--
Mark Vitale
mvitale@sinenomine.net

Comment 1 Mark Vitale 2020-03-05 22:37:55 UTC

Sorry, I gave the wrong commit in my report.  I bisected this hang to:

1a2704512a6f6c9bf267042ff8beb50a24e1d057 is the first bad commit
commit 1a2704512a6f6c9bf267042ff8beb50a24e1d057
Author: Wayne Davison <wayned@samba.org>
Date:   Wed Dec 21 08:30:07 2011 -0800

    Improve the handling of verbose/debug messages

Comment 2 Wayne Davison 2020-06-04 22:59:44 UTC

Should be fixed in the latest git version.

Comment 3 Mark Vitale 2020-06-08 15:34:39 UTC

Thank you very much!