Bug 3422 - dry run fails when encountering dangling symbolic link
Summary: dry run fails when encountering dangling symbolic link
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 2.6.4
Hardware: All Linux
: P3 normal (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Depends on:
Reported: 2006-01-18 03:30 UTC by Peter Englmaier
Modified: 2006-03-12 02:56 UTC (History)
0 users

See Also:

Makes receiving rsync process deletions only in directories (363 bytes, patch)
2006-01-18 20:38 UTC, Matt McCutchen
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Englmaier 2006-01-18 03:30:52 UTC

I use rsync for backup and once in a while somebody replaces a symbolic link
with the contents of the file. In this case, I find that dry-run will
report errors while rsync without -n flag works fine.
A test case can be generated like this:

mkdir old
cd old
mkdir data
ln -s data link
cd ..
mkdir new
mkdir new/link
rsync -n -arlpogt --delete new/ old/

The rsync output will be:
building file list ... done
deleting data/
deleting link

Now when we delete data before, rsync fails:
rmdir old/data
rsync -n -arlpogt --delete new/ old/
building file list ... done
deleting link
rsync: opendir "/data1/ppe/tmp/old/link" failed: No such file or directory (2)

Rsync fails with return code 23, however, if you remove the '-n' flag,
rsync is able to sync and does not report errors. I think this is an error,
because dry-run should just report that the symbolic link is updated, i.e.:

building file list ... done

and return no error code. For me this is also a problem, because I build
scripts around rsync and I use dry-run for testing what will be done.
The script makes than a backup of the files which would be deleted. But
when dry-run fails, the script does not know whether its save to continue.
Best, Peter.
Comment 1 Matt McCutchen 2006-01-18 19:47:25 UTC
I reproduced this bug with my custom rsync 2.6.6.matt.7.  According to verbose output, delete_in_dir("link") seems to be getting called erroneously on the receiver.  I'm trying to find the cause.
Comment 2 Matt McCutchen 2006-01-18 20:38:15 UTC
Created attachment 1702 [details]
Makes receiving rsync process deletions only in directories

I found the problem.  The receiving rsync performs deletions by scanning the file list from the sender for directories.  For each directory the sender sent, the receiver looks for something at the same path in the destination.  If the receiver finds something, it assumes that something is a corresponding directory and processes deletions inside it; if the receiver finds nothing, it just moves on.

That assumption is OK most of the time.  If one runs Peter's second test case without --dry-run, rsync deletes the symlink "link" on the grounds that it is about to be replaced by a directory; when it goes to process deletions in the directory "link" that the sender named, it finds nothing there on the destination.  But with --dry-run, "link" gets fake-deleted from the receiver.  delete_in_dir("link") is called, sees something by the name of "link", incorrectly assumes it is a directory, and tries to process deletions.

That no error is produced when the symlink is valid is immaterial.  When rsync is completely in link-aware mode (--links and not --keep-dirlinks), it should treat symlinks simply as files of a special kind that contain strings; it should never issue a system call that follows a symlink.

I wrote a very simple patch that makes delete_in_dir try to delete only in directories.  That fixes the problem, but I don't know if it's the right way to fix the problem.
Comment 3 Peter Englmaier 2006-01-19 00:46:21 UTC
(In reply to comment #2)

This sounds like a good solution. Thanks, I didn't expect a response so
quickly! Shall I test this out with the latest rsync? 
You already did some testing, but perhaps my test might catch some side-effects.
I have about 3 Terrabytes of data to 'rsync' over night. For my test, I would take the latest official source rpm and upgrade it to 2.6.6 plus your patch.
Or, is there a test suite for rsync?
Comment 4 Matt McCutchen 2006-01-19 06:57:31 UTC
Yes, please do take a recent source RPM or source tarball and try the patch.  If you use rpmbuild, you can get it to apply the patch automatically after unpacking the source tarball that comes with the SRPM.  Put the patch in SOURCES/ of your RPM-building arena, and name it in rsync.spec like this:
    Patch0: delete-only-in-directories.diff
Then insert after the %setup line:
    %patch0 -p0
There's a testsuite that you can run with "make test" in a built rsync source tree, either one you make manually or the one rpmbuild makes in BUILD/.  That will check that the patch doesn't break other features.  But this might be putting the carriage before the horse, as I'm still hoping Wayne will weigh in as to whether my patch is the right way to fix the problem.

I think a "dry-run" test case should be added that tests the dry run feature comprehensively on -vv verbosity and makes sure it produces the same output as a real run in a number of borderline cases, such as the one in this bug.
Comment 5 Peter Englmaier 2006-01-19 07:46:42 UTC
(In reply to comment #4)

OK, I will do that (have done rpm's before), but it will take some time, as I'm rather busy at the moment (~2 weeks). This way, Wayne also has some more time to respond. This bug certainly is not a critical one, but should be fixed. 
Best, Peter.

Comment 6 Wayne Davison 2006-01-19 12:03:04 UTC
As Matt discovered, the bug is simply that when rsync is in a dry-run scenario, it may not have deleted an in-the-way file/symlink/device by the time the delete code decides to check if a sender-side directory exists on the receiving side.

Thanks for the patch!  It's the right fix, and I've checked it into CVS for the upcoming 2.6.7.
Comment 7 Peter Englmaier 2006-01-19 12:43:37 UTC
(In reply to comment #6)
Great, so I do not need to do the testing. Thanks to all
for the effort.