Bug 3825 - rsync won't delete directory with excluded files
Summary: rsync won't delete directory with excluded files
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core (show other bugs)
Version: 2.6.9
Hardware: x86 Linux
: P3 normal (vote)
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-06-06 15:23 UTC by John Van Essen
Modified: 2006-11-11 11:30 UTC (History)
0 users

See Also:


Attachments
Proposed changes to deletion, filters, etc. (25.60 KB, patch)
2006-06-11 15:00 UTC, Matt McCutchen
no flags Details
Improved patch that itemizes _changes_ and updates acls.diff (42.69 KB, patch)
2006-06-22 10:44 UTC, Matt McCutchen
no flags Details
Proposed changes to deletion, filters, etc.; revised to apply cleanly (38.85 KB, patch)
2006-07-22 16:00 UTC, Matt McCutchen
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description John Van Essen 2006-06-06 15:23:31 UTC
A full rsync was initially done.  Later, some --exclude rules for temporary and scratch files were added to the backup script.

If a directory contains a file that is now being excluded and the source directory is deleted, the target directory will be cleaned up but the excluded file will be left behind and the directory won't be deleted.

Furthermore, -n/--dry-run shows that rsync *wants* to delete the directory.

This behavior exists in 2.6.8 and in 2.6.3 (didn't test any others).

The desired behavior is that a directory deleted on the source should be deleted on the target even if it has excluded files in it (i.e. imply --delete-excluded just for the deleted hierarchy).

(I can't use the --delete-excluded option in the script because this backup is done is several pieces, with the top-level backup having excludes for the hierarchies done separately.)

Demo script:

  rm    -rf /tmp/rsync_src /tmp/rsync_dst
  mkdir     /tmp/rsync_src /tmp/rsync_dst
  mkdir     /tmp/rsync_src/dir
  echo txt >/tmp/rsync_src/dir/file.txt
  echo tmp >/tmp/rsync_src/dir/file.tmp
  echo
  echo '$' \
  rsync -va /tmp/rsync_src/ /tmp/rsync_dst/
  rsync -va /tmp/rsync_src/ /tmp/rsync_dst/
  rm    -rf /tmp/rsync_src/dir
  echo
  echo '$' \
  rsync --exclude=file.tmp --delete -van /tmp/rsync_src/ /tmp/rsync_dst/
  rsync --exclude=file.tmp --delete -van /tmp/rsync_src/ /tmp/rsync_dst/
  echo
  echo '$' \
  rsync --exclude=file.tmp --delete -va  /tmp/rsync_src/ /tmp/rsync_dst/
  rsync --exclude=file.tmp --delete -va  /tmp/rsync_src/ /tmp/rsync_dst/

Demo script output:

$ rsync -va /tmp/rsync_src/ /tmp/rsync_dst/
building file list ... done
dir/
dir/file.tmp
dir/file.txt

sent 214 bytes  received 70 bytes  568.00 bytes/sec
total size is 8  speedup is 0.03

$ rsync --exclude=file.tmp --delete -van /tmp/rsync_src/ /tmp/rsync_dst/
building file list ... done
deleting dir/file.txt
deleting dir/                  <--------------------------<<<<<<

sent 70 bytes  received 20 bytes  180.00 bytes/sec
total size is 0  speedup is 0.00

$ rsync --exclude=file.tmp --delete -va /tmp/rsync_src/ /tmp/rsync_dst/
building file list ... done
deleting dir/file.txt

sent 70 bytes  received 20 bytes  180.00 bytes/sec
total size is 0  speedup is 0.00
Comment 1 Matt McCutchen 2006-06-06 16:02:23 UTC
The --exclude filter is equivalent to a sender hide filter plus a receiver protect filter, and the receiving rsync is correct to block deletion of a directory because the directory contains protected files.

Evidently you want some of your exclude patterns to behave as a hide only (temporary files) and others to behave as a hide plus a protect (other sections of the backup).  Either use --filter="H file.tmp" instead of --exclude=file.tmp, or use --delete-excluded and use --filter="P /other-backup-area" in addition to --exclude=/other-backup-area .

Now the fact that --dry-run itemize output differs from actual itemize output _is_ a bug, in my opinion.  I think it would be most useful if, in either mode, rsync itemized a directory that it would delete if it weren't for protected files as follows:

*pinned    dir/
Comment 2 John Van Essen 2006-06-10 02:55:51 UTC
My problem is the target content is unexpectedly different from the source and rsync doesn't warn about the difference.  If rsync would emit a warning about not being able to delete a directory due to its hierarchy containing an excluded or protected file, that would work for me.  Thus notified, I would then clean up the target side that wasn't supposed to have had the excluded file in the first place.

And if the preservation on the target of the containing directory that was deleted on the source is, indeed, desired, the warning can be silenced by adding an exclude for the containing directory.
Comment 3 Matt McCutchen 2006-06-10 12:09:12 UTC
I think using hide filters (like --filter="H *.tmp") for temporary and scratch files will accomplish what you want.  If some such files accidentally get copied to the receiver and then you add a hide filter matching them, rsync will delete the files from the receiver the next time it runs.  This is probably easier than having rsync warn you so that you can clean the files up by hand.

Would itemizing pinned directories as "*pinned" provide enough of a warning for the cases in which you don't want the excluded file inside deleted automatically?
Comment 4 Wayne Davison 2006-06-11 10:48:20 UTC
Something like this has come up before in the context of excluding an .svn directory that shouldn't be transferred or or removed, unless the parent directory also went away.  I'm thinking that a filter option for exclude could indicate that the rule should only apply inside a transfer directory.  Perhaps this syntax:

--filter="-t *.tmp"

As for the lack of a complaint when rsync cannot delete a directory because it is not empty, that is the current design, but we should make a way for rsync to generate a warning.
Comment 5 Matt McCutchen 2006-06-11 11:59:39 UTC
I am working on a patch to fix a bunch of infelicities related to deletion, including all the ones discussed here.  I passed a variable through get_dirlist all the way to make_file so that the latter can report a protected file and rsync won't even try to rmdir the pinned parent directory; make_file also produces double-verbose messages about pinned directories.  At the same time it was convenient to add support for --filter="-p .svn" (p for parent-protect), but obviously I will use "t" instead of "p" if Wayne prefers.

Once I have made some more progress, I will post my patch here.  As always, there may be bugs and it might not be the right way to accomplish what we want, but it should be a good start.
Comment 6 Matt McCutchen 2006-06-11 15:00:29 UTC
Created attachment 1955 [details]
Proposed changes to deletion, filters, etc.

This patch adds:
- support for --filter="Pp .svn"
- *pinned itemize output and some double-verbose messages for pinned directories, with or without --dry-run; change get_dirlist, send_directory, send_file_name, make_file to signal a protected file
- *maxdel itemize output for files spared because of --max-delete
- support for --max-delete=0 to set a deletion limit of zero rather than disable the limit
- additional error message "could not make way for new <TYPE> at %s" when a file cannot be deleted to allow another to be created in its place
- delete_item doesn't even try to rmdir a directory containing protected files (because rsync might be used on an exotic filesystem in which rmdir can delete nonempty directories)
- "See if file is excluded before reporting an error": make sure the file would be excluded whether or not it is a directory.

Please test and comment.  Wayne, I've been rather bold with my changes, and you may want to throw out some of them, but let's talk about it.

This patch has some rather trivial conflicts with the ACL patch.  I will write up corresponding man page changes next.  Maybe I will also write a test case.
Comment 7 Matt McCutchen 2006-06-18 19:12:56 UTC
Is anyone going to look at my proposed fix?  John?  Wayne?
Comment 8 Matt McCutchen 2006-06-22 10:44:28 UTC
Created attachment 1977 [details]
Improved patch that itemizes _changes_ and updates acls.diff

I decided --itemize-changes was called that for a reason, so I took out the "*pinned" and "*maxdel" itemize outputs in favor of double-verbose messages similar to other messages that say why something wasn't changed (%s is uptodate, %s is newer, etc.).  This patch also updates acls.diff to apply correctly.
Comment 9 Matt McCutchen 2006-07-22 16:00:42 UTC
Created attachment 2052 [details]
Proposed changes to deletion, filters, etc.; revised to apply cleanly

Would someone please consider my patch?  I have updated it to apply cleanly to CVS rsync as of today, July 22.  As before, it updates acls.diff and also fixes some line numbers in acls.diff that were unrelated to my deletion improvements.
Comment 10 John Van Essen 2006-08-25 16:11:08 UTC
Matt - I finally got around to examining/testing your patch (during summers in Minnesota I have other, higher priority things to do :D).  It's a very large patch which goes well beyond the scope of this bug report.  I like your efforts to report the precise nature of a failure to perform a specific task, but I can't speak to anything in particular outside my specific issue.

The output of my demo script using a patched rsync with -v is the same as before except that --dry-run no longer reports that it will (attempt to) delete dir/.

With -vv, the output on both normal and --dry-run is now:

building file list ... 
done
deleting in .
[generator] protecting file dir/file.tmp because of pattern file.tmp
protected file dir/file.tmp pins parent directory
deleting dir/file.txt
dir is pinned

The "protected file ..." and "dir is pinned" messages are new compared with unpatched rsync output.  The "protected file ..." message is good.  In the other message, rather than used the term "pinned", I would suggest something like:

{dirname}/ cannot be deleted because it contains a protected/excluded file

I would also argue that the message should be emitted regardless of -v options since it indicates that rsync is unable to perform an expected operation.

Now we wait for Wayne to chime in with his opinion...
Comment 11 Matt McCutchen 2006-08-25 17:25:59 UTC
(In reply to comment #10)

Thanks for testing the patch!

> I would also argue that the message should be emitted regardless of -v options
> since it indicates that rsync is unable to perform an expected operation.

I disagree.  Rsync is not /unable/ to delete the directory; it merely chooses not to delete the directory because you gave it a protect filter (or an exclude filter that implies a protect filter).  It's not an error when one requested behavior (the protect filter) takes priority over another (--delete).

You have to decide what you want.  If you want rsync to delete the directory, use a hide filter instead of an exclude filter, or give the parent-protect flag.  If you want rsync not to delete the directory, continue to use an exclude filter; if you want it to remind you why it isn't deleting the directory, use -vv.

> {dirname}/ cannot be deleted because it contains a protected/excluded file

Sure, that is clearer to users, but I would change "cannot be deleted" to "was not deleted".

> Now we wait for Wayne to chime in with his opinion...

And we wait... :)
Comment 12 Wayne Davison 2006-10-15 11:33:06 UTC
I definitely like the idea of adding a way to indicate that some exclude rules shouldn't prevent the removal of a directory, and will work on including something similar to this in the next release.  Thanks for the patch!
Comment 13 Wayne Davison 2006-11-11 11:30:15 UTC
It took me a while to work through all the changes you made, but I have finally got most of what you proposed added to the CVS version, with some tweaking of implementation and a few less verbose messages for now.  I'm still considering adding in some of the ones I left out.

So, the CVS version now has the ability to mark an include/exclude as "perishable", which means that it will be ignored in a directory that is being deleted.

The code also does a proper job of making the output of --dry-run match what will be output when the option is turned off, barring unforseen permission errors.

Thanks, Matt!