Bug 10338 - Start deletion from the top of the hierarchy
Summary: Start deletion from the top of the hierarchy
Status: RESOLVED WONTFIX
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: All All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-19 14:29 UTC by Ben RUBSON
Modified: 2016-12-29 18:54 UTC
CC List: 0 users

See Also:


Description Ben RUBSON 2013-12-19 14:29:32 UTC
I make my production backups with Rsync.

Here is an example of my backup tree on the destination server:
 /backups
   /2013-04-03
   /2013-04-02
   /2013-04-01
   /2013-03-31
   /2013-03-30
   /2013-03-29

At the end of the backup process, I upload a logfile to the backup directory and delete the oldest backups.

To do this, I use an include/exclude file, for example this inclexcl.txt:
 + /2013-04-03
 + /2013-04-03/logfile.log
 - /2013-04-03/*
 - /2013-04-02
 - /2013-04-01

I also use this source directory, which is empty apart from my logfile:
 /tmp
   /path
     /2013-04-03
       /logfile.log

Then I run this rsync command:
rsync -a --delete-after --exclude-from=inclexcl.txt /tmp/path/ server::backups/

This works perfectly: my logfile is uploaded and the oldest backups are deleted (in this example, all the March backups).
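
The filter file above follows a regular pattern, so it could be generated rather than written by hand. The following is only a minimal sketch: the function name make_filter and its argument order are hypothetical and not part of rsync. The first argument is the backup that receives the logfile, and the remaining arguments are the older backups to protect from deletion.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): print filter rules like those in
# inclexcl.txt. $1 is the current backup; the other arguments are backups to keep.
make_filter() {
    current=$1
    shift
    # Transfer only the logfile into the current backup directory.
    printf '%s\n' "+ /$current"
    printf '%s\n' "+ /$current/logfile.log"
    printf '%s\n' "- /$current/*"
    # Excluded directories are protected from deletion on the receiver.
    for keep in "$@"; do
        printf '%s\n' "- /$keep"
    done
}

make_filter 2013-04-03 2013-04-02 2013-04-01
```

Redirecting the output to inclexcl.txt gives the file used by the rsync command above. Any backup directory not listed is left unprotected and is therefore deleted.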


However, the daemon's log shows that rsync walks through every file in the backup directories to be deleted, from the top down, and deletes them one by one.
Rsync's behavior is to build a list of the files and check whether any are "protected" from deletion before it removes each file.
As each of my backup directories contains hundreds of thousands of files, deletion takes a very long time.

I think it would be useful to be able to skip this check and ask rsync to start deletion from the top of the hierarchy.
Perhaps by adding a new option (--delete-from-top)?
It would certainly speed up deletion of huge directories, for instance typical daily backup directories (made for example with --link-dest), by avoiding rsync's overhead.

Thank you very much for this improvement!

Best regards,

Ben
Comment 1 Paul Slootman 2013-12-19 15:33:15 UTC
The files have to be deleted one by one anyway, so I'm not sure how much this could be improved.
Have you compared how long a simple rm -r $TOPDIR takes, compared to rsync? Make sure to flush any disk buffers / cache before running your tests (echo 3 > /proc/sys/vm/drop_caches, if you're running Linux).
Comment 2 Ben RUBSON 2013-12-19 15:50:22 UTC
Paul, yes, you're right, files have to be deleted one by one, but perhaps the rsync overhead could be skipped.

I ran some tests: I created a directory with 300k files in it.
I deleted it with rsync, and with the rm command.
I repeated this test several times.
On my server, it took about 20 seconds with rm and about 40 seconds with rsync.

Of course this is just a test example. Each of the top-level directories of my production backups is much bigger, so the impact is greater.
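
A comparison along these lines could be scripted as follows. This is a minimal sketch under stated assumptions, not the exact test that was run: the file count N, the temporary directory layout, and the use of an empty source directory with --delete are all choices made for illustration. It does not flush caches, since writing to /proc/sys/vm/drop_caches needs root.

```shell
#!/bin/sh
# Hypothetical benchmark: N and the directory names are assumptions.
N=${N:-1000}

make_tree() {
    # Create directory $1 containing N empty files.
    mkdir -p "$1"
    i=0
    while [ "$i" -lt "$N" ]; do
        : > "$1/file$i"
        i=$((i + 1))
    done
}

work=$(mktemp -d)
mkdir "$work/empty"

# Time a plain recursive removal.
make_tree "$work/rm_target"
start=$(date +%s)
rm -r "$work/rm_target"
echo "rm -r: $(( $(date +%s) - start ))s"

# Time rsync emptying an identical tree, if rsync is installed.
if command -v rsync >/dev/null 2>&1; then
    make_tree "$work/rsync_target"
    start=$(date +%s)
    # Syncing from an empty source with --delete removes the target's contents.
    rsync -a --delete "$work/empty/" "$work/rsync_target/"
    echo "rsync --delete: $(( $(date +%s) - start ))s"
fi

rm -rf "$work"
```

Timings have one-second resolution, so N needs to be large (for example N=300000) before the difference is visible.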
Comment 3 Ben RUBSON 2015-05-24 18:50:24 UTC
Hello,

Any news regarding this enhancement request?

Thank you very much!

Ben
Comment 4 Ben RUBSON 2016-12-29 18:54:06 UTC
I'm using a workaround, so I'm closing this for the moment.
Thank you!