Bug 9041 - Feature request: Better handling of btrfs based sources
Status: RESOLVED DUPLICATE of bug 10170
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.0
Hardware: All Linux
Importance: P5 enhancement
Target Milestone: ---
Assigned To: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2012-07-13 21:58 UTC by Kevin Korb
Modified: 2013-09-27 00:31 UTC (History)

Description Kevin Korb 2012-07-13 21:58:44 UTC
Rsync is currently inefficient when the source is a btrfs filesystem containing subvolumes and snapshots.

Btrfs has subvolumes and snapshots of those subvolumes which appear like simple subdirectories (they don't show up as mount points).  One potential use case for this is to have production and development versions of a tree stored in the same filesystem using CoW to save space.

This introduces the possibility of the same file (with or without modification) appearing in multiple places within the filesystem under the same inode number, without the link count increasing as it would for a hard link.
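To make that situation concrete, here is a minimal Python sketch that walks a tree and reports inode numbers that appear at more than one path even though every path has a link count of 1. The function name shared_inodes is hypothetical and not part of rsync. It compares inode numbers only, as this report does, so a match is a hint that two files may share data, not proof.

```python
import os
import stat
from collections import defaultdict


def shared_inodes(root):
    """Return {inode: [paths]} for regular files whose inode number shows up
    at more than one path while each path reports a link count of 1.

    Hypothetical helper: it keys on the inode number alone, as the report
    proposes. Real hard links (link count > 1) are skipped on purpose,
    because --hard-links already covers them.
    """
    by_inode = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
                by_inode[st.st_ino].append(path)
    return {ino: sorted(paths) for ino, paths in by_inode.items() if len(paths) > 1}


if __name__ == "__main__":
    import sys

    for ino, paths in sorted(shared_inodes(sys.argv[1]).items()):
        print(ino, *paths)
```

On an ordinary filesystem without snapshots this should report nothing, since two paths only share an inode number there when they are hard links.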

Here is a simple example using 2 small btrfs filesystems on a Gentoo Linux system with current btrfs tools...

# I start with 2 empty filesystems...
> df -hT /test/*/
Filesystem                Type   Size  Used Avail Use% Mounted on
/dev/mapper/vg-test_btrfs btrfs  1.0G   56K  382M   1% /test/btrfs
/dev/mapper/vg-test_rsync btrfs  1.0G   56K  382M   1% /test/rsync

# I create a subvolume within the btrfs filesystem
> btrfs sub create btrfs/current
Create subvolume 'btrfs/current'

# I copy 2 mp3 files into the subvolume
> cp -v *.mp3 btrfs/current/
`ChangeMe.mp3' -> `btrfs/current/ChangeMe.mp3'
`NoTouching.mp3' -> `btrfs/current/NoTouching.mp3'

# Now I snapshot that subvolume
> btrfs sub snapshot btrfs/current btrfs/old
Create a snapshot of 'btrfs/current' in 'btrfs/old'

# As you can see there are now two subvolumes that contain the exact same 2
# files with the same inode numbers:
> ls -li btrfs/*/*.mp3
257 -rw-r----- 1 root root 62060544 Jul 13 16:55 btrfs/current/ChangeMe.mp3
258 -rw-r----- 1 root root 46897152 Jul 13 16:59 btrfs/current/NoTouching.mp3
257 -rw-r----- 1 root root 62060544 Jul 13 16:55 btrfs/old/ChangeMe.mp3
258 -rw-r----- 1 root root 46897152 Jul 13 16:59 btrfs/old/NoTouching.mp3

# Now I change one of the files in one of the subvolumes
> id3v2 -D btrfs/current/ChangeMe.mp3 
Stripping id3 tag in "btrfs/current/ChangeMe.mp3"...id3v1 and v2 stripped.

# Now the ChangeMe.mp3 file in "current" has an updated mtime and a different
# file size, yet both copies still have the same inode number.
> ls -li btrfs/*/*.mp3
257 -rw-r----- 1 root root 62060157 Jul 13 17:01 btrfs/current/ChangeMe.mp3
258 -rw-r----- 1 root root 46897152 Jul 13 16:59 btrfs/current/NoTouching.mp3
257 -rw-r----- 1 root root 62060544 Jul 13 16:55 btrfs/old/ChangeMe.mp3
258 -rw-r----- 1 root root 46897152 Jul 13 16:59 btrfs/old/NoTouching.mp3

# Now I rsync the whole thing...
> rsync -vaihhH --stats btrfs/ rsync/
sending incremental file list
.d..tp..... ./
cd+++++++++ current/
>f+++++++++ current/ChangeMe.mp3
>f+++++++++ current/NoTouching.mp3
cd+++++++++ old/
>f+++++++++ old/ChangeMe.mp3
>f+++++++++ old/NoTouching.mp3

Number of files: 7
Number of files transferred: 4
Total file size: 207.82M bytes
Total transferred file size: 207.82M bytes
Literal data: 207.82M bytes
Matched data: 0 bytes
File list size: 161
File list generation time: 0.004 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 207.85M
Total bytes received: 99

sent 207.85M bytes  received 99 bytes  59.38M bytes/sec
total size is 207.82M  speedup is 1.00

# As you can see, rsync sees 4 completely different files.  It has no idea that
# the two copies of NoTouching.mp3 are the same file, even though they have the
# same inode number.  The use of -H doesn't matter because they aren't hard
# links, so the link count is only 1.
> ls -li rsync/*/*.mp3
259 -rw-r----- 1 root root 62060157 Jul 13 17:13 rsync/current/ChangeMe.mp3
260 -rw-r----- 1 root root 46897152 Jul 13 17:12 rsync/current/NoTouching.mp3
261 -rw-r----- 1 root root 62060544 Jul 13 17:12 rsync/old/ChangeMe.mp3
262 -rw-r----- 1 root root 46897152 Jul 13 17:12 rsync/old/NoTouching.mp3

# Disk space usage is increased accordingly:
> df -hT /test/*/
Filesystem                Type   Size  Used Avail Use% Mounted on
/dev/mapper/vg-test_btrfs btrfs  1.0G  164M  219M  43% /test/btrfs
/dev/mapper/vg-test_rsync btrfs  1.0G  209M  174M  55% /test/rsync

Note that I am not asking for rsync to duplicate the subvolume or snapshot functionality.  I only want it to recognize that the same file exists in multiple locations, somewhat like a hard link but without the increased link count.

It seems to me the quickest way to accomplish this would be to add an option that works much like --hard-links, except that it remembers every file-to-inode-number pairing instead of only those with a link count greater than 1.  Then, when it finds a file whose inode number it has already seen, instead of writing out a new file it would use the new clone ioctl (as cp --reflink does) to make a duplicate without consuming any additional disk space.  After that, it would do the standard mtime check to see whether a delta transfer is needed to update the cloned file.
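The proposal can be sketched roughly as follows in Python. This is only an illustration of the idea, not rsync code: plan_transfers and clone_file are hypothetical names, the FICLONE request number is the value from linux/fs.h on common Linux architectures, and the fallback to a plain copy is an assumption added so the sketch also runs on filesystems without reflink support.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int). Assumed value for
# common Linux architectures such as x86 and ARM.
FICLONE = 0x40049409


def plan_transfers(files):
    """files: iterable of (path, inode, mtime, size) in file-list order.

    Returns a list of (action, path, earlier_path):
      "send"         first time this inode number is seen
      "clone"        inode seen before, mtime and size unchanged
      "clone+delta"  inode seen before, but mtime or size differs
    """
    seen = {}  # inode number -> (path, mtime, size) of first occurrence
    plan = []
    for path, ino, mtime, size in files:
        first = seen.get(ino)
        if first is None:
            seen[ino] = (path, mtime, size)
            plan.append(("send", path, None))
        elif (first[1], first[2]) == (mtime, size):
            plan.append(("clone", path, first[0]))
        else:
            plan.append(("clone+delta", path, first[0]))
    return plan


def clone_file(src, dst):
    """Create dst as a reflink clone of src on the destination filesystem.

    Falls back to an ordinary copy when the clone ioctl is refused
    (assumption: any OSError means "not supported here").
    """
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return "cloned"
        except OSError:
            pass
    shutil.copyfile(src, dst)
    return "copied"
```

Fed the four files from the example above, plan_transfers would send both files under current/, clone old/NoTouching.mp3 outright, and clone old/ChangeMe.mp3 before running a delta transfer on it. The clone would happen on the receiving side, between two destination paths.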
Comment 1 Kevin Korb 2013-09-27 00:31:00 UTC
I ended up switching to ZFS for what I was doing, so this is no longer important to me.

Bug https://bugzilla.samba.org/show_bug.cgi?id=10170 has more pertinent details and a stronger use case, so I am closing this request in favour of it.

*** This bug has been marked as a duplicate of bug 10170 ***