Bug 12732 - hard links can cause rsync to block or to silently skip files
Summary: hard links can cause rsync to block or to silently skip files
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.2
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2017-04-04 20:47 UTC by Hansjoerg Lipp
Modified: 2017-12-31 01:28 UTC

Description Hansjoerg Lipp 2017-04-04 20:47:22 UTC
Overview
========

Hard link handling seems to be broken when using "rsync -aH --compare-dest". I found two possible scenarios:

1) rsync completes without an error message and with exit code 0, although some files are missing from the backup
2) rsync blocks and must be interrupted or killed

I found this bug while tracking down random hangs of rsync. It turned out that most, but not all, of these hangs occur when hard links are present. I therefore hope that the second scenario gives some hints about a larger problem that this hard link bug may be triggering.
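
To judge whether a given source tree is likely to run into this, it can help to list the files that have more than one hard link. The following is only a sketch: list_hardlinked is a hypothetical helper name, not part of rsync, and it uses nothing beyond find, ls and sort.

```shell
# Hypothetical helper: list regular files below DIR whose link count is
# greater than one. Each line is prefixed with the inode number, so names
# that share an inode end up next to each other after sorting.
list_hardlinked() {
    find "$1" -type f -links +1 -exec ls -i {} + | sort -n
}
```

Called as "list_hardlinked srclt", it should print the four names a, b, c and d of the first test case below, grouped in pairs by inode.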


How to reproduce (1)
====================

[ Using Linux on e.g. ext4 ]
############################
mkdir srclt
cd srclt
echo x > a
ln a b
echo x > c
ln c d
cd ..

cp -aix srclt dstlt
rm dstlt/{b,c}

mkdir baklt

rsync -aHvv --compare-dest=$PWD/dstlt/. $PWD/srclt/. $PWD/baklt/. >> testlt.log 2>&1
############################


Actual Results (1)
==================

cat testlt.log
#####
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
./
a is uptodate
d is uptodate
b
a => b
total: matches=0  hash_hits=0  false_alarms=0 data=2

sent 173 bytes  received 160 bytes  666.00 bytes/sec
total size is 8  speedup is 0.02
#####
ls -il srclt dstlt baklt
#####
baklt:
total 8
3249818642 -rw-r--r-- 2 X X 2 2017-04-04 X a
3249818642 -rw-r--r-- 2 X X 2 2017-04-04 X b

dstlt:
total 8
2205741698 -rw-r--r-- 1 X X 2 2017-04-04 X a
2205741699 -rw-r--r-- 1 X X 2 2017-04-04 X d

srclt:
total 16
1138988347 -rw-r--r-- 2 X X 2 2017-04-04 X a
1138988347 -rw-r--r-- 2 X X 2 2017-04-04 X b
1138988348 -rw-r--r-- 2 X X 2 2017-04-04 X c
1138988348 -rw-r--r-- 2 X X 2 2017-04-04 X d
#####


Expected Results (1)
====================

The directory baklt should contain the entries b and c.
Instead, entry c is completely ignored and does not even show up in the log.
Entry a does not need to appear in the backup, as it is present in both srclt and dstlt.
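
Because rsync exits with code 0 in this scenario, the loss only shows up when the directories are compared afterwards. A minimal sketch of such a check follows. missing_from_backup is a hypothetical helper, and it assumes flat directories with simple file names, as in this test case.

```shell
# Hypothetical helper (not part of rsync): print every regular file in SRC
# that is absent from both DST (the --compare-dest directory) and BAK,
# i.e. a file that the backup has silently lost.
# Assumes flat directories with simple file names, as in the test case.
missing_from_backup() {
    src=$1 dst=$2 bak=$3
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        if [ ! -e "$dst/$name" ] && [ ! -e "$bak/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

For the directory listings shown above, "missing_from_backup srclt dstlt baklt" would print only c. With correct hard link handling it would print nothing.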


How to reproduce (2)
====================

[ Using Linux on e.g. ext4 ]
############################
mkdir srclt2
cd srclt2
echo x > a
ln a b
cd ..

cp -aix srclt2 dstlt2
rm dstlt2/b

mkdir baklt2

rsync -aHvv --compare-dest=$PWD/dstlt2/. $PWD/srclt2/. $PWD/baklt2/. >> testlt2.log 2>&1
############################


Actual Results (2)
==================

=> rsync hangs and must be interrupted/killed.

cat testlt2.log
#####
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
./
a is uptodate
b
a => b
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(636) [sender=3.1.2]
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at io.c(504) [generator=3.1.2]
#####
ls -il srclt2 dstlt2 baklt2
#####
baklt2:
total 8
2191211 -rw-r--r-- 2 X X 2 2017-04-04 X a
2191211 -rw-r--r-- 2 X X 2 2017-04-04 X b

dstlt2:
total 4
2191208 -rw-r--r-- 1 X X 2 2017-04-04 X a

srclt2:
total 8
2191206 -rw-r--r-- 2 X X 2 2017-04-04 X a
2191206 -rw-r--r-- 2 X X 2 2017-04-04 X b
#####


Expected Results (2)
====================

rsync should not block.
Entry a does not need to appear in the backup, as it is present in both directories srclt2 and dstlt2.
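
To reproduce the hang without having to interrupt rsync by hand, the command can be run under a time limit. This is a sketch only: run_with_limit is a hypothetical wrapper, and it assumes timeout(1) from GNU coreutils, which exits with status 124 when the limit is reached.

```shell
# Hypothetical wrapper: run a command under a time limit so that a hang
# becomes a testable failure. Relies on timeout(1) from GNU coreutils,
# which exits with status 124 when the limit is reached.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "command still running after ${limit}s, terminated" >&2
    fi
    return "$status"
}

# Example call for test case 2 (the 60 second limit is an arbitrary choice):
# run_with_limit 60 rsync -aHvv --compare-dest="$PWD/dstlt2/." \
#     "$PWD/srclt2/." "$PWD/baklt2/." >> testlt2.log 2>&1
```

A return status of 124 then indicates the blocking behaviour described above, while any other status is whatever rsync itself returned.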


Further information
===================

This problem exists in at least rsync versions 3.1.0, 3.1.1, and 3.1.2, on different Linux distributions and with various file systems:
https://lists.samba.org/archive/rsync/2015-April/030092.html

Latest test on openSUSE 42.2 (x86_64) on ext4 + on nfs with
rsync --version
#####
rsync  version 3.1.2  protocol version 31
Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc
#####
Comment 1 Hansjoerg Lipp 2017-04-05 22:07:34 UTC
On 05.04.2017 at 22:05, L A Walsh via rsync wrote:
>    I ran rsync 3.1.1 for over a year to help generate
> snapshots.  I can't say if it copied all the files or not, as
> it was backing up a large "/home" partition, BUT, it never hung.
> It did take 45min to a few hours to do the compare, but it
> was comparing a large amount of data (>750G) w/a snapshot
> (another 750G) to dump diffs to a third, and my /home partition
> has a *very* large number of hard links.

I've been using rsync for many years and it works fine most of the time.
I'm not sure whether all of the occasional hangs have the same cause. They
are really hard to track down, as they usually occur during large
transfers (e.g. when synchronizing large backup disks). That's why I was
happy to find a small test case that triggers this problem.

Does your rsync hang after the sequence of commands described in section
"How to reproduce (2)"?

>> Latest test on openSUSE 42.2 (x86_64) on ext4 + on nfs with
>>   
> ----
>    Ah...  I'd suspect nfs...
>      Why are you using nfs?

To find out whether the file system type makes a difference. The most
recent tests were on ext4 and on nfs (independently); older tests were on
at least ext3 and xfs. IIRC I only tested on different openSUSE and Debian
versions on x86_64 systems, though.

>    Just checked my /home partition.
>    find shows 9295431 names (of any type), but du
> (using du --inodes) shows 4407458 inodes.  That means over
> half of the filenames are hard linked.  While my home
> partition takes up 60% more space now, even cutting
> those counts in half would still be a large number of
> hard links -- and rsync didn't crash doing an
> rsync of the partition to an empty one, but first comparing
> to a previous snapshot (the empty partition ended up
> with differences between the main partition & the snapshot).

Were you perhaps using different options? Can this be some sort of
Heisenbug that nobody can reproduce? Do the two sequences of shell commands
work for you as expected? Please note that in the mail generated by
bugzilla each of the two rsync commands is split across two lines. Both
commands should read
   rsync PARAMS DIRS >> XXX.log 2>&1

Kind regards,
Hansjoerg