Bug 11521 - rsync does not use high-resolution timestamps to determine file differences
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.2
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
Reported: 2015-09-14 21:37 UTC by Michael McCracken
Modified: 2016-01-24 19:52 UTC

Attachments
patch to check hi-res timestamp in unchanged_file (493 bytes, patch)
2015-09-14 21:38 UTC, Michael McCracken

Description Michael McCracken 2015-09-14 21:37:34 UTC
rsync preserves the sub-second timestamps available on many filesystems when asked to preserve times, but it doesn't use them to determine whether files differ.

If a file exists at both origin and destination, its contents are the same size in each place, and the timestamps differ only in their sub-second part, rsync will treat the files as the same (unless you use --checksum).

So suppose a file is created and a snapshot of its directory is taken. If the origin file is then modified within the same second, without changing its size, an attempt to update that snapshot using rsync will fail to copy the change.

Here's a script that reproduces the issue with high reliability for me:

#!/bin/bash

set -x

DIR=$(mktemp -d -p $(pwd))

mkdir $DIR/d1
mkdir $DIR/d2

echo dummy > $DIR/d1/dummy
echo dummy > $DIR/d2/dummy

echo one > $DIR/d1/afile
sleep 0.1
echo two > $DIR/d2/afile

/usr/bin/stat $DIR/d1/afile | grep Mod
/usr/bin/stat $DIR/d2/afile | grep Mod

~/packages/rsync/rsync --delete -a -HAX -vii $DIR/d2/ $DIR/d1

diff -r $DIR/d1 $DIR/d2

/usr/bin/stat $DIR/d1/afile | grep Mod
/usr/bin/stat $DIR/d2/afile | grep Mod



If the diff shows a difference, then rsync didn't copy afile's contents over. However, note the stat output from the last two lines: the modify timestamp *will* have been synced, so the destination ends up with the new timestamp but the old contents.

The following patch adds a check of the high-res timestamp to unchanged_file. This solves the problem for me, and I've guarded it so it shouldn't break on systems with no high-res timestamp. Please let me know if I can be helpful in testing it further or making it more robust.



diff --git a/generator.c b/generator.c
index 3a4504f..2f64f5d 100644
--- a/generator.c
+++ b/generator.c
@@ -588,7 +588,11 @@ int unchanged_file(char *fn, struct file_struct *file, STRUCT_STAT *st)
        if (ignore_times)
                return 0;
 
-       return cmp_time(st->st_mtime, file->modtime) == 0;
+       return cmp_time(st->st_mtime, file->modtime) == 0
+#ifdef ST_MTIME_NSEC
+               && st->ST_MTIME_NSEC == F_MOD_NSEC(file)
+#endif
+               ;
 }
Comment 1 Michael McCracken 2015-09-14 21:38:27 UTC
Created attachment 11440
patch to check hi-res timestamp in unchanged_file
Comment 2 Andrey Gursky 2016-01-23 02:04:23 UTC
(In reply to Michael McCracken from comment #1)

I would have expected the rsync maintainer to comment on this, at least with a reference to the mailing list thread [1] where this was already proposed but ignored (as was this bug report).

Things are not so simple, of course: see [2] and the discussion that follows it.

[1] [PATCH] Consider nanoseconds when quick-checking for unchanged files
    https://lists.samba.org/archive/rsync/2014-December/029853.html
[2] [PATCH] Consider nanoseconds when quick-checking for unchanged files
    https://lists.samba.org/archive/rsync/2016-January/030511.html
Comment 3 Wayne Davison 2016-01-24 19:52:49 UTC
The latest git version has an option that lets you choose to include nanoseconds in comparisons if you want them. Having it on by default would likely cause far too many headaches for various backup solutions that use an older filesystem (e.g. ext3) that doesn't support nanoseconds.
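
One way to picture the concern about filesystems without nanosecond support is the sketch below. It is only a model, not rsync's logic: `struct ts` and the three functions are hypothetical, and it assumes such a filesystem drops the sub-second part when a timestamp is stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timestamp pair: whole seconds plus nanoseconds. */
struct ts {
    int64_t sec;
    int32_t nsec;  /* 0..999999999 */
};

/* Model of a filesystem that keeps whole seconds only. */
static struct ts store_whole_seconds(struct ts t)
{
    assert(t.nsec >= 0 && t.nsec < 1000000000);
    t.nsec = 0;  /* the sub-second part is lost on write */
    return t;
}

/* Comparison at whole-second granularity. */
static bool same_time_sec(struct ts a, struct ts b)
{
    return a.sec == b.sec;
}

/* Comparison that also requires matching nanoseconds. */
static bool same_time_nsec(struct ts a, struct ts b)
{
    return a.sec == b.sec && a.nsec == b.nsec;
}
```

Under this model, a source file with a non-zero nanosecond part never matches its copy under `same_time_nsec`, so every run would consider it changed, while `same_time_sec` still treats the pair as equal.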