Bug 12742 - a proposal: fix bogus nanosecond mtimes on transfer (patch included)
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.1
Hardware: All All
Importance: P5 minor
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-13 09:07 UTC by George
Modified: 2017-04-13 09:10 UTC

See Also:


Attachments
a proposed patch to ignore bogus .st_mtim.tv_nsec values by resetting them to 0 (2.13 KB, patch)
2017-04-13 09:07 UTC, George

Description George 2017-04-13 09:07:12 UTC
Created attachment 13152 [details]
a proposed patch to ignore bogus .st_mtim.tv_nsec values by resetting them to 0

This suggestion is actually in a grey area between a proposed enhancement and a minor fix.

Basically, utimensat() sometimes fails to set the mtime for some bogus mtime values, so rsync then tries to transfer the same files over and over again.

However, a plain utime() call works just fine, which is what led to this suggestion.

I'm attaching a proposed patch for rsync 3.1.1. I think 3.1.2 can be adjusted in a similar vein; I'm not sure about 3.1.3, since I believe it is still in development.

Now I will go into details, and then briefly discuss the alternatives.


Details
=======

First of all, this is not strictly speaking an rsync problem: we are talking about a very small number of files (a few hundred out of at least hundreds of thousands, sometimes millions) whose mtimes are messed up, possibly because of some subtle fault in the filesystem implementation.

However, these very few files make rsync return RERR_PARTIAL (23) for the whole transfer, which leaves us with no good options.

In my experience, on a Linux system with utimensat() / futimens(), the call fails with errno 22 (EINVAL) when asked to set a negative .st_mtim.tv_nsec, or one greater than 999999999.
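The failure described above can be reproduced with a small probe. This is only a sketch, not part of the attached patch: the helper name probe_mtime_nsec and the scratch path under /tmp are assumptions made for illustration, and it assumes a Linux system where futimens() is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file and try to set its mtime
 * to (sec, nsec). Returns 0 on success, or the errno of the failing call. */
int probe_mtime_nsec(time_t sec, long nsec)
{
    char path[] = "/tmp/nsec_probe_XXXXXX";  /* assumes /tmp is writable */
    int fd = mkstemp(path);
    if (fd < 0)
        return errno;

    struct timespec times[2];
    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_OMIT;  /* leave atime untouched */
    times[1].tv_sec = sec;
    times[1].tv_nsec = nsec;        /* the mtime value under test */

    int err = (futimens(fd, times) == 0) ? 0 : errno;

    close(fd);
    unlink(path);
    return err;
}
```

With a tv_nsec of 0 the call is expected to succeed, while 1000000000 or -1 is expected to come back as EINVAL, which matches the errno 22 reported here.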

However, rsync 3.1.1 seems to compare only seconds, and rsync 3.1.2 seems to still compare only seconds when selecting files for transfer. It is therefore very tempting to ignore bogus nanosecond values by setting them to zero: in that case futimens() will just do the job, and there will be no further transfer attempts.

I must admit that I do not know how and why these bogus mtime values appear. Another guess of mine is that there may be a minor bug in some utimes() implementations, since all the bogus nanosecond values that I see are multiples of 1000.
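To illustrate that guess only: utimes() works in microseconds, so a conversion that scales tv_usec by 1000 without range-checking it would produce exactly this pattern. The function below is a hypothetical, deliberately naive sketch; it is not taken from any real libc or kernel.

```c
#include <sys/time.h>
#include <time.h>

/* Hypothetical, deliberately naive conversion: scales microseconds to
 * nanoseconds without checking that tv_usec lies in 0..999999. */
struct timespec naive_timeval_to_timespec(struct timeval tv)
{
    struct timespec ts;
    ts.tv_sec = tv.tv_sec;
    ts.tv_nsec = (long)tv.tv_usec * 1000L;  /* always a multiple of 1000 */
    return ts;
}
```

An out-of-range tv_usec such as 1500000 would come out as a tv_nsec of 1500000000: a multiple of 1000, and outside the range that utimensat() accepts.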

However, struct timespec is in my opinion a plain invitation for this kind of error, since it defines tv_nsec as a long, which is plain insane: you do not want a filesystem-related field to have anything but a fixed size, whereas at least for GCC a long is basically a machine word, that is, its size depends on the CPU architecture. (The Windows definition of long is better in this particular case: AFAIR it is an int32.)

Rsync, by the way, takes a much more reasonable approach by defining mod_nsec in set_modtime() as a uint32; however, this sanity is incompatible with the insanity in the standard (see e.g. http://pubs.opengroup.org/onlinepubs/7908799/xsh/time.h.html), which will of course break the bogus nanosecond mtimes anyway.
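A rough sketch of why an unsigned 32-bit field cannot carry a bogus value through unchanged. The helper nsec_as_uint32 is hypothetical and only demonstrates the C conversion rule; it does not reproduce rsync's actual code path.

```c
#include <stdint.h>

/* Hypothetical helper: what a (possibly bogus) long tv_nsec looks like
 * once it is stored in a 32-bit unsigned field. */
uint32_t nsec_as_uint32(long tv_nsec)
{
    /* Conversion to an unsigned type is well-defined in C:
     * the value is reduced modulo 2^32. */
    return (uint32_t)tv_nsec;
}
```

A negative value such as -1000 becomes 4294966296, which is a different value and still outside 0 .. 999 999 999, so the original bogus mtime is not preserved either way.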

For a discussion, please refer to the following rsync list message: https://lists.samba.org/archive/rsync/2017-April/031177.html , but in short -- out-of-range nanosecond mtime values:

(a) are not supposed to exist, since we need to define at least UTIME_OMIT and UTIME_NOW as long values, and have no other choice but to put these constants somewhere outside of the 0 .. 999 999 999 interval ;

(b) can nevertheless still appear on a filesystem due to possible imperfections in the OS code or filesystem implementation ;

(c) are not really taken into account by rsync 3.1.1 / 3.1.2 when it compares files ;

(d) however, can still break an rsync transfer -- that is, lead to an error code 23.

Therefore, this request suggests that we ignore them on transfer by setting them to 0.
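The idea can be sketched in a few lines. This is a minimal illustration, not the attached patch: NSEC_PER_SEC, nsec_in_range and sanitize_mtime are names assumed here for the example, and where such a check would sit inside rsync is left open.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L  /* assumed name, for illustration */

/* Returns 1 if nsec is a real sub-second value (0 .. 999999999). */
int nsec_in_range(long nsec)
{
    return nsec >= 0 && nsec < NSEC_PER_SEC;
}

/* Hypothetical helper: zero an out-of-range nanosecond field and keep
 * the seconds, so that a later futimens() call is not rejected. */
void sanitize_mtime(struct timespec *mtime)
{
    if (!nsec_in_range(mtime->tv_nsec))
        mtime->tv_nsec = 0;
}
```

Valid mtimes pass through untouched and keep their full precision; only the bogus ones lose their nanosecond part, which points (c) and (d) suggest is harmless for the comparison.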


Alternatives
============

One could of course recompile rsync to make it ignore nanosecond precision altogether, but we can probably do much better -- that is, transfer all the proper mtime values with the proper precision, and ignore nanoseconds only for the messed up ones.

So an obvious alternative to changing mtimes would be to allow the user to ignore them -- but since we still probably want to transfer mtimes for 99.99% of files, this should rather be an option to ignore a few mtime transfer errors, and I will file another enhancement request to address that.

Finally, running two consecutive rsync transfers -- one to transfer the file contents while ignoring mtimes, and another to transfer mtimes, whose return code we would ignore -- is possible, but has two drawbacks:

(1) this may double the time needed for the transfer: e.g. if one rsync session runs overnight, there would be no room for another rsync session on the same day ;

(2) rsync will still try to transfer the same files over and over again.