I'm using rsync to mirror large repository (with files owned by multiple users). The repository is subject to updates while rsync is running. I found that in few cases, the ownership of the files is left as root/other (uid 0, gid 1).
I can not replicate the problem, but I suspect the file was modified (or removed) while rsync was running (the rsync take ~30 minutes).
I checked the code, and noticed that before "rsync" commit the temp file (finish_transfer @ rsync.c), it changes the file attributes (using 'set_file_Attrs'). However, it does not check if the file attributes were updated succesfully. As a result, it may leave files with the wrong timestamp/ownership or permission.
My suggested solution is to modify set_file_perms- return 0 for no change, 1 for changes and -1 for error. finish_transfer can than check for failures, and remove the temp file if the permission/ownership/timestamp is not correct.
> However, it does not check if the file attributes were updated succesfully.
Sure it does -- it checks the return status of the various calls that it makes and outputs warnings should they fail (and sets a not-everything-copied-correctly return code too). What it does not do is to discard a file if the ownership or group info was not set correctly. Considering that any such failure would leave the file owned by the user running rsync with that user's default group, this shouldn't be a security problem (since you must already inherently trust the user running rsync). I think it is better to keep the file and let a future rsync run try to fix any failed ownership problems.
(In reply to comment #1)
> I think it is better to keep the file and let a future rsync run try to fix any
> failed ownership problems.
Waine - I hope that you will reconsider. See examples below.
I would vote to have correct files in the repository, instead of leaving incorrect data that may (or may not) be fixed in future runs. Having incorrect data (even for few hours) can have "chain" effect, when data can not be easily recalled.
I agree that the initial permissions will not create security problem. However, incorrect permissions may easily break existing processes that expect read-access to the files in the user directory, or when files are shared between accounts: binaries will not be executable, etc.
The timestamp is more serious issue - by default, the time stamp is very new, which will make it very hard to find the most updated version of several repositories. In particular, it will prevent future 'rsync --update' from working.
The last serious issue is the ownership. I'm using rsync in server mode (root) to sync repository with multiple users. Leaving "root/other 0700" files in the directory, prevented the users from updating his own files (which took us a lot of time to debug).
(In reply to comment #0)
> I checked the code, and noticed that before "rsync" commit the temp file
> (finish_transfer @ rsync.c), it changes the file attributes (using
> 'set_file_Attrs'). However, it does not check if the file attributes were
> updated succesfully. As a result, it may leave files with the wrong
> timestamp/ownership or permission.
Why isn't your operating system letting rsync set the file attributes? Obviously, if you move (or delete) a file out from under rsync's nose, there's little that rsync can do about it. However, if rsync is spuriously failing to set attributes, I recommend you truss it to find the cause. If you are pulling, do something like this:
truss -dfo rsync.log rsync there:files/ files/
If you are pushing, do something like this and then download the log file from the other machine:
rsync --rsync-path="truss -dfo rsync.log rsync" files/ there:files/
That said, I do believe rsync 3 needs to be a savvier reader and writer of the filesystem instead of complacently doing what would work assuming no concurrent modification. Perhaps as an option, rsync 3 should perform additional "defensive" checks to make sure that each received file matches the file list, at least in preserved attributes. In addition, as much as possible on its system, it should use file descriptors and f* and *at system calls in place of following paths from the root of the transfer. Since file descriptors are foolproof, this would greatly reduce race conditions, including those dangerous to security. Unfortunately, there are some deficiencies in this technique:
(1) One must open a file for reading or writing, not just to have a file descriptor for f* calls.
(2) One can't open symlinks. A "lopen" call that opens a file and/or symlink for neither reading nor writing would fix both #1 and #2.
(3) One can't read directories atomically. An "openall" call that atomically lopens all the files in a directory and returns a list of their names and file descriptors would fix this, but the memory allocation would be tricky and we have to hope for a high file descriptor limit. Furthermore, if a file is moved while rsync is working, rsync might see it twice or never; I see no way to prevent this.
I'm doing some kernel hacking, and I'm thinking about making a customized Linux kernel with the necessary system calls to allow programs like rsync to operate safely in the face of concurrent modification.
Thanks for taking the time to review this problem and my suggestions. See below for more data. I still think that it will be a good to avoid "commiting" files with bad permissions - just as extra protection against unexpected problems from NFS, or other types of remote file systems (i.e. FUSE).
As for my problem, I found that with older rsync, the "finish_transfer" used to set the permission and ownership after renaming the file, which will result in (temporary) bad ownership and permissions. By mistake, I was testing and reviewing the source code for 2.6.5. I will try to reduce the priority for this bug.
I will consider whether rsync should use some kind of a partial-transfer handling for a file that failed in its permission- or ownership-setting calls. Since these calls rarely fail, this is something of a fringe case, but it is good to consider how best to handle this.