Bug 5324 - Reduce the performance penalty of --xattrs on Mac OS X
Summary: Reduce the performance penalty of --xattrs on Mac OS X
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.3
Hardware: x86 Other
Importance: P3 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-14 09:21 UTC by Fabrice Authier
Modified: 2020-12-11 14:19 UTC
CC List: 8 users

See Also:


Attachments
A first work in progress patch to add a hashtable (20.83 KB, text/plain)
2016-07-22 17:49 UTC, Stefan Metzmacher
valgrind clean work in progress patch (25.20 KB, text/plain)
2016-07-25 07:07 UTC, Stefan Metzmacher
Possible patch for master (27.82 KB, patch)
2016-07-25 15:19 UTC, Stefan Metzmacher

Description Fabrice Authier 2008-03-14 09:21:34 UTC
System on both servers:
Mac OS X Server 10.4.11

When I use the -X (xattrs) option to synchronize about 400,000 files between two servers, the run takes four times longer than without it (2 hours vs. half an hour).

Syntax:
rsync -aAX --del --force /source/ server2:/dest/ (400,000 files -> 2 hours)
rsync -aA --del --force /source/ server2:/dest/ (400,000 files -> half an hour)

Thank you
F.A.
Comment 1 Matt McCutchen 2008-03-14 12:41:19 UTC
Asking rsync to do more (preserve the xattrs) will inevitably make it take longer, but a 4x slowdown does seem excessive.
Comment 2 Wayne Davison 2008-03-15 03:04:18 UTC
Do your files have a lot of differing xattrs?  One thing I never much liked about the xattrs patch (from the very beginning) is that the code attempts to do a very simplistic linear search through all the prior xattrs looking for a matching set of attributes (to share matching attributes between files).  If your files have a lot of xattr entries, that search will eat up more and more time as the list of unique attributes grows.

One solution to this might be to create a hash of all the names and xattr data, and then store the xattrs in a hash lookup.  That should speed things up quite a bit when the list of unique xattr values grows large.
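The hash lookup suggested above can be illustrated with a minimal C sketch. It assumes each file's xattrs are already collected into an array of name/value pairs kept in a canonical (e.g. name-sorted) order. The types and functions (`xattr_entry`, `xattr_set`, `xset_find_or_add`, and so on) are hypothetical and are not rsync's internal API, and the FNV-1a hash is simply a convenient choice. Identical sets hash to the same bucket, so a new file's xattrs are compared in full only against the few candidates in that bucket instead of against every unique set seen so far.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types: one xattr (name + value) and one file's full set,
 * assumed to be kept in a canonical (e.g. name-sorted) order. */
struct xattr_entry {
    const char *name;
    const unsigned char *value;
    size_t value_len;
};

struct xattr_set {
    const struct xattr_entry *entries;
    size_t count;
};

#define NBUCKETS 1024 /* illustrative size; must be a power of two */

/* A chained bucket node: the set's hash, the set, and its list index. */
struct xset_node {
    uint64_t hash;
    const struct xattr_set *set;
    int index;
    struct xset_node *next;
};

struct xset_table {
    struct xset_node *buckets[NBUCKETS];
    int next_index;
};

/* 64-bit FNV-1a over a byte range, continuing from state h. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash every name (including its NUL), value length, and value. */
static uint64_t xset_hash(const struct xattr_set *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < s->count; i++) {
        const struct xattr_entry *e = &s->entries[i];
        h = fnv1a(h, e->name, strlen(e->name) + 1);
        h = fnv1a(h, &e->value_len, sizeof e->value_len);
        h = fnv1a(h, e->value, e->value_len);
    }
    return h;
}

/* Full comparison, only run when the hashes already match. */
static int xset_equal(const struct xattr_set *a, const struct xattr_set *b)
{
    if (a->count != b->count)
        return 0;
    for (size_t i = 0; i < a->count; i++) {
        const struct xattr_entry *x = &a->entries[i], *y = &b->entries[i];
        if (strcmp(x->name, y->name) != 0 || x->value_len != y->value_len)
            return 0;
        if (x->value_len && memcmp(x->value, y->value, x->value_len) != 0)
            return 0;
    }
    return 1;
}

/* Return the index of a previously added identical set, or add s (which
 * must outlive the table) and return its new index. Only the sets in one
 * bucket are compared in full, not every unique set seen so far. */
static int xset_find_or_add(struct xset_table *t, const struct xattr_set *s)
{
    assert(t != NULL && s != NULL);
    uint64_t h = xset_hash(s);
    struct xset_node **bucket = &t->buckets[h & (NBUCKETS - 1)];
    for (struct xset_node *n = *bucket; n != NULL; n = n->next) {
        if (n->hash == h && xset_equal(n->set, s))
            return n->index;
    }
    struct xset_node *node = malloc(sizeof *node);
    if (node == NULL)
        abort();
    node->hash = h;
    node->set = s;
    node->index = t->next_index++;
    node->next = *bucket;
    *bucket = node;
    return node->index;
}
```

With a zero-initialized table (`static struct xset_table t;`), two files carrying byte-identical xattrs get the same index, so the second can reuse the first one's stored copy. The hash only narrows the candidates; `xset_equal` still confirms every match, so a hash collision costs a little time but never causes two different sets to be shared.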

Note: if you are using osx-create-time.diff, please switch to crtime.diff instead. The osx-create-time.diff patch is known to be slow, quite possibly due to the bloating of unique xattr values in the list.
Comment 3 Fabrice Authier 2008-04-10 11:35:01 UTC
> One solution to this might be to create a hash of all the names and xattr data,
> and then store the xattrs in a hash lookup.  That should speed things up quite
> a bit when the list of unique xattr values grows large.

Thank you,
but how would I implement this hash?
Thanks a lot.

(I am now using rsync 3.0.2.)
Comment 4 Mike Bombich 2009-06-20 21:53:52 UTC
Mac OS X makes extensive use of xattrs, and I've seen hundreds of thousands of unique xattrs on several end-user systems.  In those cases rsync eventually runs out of memory and bails.

rsync should probably stop calling find_matching_xattr when the list reaches a specified size.  And at least for Mac OS X, xattrs shouldn't be cached in rsync_xal_l, they should probably be compared on the fly by the generator somewhere near generate_and_send_sums.
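The first suggestion above, stopping the search once the list reaches a set size, can be sketched as a capped version of a linear search. This is a hypothetical illustration, not rsync's `find_matching_xattr`: the cap `MAX_SHARED_SETS` and the `cached_set` stand-in (an opaque serialized byte string) are assumptions. It bounds the per-file search cost, but it does not bound memory: past the cap, duplicate sets are stored rather than shared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cap on how many cached sets are searched for a match. */
#define MAX_SHARED_SETS 4096

/* Stand-in for a cached xattr set: an opaque, serialized byte string. */
struct cached_set {
    const void *data;
    size_t len;
};

/* Linear search for an identical cached set, but only while the cache is
 * below MAX_SHARED_SETS entries. Returns the matching index, or -1 when
 * there is no match or the search was skipped; on -1 the caller stores a
 * new copy, even if it duplicates an existing one. */
static int find_matching_capped(const struct cached_set *list, size_t n,
                                const void *data, size_t len)
{
    assert(list != NULL || n == 0);
    if (n >= MAX_SHARED_SETS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (list[i].len == len && memcmp(list[i].data, data, len) == 0)
            return (int)i;
    }
    return -1;
}
```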

Also, to address Fabrice's concern, performance will ultimately be directly linked to the lack of a quick heuristic for determining whether xattrs have been modified.  xattrs don't have modification dates, so you can only use the size or a checksum to determine whether they've changed.  In my analysis, rsync spent 20% of its time in md5_process on a task involving many xattrs with lots of data.  Unless we come up with something faster than md5 (potentially at the cost of reliability), this is just a performance hit we'll have to live with.
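The size-then-checksum test described above can be sketched as a receiver-side check, assuming the sender transmits each xattr value's length plus a digest. The names `xattr_summary`, `summarize`, and `xattr_changed` are hypothetical, and FNV-1a stands in for MD5 only to keep the sketch self-contained. A length mismatch settles the question for free; only equal-length values pay for hashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in 64-bit digest (FNV-1a). The comment above refers to MD5; this
 * much weaker hash is used only to keep the sketch self-contained. */
static uint64_t digest64(const unsigned char *p, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* What the sending side would transmit for one xattr value (hypothetical). */
struct xattr_summary {
    size_t len;
    uint64_t digest;
};

static struct xattr_summary summarize(const unsigned char *value, size_t len)
{
    struct xattr_summary s = { len, digest64(value, len) };
    return s;
}

/* Receiving side: does the local value differ from the remote one?
 * A length mismatch answers without hashing anything; only values of
 * equal length are hashed and compared by digest. */
static int xattr_changed(const struct xattr_summary *remote,
                         const unsigned char *local, size_t local_len)
{
    assert(remote != NULL);
    if (remote->len != local_len)
        return 1;
    return digest64(local, local_len) != remote->digest;
}
```

Replacing MD5 with a cheap digest like this is exactly the "faster, at the cost of reliability" trade-off mentioned above: FNV-1a is fast but not collision-resistant, so two different values of equal length could occasionally be treated as unchanged.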

I'm going to take a stab at this, but I'm curious whether you've put any more thought into how xattr support is implemented since your last comment here.

Thanks!
Mike Bombich
Comment 5 Björn Jacke 2016-06-06 15:01:33 UTC
How about adding an option like "--use-ctime-before-xattr-compares", which only reads and compares EAs (extended attributes) for files where the ctime on the source side is newer than the ctime on the target side? EA modifications usually update the ctime. I think this would be a way to speed up syncing with EAs quite a lot.
Comment 6 Stefan Metzmacher 2016-07-22 17:49:33 UTC
Created attachment 12285
A first work in progress patch to add a hashtable

I need to clean this up and do more tests.

But with 1000 unique xattrs, callgrind showed 50% fewer CPU instructions.
Comment 7 Stefan Metzmacher 2016-07-25 07:07:43 UTC
Created attachment 12286
valgrind clean work in progress patch
Comment 8 Stefan Metzmacher 2016-07-25 15:19:28 UTC
Created attachment 12287
Possible patch for master

I hope this is an acceptable patchset to fix the problem for rsync master.
Comment 9 Wayne Davison 2016-08-14 21:48:53 UTC
The patchset looks very nice. Thanks!

I've made some very minor tweaks and committed it.
Comment 10 Oren Kishon 2016-12-27 09:28:32 UTC
> How about adding an option like "--use-ctime-before-xattr-compares", which only
> reads and compares EAs for files where the ctime on the source side is newer
> than the ctime on the target side. EA modifications usually update the ctime.
> This would be a way to speed up syncing with EAs quite a lot I think.

Has anyone started developing this?

Thank you
Oren Kishon, Ctera
Comment 11 Björn Jacke 2016-12-27 10:02:36 UTC
No, because the problem was fixed by metze's patch. If you see a need for the ctime comparison, please open a new bug report for it. This bug is closed and fixed.