Bug 13082 - [REQ] Hardware / SSE based MD5 operations
Summary: [REQ] Hardware / SSE based MD5 operations
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.3
Hardware: All, OS: All
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-12 11:31 UTC by Ben RUBSON
Modified: 2022-11-14 18:36 UTC

See Also:


Description Ben RUBSON 2017-10-12 11:31:40 UTC
Hi,

I ran some performance tests and saw that rsync can be a bottleneck while it calculates the hashes of basis files (not a surprise).

For example, while I can read a big test file directly from storage at 300 MB/s, rsync checksums it at only about 200 MB/s.

The time spent calculating hashes for big files can therefore be significant (several minutes for files of several hundred GB).
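To make that cost concrete, here is a minimal Python sketch, assuming decimal units (1 MB = 10^6 bytes) and that reading and hashing overlap fully, so whichever is slower sets the pace. It hashes a buffer in fixed-size blocks with MD5, measures the in-memory throughput, and estimates how much extra time a hashing-bound run takes compared with a storage-bound one. The helper names are hypothetical; rsync also computes a weak rolling checksum per block and chooses its own block size, and neither is modeled here.

```python
import hashlib
import time


def block_md5_signatures(data: bytes, block_size: int) -> list[bytes]:
    """Hypothetical helper: MD5 digest of each fixed-size block of a basis file.

    rsync also computes a weak rolling checksum per block and picks the
    block size itself; both are omitted in this sketch.
    """
    return [
        hashlib.md5(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def md5_throughput_mb_s(data: bytes, block_size: int = 128 * 1024) -> float:
    """Measure in-memory block-hashing throughput in MB/s (1 MB = 10**6 bytes)."""
    start = time.perf_counter()
    block_md5_signatures(data, block_size)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return len(data) / elapsed / 1e6


def extra_seconds(file_bytes: float, hash_mb_s: float, disk_mb_s: float) -> float:
    """Extra wall-clock time when hashing, not storage, is the bottleneck.

    Assumes reading and hashing fully overlap, so the slower stage sets
    the pace; returns 0 when hashing keeps up with the disk.
    """
    hash_time = file_bytes / (hash_mb_s * 1e6)
    disk_time = file_bytes / (disk_mb_s * 1e6)
    return max(0.0, hash_time - disk_time)
```

With the figures from this report and an illustrative 500 GB file, `extra_seconds(500e9, 200, 300)` comes to about 833 s, roughly 14 minutes spent waiting on the CPU rather than the disk.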

So what about an SSE-based MD5 implementation?
Do you think it would improve performance?

Thank you!

Ben
Comment 1 Ben RUBSON 2020-05-18 15:30:31 UTC
First info and a first patch here:
https://lists.samba.org/archive/rsync/2020-May/032175.html
Comment 2 Wayne Davison 2020-05-22 06:53:50 UTC
No, that patch has nothing to do with MD5. It is optimizing the rolling checksum.
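The rolling checksum is a different, much cheaper function than MD5. As a minimal sketch, here is the weak checksum as described in the rsync technical report by Tridgell and Mackerras (two 16-bit sums, modulus M = 2^16), written in Python with hypothetical function names. rsync's own C routine differs in details such as byte signedness and loop unrolling, so treat this as an illustration of the technique, not rsync's exact code.

```python
M = 1 << 16  # modulus used in the rsync technical report


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute (a, b) for a block from scratch in O(n).

    a is the sum of the bytes; b weights each byte by its distance from
    the end of the window, so the first byte gets weight n.
    """
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window forward by one byte in O(1).

    out_byte leaves at the front of the window, in_byte enters at the back.
    """
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def combine(a: int, b: int) -> int:
    """Pack the two 16-bit halves into one 32-bit weak checksum."""
    return a | (b << 16)


# Usage: the rolled values match a full recomputation at every offset.
data = bytes(range(256)) * 2 + b"basis file tail"
n = 16
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    assert (a, b) == weak_checksum(data[k:k + n])
print("rolling checksum OK")
```

Because `roll` does constant work per byte regardless of block size, the weak checksum can be tested at every offset, and the expensive strong checksum (MD5) only needs to be computed for blocks whose weak checksum already matches.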
Comment 3 Ben RUBSON 2020-05-22 07:33:14 UTC
Yes, and as also discussed there, it sounds like xxhash could be a better solution than SSE / hardware-backed MD5.
Feel free to close this, then, if it is not really relevant.
Comment 4 Wayne Davison 2020-05-23 06:02:04 UTC
Jorrit Jongma has supplied a nice patch that provides some assembler optimizations for MD5 on x86_64 which I have just merged (along with his optimizations for the rolling checksum algorithm).  Also, my xxhash changes have finally landed, and it includes a way for rsync to negotiate the best checksum algorithm shared between the client & server.  This will make it easier to add new checksum algorithms in the future.
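The negotiation step can be sketched as picking the first algorithm in the client's preference order that the server also supports, which is the rule the rsync 3.2 manual describes for checksum negotiation. This is a minimal Python sketch under that assumption: the function name and the algorithm lists are illustrative, and the actual exchange of lists over the rsync protocol is not modeled.

```python
def negotiate_checksum(client_prefs: list[str], server_supported: list[str]) -> str:
    """Hypothetical helper: return the first algorithm in the client's
    preference order that the server also supports (assumed rule)."""
    server = set(server_supported)
    for name in client_prefs:
        if name in server:
            return name
    raise ValueError("no checksum algorithm in common")


# Illustrative example: a newer client talking to a server without xxhash.
client = ["xxh128", "xxh3", "xxh64", "md5", "md4"]
server = ["md5", "md4"]
print(negotiate_checksum(client, server))  # md5
```

Keeping the choice as an ordered list lookup is what makes adding a new algorithm cheap: a new name only has to be inserted into each side's list, and older peers simply never match it.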
Comment 5 Ben RUBSON 2020-05-23 09:19:07 UTC
Really nice additions, it looks promising, thank you very much Jorrit & Wayne !
Comment 6 Andrew Bartlett 2022-11-14 18:36:22 UTC
Samba 4.11 moved to GnuTLS for our MD5 and other hash operations, and so uses any hardware optimisation available there.