Bug 14338 - ZSTD support
Summary: ZSTD support
Status: RESOLVED FIXED
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.3
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-06 10:36 UTC by Sebastian A. Siewior
Modified: 2020-05-29 04:22 UTC

See Also:


Attachments
zstd support (16.17 KB, patch)
2020-04-06 10:36 UTC, Sebastian A. Siewior

Description Sebastian A. Siewior 2020-04-06 10:36:15 UTC
Created attachment 15898
zstd support

zstd compression was announced as "good compression with high
throughput" so I gave it a try. With zlib, on high-speed links the CPU
is usually the bottleneck. With zstd I'm able to fill a 200Mbit link :)

zstd detection happens automatically via pkg-config. If the zstd header
is missing, zstd support is simply left out and no error is raised. So
that should be okay. However, pkg-config is now kind of required…

I duplicated the zlib code and replaced it with zstd hooks once I
understood what was going on. I made a few local tests with and without
tokens and it seems to work. The compression can be selected with `-Z'
option. By default `0' is used as the compression level which is a
special default (it currently maps to 3). The compression level can be
specified by the same option as for zlib.
The compressor feeds data into zstd and starts sending data once the
outgoing buffer is full or when a flush is requested. That flush will
close the current compression block and create a new one for the
following transfer (saving the internal compression / history state).
The decompressor allocates space for two blocks. Should one block
contain more data, then it will loop more often.
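
The flush behaviour described above can be illustrated with a small sketch. It is not the patch's code: it uses Python's standard-library zlib as a stand-in for zstd (chosen only because it ships with Python), and the names `compress_chunks`, `decompress_packet` and the `out_size` value are hypothetical. The idea shown is the same: a flush closes the current block so the receiver can decode it right away, the history is kept for the following data, and the receiver loops when a packet expands to more than its output buffer holds.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Compress all chunks on ONE stream, flushing after each chunk."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        # Z_SYNC_FLUSH ends the current block so everything fed so far can
        # be decoded, but keeps the history window for later chunks.
        yield comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)


def decompress_packet(decomp, packet, out_size=64):
    """Decode one packet through a small output buffer, looping as needed.

    out_size is an arbitrary illustrative value, not the size rsync uses.
    """
    pieces = []
    data = packet
    while True:
        piece = decomp.decompress(data, out_size)
        pieces.append(piece)
        # Input that did not fit into this round is kept by the object.
        data = decomp.unconsumed_tail
        if not data and len(piece) < out_size:
            break
    return b"".join(pieces)
```

Each packet produced by `compress_chunks` can be passed to `decompress_packet` with a single shared `zlib.decompressobj()`, and it decodes to exactly the chunk that was fed in before the flush.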
Comment 1 Wayne Davison 2020-05-25 20:27:15 UTC
Thanks for the patch! I have transformed it into an official "zstd.diff" file in the rsync-patches git repo using the new negotiated compression code.  I imagine it will be included in master soon.
Comment 2 Wayne Davison 2020-05-25 20:29:23 UTC
I'll note that I had to install an upgraded zstd lib to get this to work. I first tried 1.3.8 (since it was mentioned in the configure check) but it didn't have the right exported functions. Then I tried 1.4.5, which worked fine.
Comment 3 Wayne Davison 2020-05-25 20:47:13 UTC
Actually, I decided it would be better to just go ahead and add this and the lz4 code into master and put them lower in the negotiation list, which will make them easier to test (and they're easy to disable via configure, as needed).
Comment 4 Sebastian A. Siewior 2020-05-28 20:15:10 UTC
Thanks for the merge!
Sorry about the version. According to zstd's git, v1.3.8 contains ZSTD_compressStream2(). Debian Buster ships v1.3.8, yet the function is not found there. Buh. Looking closer, it is marked as experimental in that release and only available for static linking. As per the header file history, it has been available for linking since v1.4.0.

I'm going to send a pull request to move the check over to ZSTD_compressStream2(), which should hopefully cover it.
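
As a rough illustration of the version constraint described in this comment, the sketch below compares a zstd version string against v1.4.0. The helper name `has_linkable_compress_stream2` is hypothetical, and the threshold is taken only from the statement above; the real check lives in rsync's configure script, not in Python.

```python
def has_linkable_compress_stream2(version: str) -> bool:
    """True if, per the comment above, ZSTD_compressStream2() can be linked.

    Assumes a plain "major.minor.patch" version string, optionally
    prefixed with "v".
    """
    parts = tuple(int(p) for p in version.lstrip("v").split("."))
    return parts >= (1, 4, 0)
```

With this rule, the two versions mentioned in the thread fall on opposite sides: 1.3.8 is rejected and 1.4.5 is accepted.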

Regarding the compression preferences.
If I understand it correctly:
- "zlibx" is the external zlib.
- "zlib" is the internal zlib which feeds blocks into the zlib history/dictionary without sending them over the wire.

Could those two be swapped? That zlibX sounds like zlib-eXtended. Or does it stand for zlib-eXternal?

Can we move zlib* to the end of the supported algorithms in terms of preference? My motivation for zstd was to get decent compression with _low_ CPU usage.
The rsync.yo file says that "--compress-choice=zlibx" can be used since version 3.1.1. Does this work with negotiation, so that a list such as "zstd,lz4,zlib,zlibx" would end up with zlibx against 3.1.1 and zstd against master, or did I misunderstand it?

Sebastian
Comment 5 Randall S. Becker 2020-05-28 20:30:10 UTC
Wondering about git 1.3.8, which is many years old. (Git platform maintainer here)
Comment 6 Wayne Davison 2020-05-29 04:22:25 UTC
The name "zlibx" stands both for "external" and for "excluding unsent data". It's listed first because it should be the safer choice with the weirdness in external zlib going on. The new compress items are listed afterwards because they're untested.

As for what you would like the order to be, please refer to the manpage, especially the RSYNC_COMPRESS_LIST environment variable.  Feel free to set it to whatever order you like.
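
To make the effect of the preference order concrete, here is a minimal sketch of list-based negotiation. It assumes the rule is "take the first name in the client's list that the other side also offers"; the manpage is the authority on what rsync actually does. The function name `negotiate` and the example lists are hypothetical.

```python
def negotiate(client_list, server_list):
    """Return the first client preference the server also offers, else None.

    Assumption: the client's order decides; this is an illustration,
    not rsync's implementation.
    """
    offered = set(server_list)
    for name in client_list:
        if name in offered:
            return name
    return None


# A client that prefers zstd, talking to peers with different capabilities.
preferred = ["zstd", "lz4", "zlibx", "zlib"]
print(negotiate(preferred, ["zlibx", "zlib", "zstd", "lz4"]))
print(negotiate(preferred, ["zlibx", "zlib"]))
```

Under this assumed rule, putting zstd first in the client's list selects it whenever the peer offers it, and the zlib variants only come into play when it does not.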