Bug 14798 - Metadata traffic --- uncompressed with -z, interaction with --bwlimit and ssh compression
Status: NEW
Alias: None
Product: rsync
Classification: Unclassified
Component: core
Version: 3.1.3
Hardware: All All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Wayne Davison
QA Contact: Rsync QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-17 09:22 UTC by zero
Modified: 2021-08-17 09:22 UTC (History)
0 users

See Also:


Attachments

Description zero 2021-08-17 09:22:06 UTC
Consider the case where rsync is tasked with synchronizing a large file set in which there are few changes.  Anecdotal evidence (a duckduckgo search) suggests that most of the network traffic is then spent exchanging file metadata rather than file content.  The same anecdotal evidence suggests this "file list" is not exchanged in compressed form between rsync's endpoints, even when using the -z switch.  This seems accurate: in a suitable experiment, enabling ssh compression reduced overall bandwidth usage by roughly 2x in these cases.  This seems like an opportunity for improvement.
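To illustrate why file metadata tends to compress well, here is a minimal Python sketch.  The record layout (size, mtime, NUL-terminated path) and the path names are made up for illustration; this is not rsync's actual file-list wire format, and the numbers it prints say nothing about real rsync traffic.

```python
import struct
import zlib

def synthetic_file_list(n_entries):
    """Build a made-up file list: size, mtime and NUL-terminated path per entry.

    The layout is hypothetical and only mimics the kind of repetition
    (shared directory prefixes, similar timestamps) found in file metadata.
    """
    records = bytearray()
    for i in range(n_entries):
        path = f"project/src/module_{i // 100:03d}/file_{i:05d}.c".encode()
        size = (i * 7919) % 100_000      # deterministic pseudo-varied sizes
        mtime = 1_600_000_000 + i * 37   # slowly increasing timestamps
        records += struct.pack("<QI", size, mtime) + path + b"\0"
    return bytes(records)

raw = synthetic_file_list(5000)
packed = zlib.compress(raw, 6)
ratio = len(raw) / len(packed)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, ratio={ratio:.1f}x")
```

A comparison against real traffic would require capturing what rsync actually sends, for example by measuring bytes on the wire with and without ssh compression as in the experiment described above.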

The benefits would be compounded when using --bwlimit.  With ssh compression disabled, the traffic on the wire respects the requested limit.  With ssh compression enabled, however, the limit is still enforced on traffic as measured at the rsync endpoints, before ssh compresses it.  Consequently, rsync does not use the available bandwidth effectively, precisely because in this use case there are very few file changes in the file set (which is the point of using rsync) and most of the traffic is compressible metadata.

Note that since the ssh compression ratio is unpredictable, it is impossible to adjust --bwlimit to compensate.  Thus, with --bwlimit, bandwidth usage is either on target but wasteful (ssh compression disabled, so the file metadata travels uncompressed whether or not -z is given), or below target (ssh compression enabled, with or without -z, because ssh compresses the file metadata without rsync knowing).  In both cases, the time rsync needs to complete the task is the same regardless of the form of compression, because the limit applies to the uncompressed metadata.
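The arithmetic behind this can be sketched as follows.  The function names and the example figures (100 MiB of metadata, a 1024 KiB/s limit, a 2x compression ratio) are assumptions chosen for illustration, and the model is deliberately simplistic: it treats the whole transfer as metadata and the ratio as constant.

```python
def wire_rate_kib_s(bwlimit_kib_s, ssh_ratio):
    """Rate seen on the wire when ssh compresses traffic that rsync
    has already throttled to bwlimit_kib_s (hypothetical model)."""
    return bwlimit_kib_s / ssh_ratio

def transfer_seconds(payload_kib, bwlimit_kib_s):
    """Time to push payload_kib through rsync's limiter."""
    return payload_kib / bwlimit_kib_s

metadata_kib = 100 * 1024   # assumed: 100 MiB of uncompressed metadata
bwlimit = 1024              # assumed: limit of 1024 KiB/s

# ssh compression off: wire matches the limit, metadata is sent in full.
t_plain = transfer_seconds(metadata_kib, bwlimit)
# ssh compression on (assumed 2x): same duration, only half the wire is used.
t_ssh = transfer_seconds(metadata_kib, bwlimit)
wire_ssh = wire_rate_kib_s(bwlimit, 2.0)
# If rsync itself compressed the metadata 2x before the limiter:
t_rsync = transfer_seconds(metadata_kib / 2.0, bwlimit)

print(t_plain, t_ssh, wire_ssh, t_rsync)
```

Under these assumptions the first two cases take the same time, and only compression applied before the limiter shortens the transfer while keeping the wire at the requested rate.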

Would it be possible for rsync's -z switch to set up the equivalent of two compressed streams, one for file data and another for file metadata, which are then multiplexed over the wire?  That way, ssh compression would be entirely unnecessary, and --bwlimit would still result in maximum efficiency even when most traffic is file metadata.  Having rsync compress the file list is also likely to give better compression than ssh could achieve, because rsync knows the structure of the file metadata.
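A rough Python sketch of the idea, to make the request concrete.  The channel tags, the frame header (one tag byte plus a 32-bit length) and the class and function names are all hypothetical; nothing here reflects rsync's existing protocol or how such a feature would actually be implemented.

```python
import struct
import zlib

META, DATA = 0, 1  # hypothetical channel tags

class MuxWriter:
    """Keeps one zlib stream per channel and interleaves framed chunks."""

    def __init__(self):
        self.comp = {META: zlib.compressobj(), DATA: zlib.compressobj()}
        self.wire = bytearray()

    def send(self, channel, payload):
        c = self.comp[channel]
        # Z_SYNC_FLUSH makes everything sent so far decodable by the peer
        # while keeping the channel's compression history for later chunks.
        chunk = c.compress(payload) + c.flush(zlib.Z_SYNC_FLUSH)
        self.wire += struct.pack("<BI", channel, len(chunk)) + chunk

def demux(wire):
    """Split the interleaved frames back into the two original streams."""
    decomp = {META: zlib.decompressobj(), DATA: zlib.decompressobj()}
    out = {META: bytearray(), DATA: bytearray()}
    pos = 0
    while pos < len(wire):
        channel, length = struct.unpack_from("<BI", wire, pos)
        pos += 5  # size of the "<BI" header
        out[channel] += decomp[channel].decompress(wire[pos:pos + length])
        pos += length
    return bytes(out[META]), bytes(out[DATA])

writer = MuxWriter()
writer.send(META, b"dir/a.c\0" * 100)
writer.send(DATA, b"hello" * 50)
writer.send(META, b"dir/b.c\0" * 100)
meta, data = demux(bytes(writer.wire))
```

Keeping a separate compressor per channel means the metadata stream builds its own history, which is the property the request relies on; a limiter placed after this step would then count compressed bytes.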

I could not find previous bug reports on this specific issue in the bug database.  I searched for bugs related to -z and --bwlimit, and I also searched through the release notes in case this (or an equivalent) enhancement had been applied recently.