Bug 11711 - cache de-duplication
Summary: cache de-duplication
Status: RESOLVED MOVED
Alias: None
Product: ccache
Classification: Unclassified
Component: ccache
Version: 3.2.1
Hardware: All All
Importance: P5 enhancement
Target Milestone: ---
Assignee: Joel Rosdahl
QA Contact: Joel Rosdahl
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-04 00:22 UTC by mblythester+ccache
Modified: 2016-06-11 17:02 UTC (History)

See Also:


Description mblythester+ccache 2016-02-04 00:22:38 UTC
I've found that there are many identical files stored in my ccache.  After a little investigation, I found that there are many ways this can happen:

-adding/removing a blank line in a source file (the same applies to a comment, or to a preprocessor directive that the C preprocessor removes)
-changing "useless" code (e.g. an empty namespace, an unused class)
-changing compiler options that don't affect the output (e.g. adding -Wall on clean code)
-I'm sure the list continues...

Of course, it would be unreasonable for ccache to accurately predict every possible way that an identical object file could be created.  However, I think it is reasonable for ccache to identify that it has created an identical object so it can avoid storing it more than once.

Assuming this de-duplication could be performed, better hit rates could be obtained with a cache of the same size (alternatively, a smaller cache could maintain the same hit rate).

(my brainstorming follows, feel free to disregard)

This probably requires installing additional files in the cache, keyed by the hash of each object.  So the flow for a miss might look like this:

run compiler
generate hash of generated files (*.o, *.d, *.stderr, others?)
lookup hash in cache
if hit
  install manifest-like file to redirect to the existing cache entry?
  copy/link that object to the proper destination
else //miss
  install object(s) in cache
  install manifest (or something)? at the location of the hash of the object
  copy/link the new object to the proper destination
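
The miss flow above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how ccache actually works: the `DedupCache` class, its in-memory dictionaries, and the choice of SHA-256 are all hypothetical stand-ins for the on-disk cache and its hash function.

```python
import hashlib


class DedupCache:
    """Toy in-memory model of a cache that stores each distinct object once.

    Hypothetical: the names and layout are for illustration only and do not
    correspond to ccache's real on-disk format.
    """

    def __init__(self):
        self.objects = {}    # content hash -> object bytes (stored once)
        self.redirects = {}  # input hash (e.g. preprocessor hash) -> content hash

    def store_miss(self, input_hash, compiler_output):
        """Handle a miss; return True if identical content was already stored."""
        content_hash = hashlib.sha256(compiler_output).hexdigest()
        is_duplicate = content_hash in self.objects
        if not is_duplicate:
            self.objects[content_hash] = compiler_output
        # Either way, install the manifest-like redirect for this input.
        self.redirects[input_hash] = content_hash
        return is_duplicate

    def lookup(self, input_hash):
        """Follow the redirect for an input hash; return None on a miss."""
        content_hash = self.redirects.get(input_hash)
        if content_hash is None:
            return None
        return self.objects[content_hash]


cache = DedupCache()
obj = b"\x7fELF fake object bytes"
first = cache.store_miss("hash-of-source-v1", obj)
second = cache.store_miss("hash-of-source-v1-plus-blank-line", obj)
print(first, second, len(cache.objects))
```

Here the second miss is recognized as a duplicate, so both input hashes resolve to the same single stored object.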

If you're willing to break backwards compatibility, it might make more sense to install each compiler product (*.o, *.d, *.stderr) in the location that corresponds to that file's own hash (instead of the hash of the preprocessor result, as is currently done).  Then manifest-like files would have to be generated for both direct hits and preprocessed hits, and both would refer to the products' hashes.
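
A minimal sketch of that content-addressed layout, again under assumptions: `products`, `direct_index`, `preprocessed_index`, `store_result`, and `fetch` are hypothetical names, the indexes stand in for the manifest-like files, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def digest(data):
    # SHA-256 is an assumption; any collision-resistant hash would do here.
    return hashlib.sha256(data).hexdigest()


products = {}            # product hash -> bytes of one .o/.d/.stderr file
direct_index = {}        # direct-mode key -> {file kind: product hash}
preprocessed_index = {}  # preprocessor-mode key -> {file kind: product hash}


def store_result(direct_key, preprocessed_key, outputs):
    """Store each output under its own hash and point both indexes at it."""
    refs = {}
    for kind, data in outputs.items():
        h = digest(data)
        products.setdefault(h, data)  # identical content is stored only once
        refs[kind] = h
    direct_index[direct_key] = dict(refs)
    preprocessed_index[preprocessed_key] = dict(refs)


def fetch(index, key):
    """Resolve a key through the given index to the stored output files."""
    refs = index.get(key)
    if refs is None:
        return None
    return {kind: products[h] for kind, h in refs.items()}


# Two compilations that differ only in their dependency file.
store_result("direct-A", "cpp-A", {".o": b"object", ".d": b"deps-A", ".stderr": b""})
store_result("direct-B", "cpp-B", {".o": b"object", ".d": b"deps-B", ".stderr": b""})
print(len(products))
```

In this toy run, six output files are recorded but only four distinct blobs are kept, because the identical object file and the empty stderr are shared between the two results.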
Comment 1 Joel Rosdahl 2016-06-11 17:02:32 UTC
Moved to https://github.com/ccache/ccache/issues/98.