I've noticed that my ccache stores many identical files. A little investigation showed that this can happen in many ways:
-adding/removing a blank line in a source file (likewise a comment, or a preprocessor directive that the C preprocessor removes)
-changing "useless" code (e.g. an empty namespace, an unused class)
-changing compiler options that don't affect the output (e.g. adding -Wall on clean code)
-I'm sure the list continues...
Of course, it would be unreasonable for ccache to accurately predict every possible way that an identical object file could be created. However, I think it is reasonable for ccache to identify that it has created an identical object so it can avoid storing it more than once.
Assuming this de-duplication could be performed, better hit rates could be obtained with a ccache of the same size (alternatively, a smaller cache could maintain the same hit rate).
(my brainstorming follows, feel free to disregard)
This probably requires installing additional files in the cache for the hash of each object. So, the flow for a miss might look like this:
generate hash of generated files (*.o, *.d, *.stderr, others?)
lookup hash in cache
if found:
  install manifest-like file to redirect to the existing cache entry?
  copy/link that object to the proper destination
otherwise:
  install object(s) in cache
  install manifest (or something)? at the location of the hash of the object
  copy/link the new object to the proper destination
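To make the flow above concrete, here is a rough Python sketch. It is not how ccache is implemented. The class name `DedupCache`, the `objects`/`redirects` directory names, the choice of SHA-256, and the use of plain suffixes ("o", "d", "stderr") as product names are all assumptions made for illustration.

```python
import hashlib
import os
import shutil
import tempfile


def content_hash(products):
    """Hash all compiler products (suffix -> bytes) into a single digest."""
    h = hashlib.sha256()
    for name in sorted(products):
        data = products[name]
        # Include name and length so that product boundaries are unambiguous.
        h.update(name.encode() + b"\0" + str(len(data)).encode() + b"\0")
        h.update(data)
    return h.hexdigest()


class DedupCache:
    """Hypothetical cache layout: products stored once, keyed by content."""

    def __init__(self, root):
        self.objects = os.path.join(root, "objects")      # content hash -> products
        self.redirects = os.path.join(root, "redirects")  # lookup key -> content hash
        os.makedirs(self.objects, exist_ok=True)
        os.makedirs(self.redirects, exist_ok=True)

    def store_on_miss(self, key, products):
        """Record a miss for `key`; return True if the products already existed."""
        digest = content_hash(products)
        entry = os.path.join(self.objects, digest)
        deduplicated = os.path.isdir(entry)
        if not deduplicated:
            # First time this content is seen: install the object(s).
            os.makedirs(entry)
            for name, data in products.items():
                with open(os.path.join(entry, name), "wb") as f:
                    f.write(data)
        # Either way, install the manifest-like redirect for this lookup key.
        # (A real cache would write this atomically; the sketch does not.)
        with open(os.path.join(self.redirects, key), "w") as f:
            f.write(digest)
        return deduplicated

    def fetch(self, key, dest_dir):
        """Copy the products for `key` into dest_dir; return False on a miss."""
        redirect = os.path.join(self.redirects, key)
        if not os.path.exists(redirect):
            return False
        with open(redirect) as f:
            digest = f.read()
        entry = os.path.join(self.objects, digest)
        for name in os.listdir(entry):
            shutil.copy(os.path.join(entry, name), os.path.join(dest_dir, name))
        return True
```

With this layout, two different lookup keys that produce byte-identical products cost one stored copy plus two small redirect files.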
If you're willing to break backwards compatibility, it might make more sense to install the compiler products (*.o, *.d, *.stderr) at the location that corresponds to each file's own hash (instead of the preprocessor result's hash, as is currently done). Then manifest-like files would have to be generated for both direct hits and preprocessed hits, and both would refer to the products' hashes.
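A rough in-memory Python sketch of that alternative layout, where each product is stored under its own hash and manifest-like records point at those hashes. The names `ProductStore`, `put` and `get`, the "direct"/"preprocessed" mode labels, and SHA-256 are assumptions for illustration, not ccache's actual format.

```python
import hashlib


class ProductStore:
    """Hypothetical layout: every product file lives under its own hash."""

    def __init__(self):
        self.blobs = {}      # product hash -> file contents
        self.manifests = {}  # (mode, lookup hash) -> {suffix: product hash}

    def put(self, mode, lookup_hash, products):
        """Store products (suffix -> bytes) and a manifest-like record for them."""
        refs = {}
        for suffix, data in products.items():
            digest = hashlib.sha256(data).hexdigest()
            # Identical contents are stored only once, whatever produced them.
            self.blobs.setdefault(digest, data)
            refs[suffix] = digest
        self.manifests[(mode, lookup_hash)] = refs

    def get(self, mode, lookup_hash):
        """Return the products for a lookup, or None on a miss."""
        refs = self.manifests.get((mode, lookup_hash))
        if refs is None:
            return None
        return {suffix: self.blobs[digest] for suffix, digest in refs.items()}
```

Because de-duplication here is per file rather than per set of products, two compilations that share an identical *.o but differ in *.stderr would still share the object.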
Moved to https://github.com/ccache/ccache/issues/98.