Bug 10212 - ccache does not detect difference in #defines when precompiling header
Summary: ccache does not detect difference in #defines when precompiling header
Status: CLOSED FIXED
Alias: None
Product: ccache
Classification: Unclassified
Component: ccache (show other bugs)
Version: 3.1.9
Hardware: All All
: P5 critical
Target Milestone: 3.1.10
Assignee: Joel Rosdahl
QA Contact: Joel Rosdahl
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-10-16 10:19 UTC by tho
Modified: 2014-10-19 18:37 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description tho 2013-10-16 10:19:01 UTC
A simple test shows that ccache does not work with precompiled headers:

The source code:
% cat config.h
#define FOO

% cat precomp.h
#include "config.h"
#ifdef FOO
# define BAR 1
#else
# define BAR 0
#endif

% cat test.c
#include <stdio.h>
#include "precomp.h"

int
main()
{
  printf("%d\n", BAR);
  return 0;
}

The first compilation (with an empty cache), OK!
% ccache gcc -o precomp.h.gch precomp.h
% ccache gcc -o test -include precomp.h test.c              
% ./test 
1

Edit config.h, comment out (or remove) the #define:
% echo > config.h
% cat config.h

(empty)

Second compilation, FAIL...
% ccache gcc -o precomp.h.gch precomp.h
% ccache gcc -o test -include precomp.h test.c              
% ./test 
1

For comparision:
% gcc -o precomp.h.gch precomp.h       
% gcc -o test -include precomp.h test.c      
% ./test                               
0
Comment 1 Joel Rosdahl 2013-11-13 19:39:53 UTC
Thanks for the bug report!

This happens because ccache runs the preprocessor on the source file (in this case a header), the preprocessor removes #defines and ccache then hashes the output, which means that changes in #defines are not detected even though the resulting .gch file will have different behavior.

Here is shell script that demonstrates the problem:

  #!/bin/sh
  
  echo '#include "config.h"' >precomp.h
  echo '#define FOO' >config.h
  /usr/bin/gcc -E precomp.h | sha1sum
  echo '' >config.h
  /usr/bin/gcc -E precomp.h | sha1sum

The output is two identical hash sums.

I don't think this is fixable in ccache; we have to remove support for precompiled headers, or perhaps make it opt-in by introducing a new CCACHE_SLOPPINESS setting.

What do you think? Would a requirement of setting CCACHE_SLOPPINESS to something like "pch-defines" be reasonable to activate the preprocess support if you are prepared to get false cache hits when only a #define has changed? Or should the support just be removed?
Comment 2 tho 2013-11-15 16:09:57 UTC
I'm not sure if support for precompiled headers should be completely disabled or
configured as an opt-in option, but what is for sure is that it took me quite
some time to figure out the real origin of the issue I was having. So IMHO the
support for precompiled header should be enabled only if you can guarantee that
no false positive cache hit will occur.

But something else is striking me after your feedback on the issue: why isn't
ccache just hashing the brute source code instead of the preprocessed input (as
well as, of course, the source of any included file)?
Comment 3 Joel Rosdahl 2013-11-17 20:25:10 UTC
(In reply to comment #2)
> I'm not sure if support for precompiled headers should be completely disabled
> or configured as an opt-in option, but what is for sure is that it took me
> quite some time to figure out the real origin of the issue I was having.

Yes, I can imagine that it was tricky. Sorry about that.

> So IMHO the support for precompiled header should be enabled only if you
> can guarantee that no false positive cache hit will occur.

Right. This is a difficult trade-off, like some other things in ccache:

* There may be false cache hits when a header file has been added to a include directory (https://bugzilla.samba.org/show_bug.cgi?id=8424).
* There may be false cache hits affecting debug information (produced by -g) when compiling in two different directories.
* There may be false cache hits when the GNU assembler .incbin directive is being used.

And probably more. The common theme is that if ccache would make them opt-in, then ccache would much less useful for the absolute majority of users.

> But something else is striking me after your feedback on the issue: why isn't
> ccache just hashing the brute source code instead of the preprocessed input
> (as well as, of course, the source of any included file)?

That's a good idea, but I don't see how that could be done easily. ccache does not know how to follow include files, i.e. behave as a preprocessor (where to search for headers, whether to follow #include inside #ifdef, etc.).

On the other hand, I guess that ccache could run the preprocessor just to find out about the included files and then hash those files plus the source instead of the preprocessed output... Yes, should work, but would require some development.
Comment 4 tho 2013-11-19 10:10:19 UTC
(In reply to comment #3)
> > But something else is striking me after your feedback on the issue: why isn't
> > ccache just hashing the brute source code instead of the preprocessed input
> > (as well as, of course, the source of any included file)?
> 
> That's a good idea, but I don't see how that could be done easily. ccache does
> not know how to follow include files, i.e. behave as a preprocessor (where to
> search for headers, whether to follow #include inside #ifdef, etc.).
> 
> On the other hand, I guess that ccache could run the preprocessor just to find
> out about the included files and then hash those files plus the source instead
> of the preprocessed output... Yes, should work, but would require some
> development.

Following include files is quite trivial imho. The C (C++) preprocessor include directive is guaranteed to be something like [ \t]*#[ \t]*include[ \t]+["<"].
You need to recognize -I, -include, $CPATH and so forth but you already handle this. This does not look terribly difficult, does it? (of course, I understand that this requires some development and much testing since this is a major change).

Also, you have the "gcc -E -dD" option that would keep the #define directives, but I'm afraid this is gcc specific and is not supported by e.g. clang.

Another alternative could be to include a real C preprocessor in ccache so that you have full control on the output. For instance, I once integrated the C preprocessor from the pcc project ( http://pcc.ludd.ltu.se/ ) in a project of mine and that was quite simple (just a couple of C source files and BSD license).

In the mean time it's probably wiser to at least have precompiled header as an opt-in option. Since precompiled headers and ccache share the same objective of saving compilation time, using both approaches at the same time look quite redundant anyway :)
Comment 5 Joel Rosdahl 2014-01-08 19:44:57 UTC
(In reply to comment #4)
> Following include files is quite trivial imho. The C (C++) preprocessor include
> directive is guaranteed to be something like [ \t]*#[ \t]*include[ \t]+["<"].
> You need to recognize -I, -include, $CPATH and so forth but you already handle
> this.

No, and that's the thing: ccache doesn't handle -I, -include, -idirafter, etc. -- that's handled by the preprocessor. That way ccache doesn't have to have (as much) compiler-specific knowledge about compiler options that affect preprocessing.

> This does not look terribly difficult, does it? (of course, I understand
> that this requires some development and much testing since this is a major
> change).

Yes, it looks difficult to me. Not parsing the source code but correctly emulating different compilers' preprocessor behavior.

> Also, you have the "gcc -E -dD" option that would keep the #define directives,
> but I'm afraid this is gcc specific and is not supported by e.g. clang.

Or the SUN Studio compiler, the HP compiler, Intel's compilers and probably more.

> Another alternative could be to include a real C preprocessor in ccache so that
> you have full control on the output. For instance, I once integrated the C
> preprocessor from the pcc project ( http://pcc.ludd.ltu.se/ ) in a project of
> mine and that was quite simple (just a couple of C source files and BSD
> license).

As mentioned above, I don't think that it's that part that is difficult.

> In the mean time it's probably wiser to at least have precompiled header as an
> opt-in option. Since precompiled headers and ccache share the same objective of
> saving compilation time, using both approaches at the same time look quite
> redundant anyway :)

Right.
Comment 6 Joel Rosdahl 2014-01-08 20:30:11 UTC
Made precompiled header handling even more opt-in: https://git.samba.org/?p=ccache.git;a=commit;h=708d9110a103bd49437be7bff1e82697fff68d0b
Comment 7 Joel Rosdahl 2014-08-16 11:51:22 UTC
Reopening the bug.

As noted by Mark Fieldhouse on the ccache list: ccache needs to avoid caching a precompiled header (unless opted in), not avoid using it.
Comment 9 Joel Rosdahl 2014-10-19 18:37:18 UTC
Fix included in v3.1.10.