Forwarding bug report from Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=610853

---

Description of problem:
I had my home directory (and thus ~/.ccache) mounted on an NFS share, to share it between machines. I was then running the autoconf testsuite with high parallelism (number of active cores + 2), and noticed that I was having windows of processor utilization dropping to nearly 0%, with tests taking a LOONG time to complete. Upon investigation, I noticed that when the tests were sluggish, 'time gcc --version' would take 30 seconds.

Version-Release number of selected component (if applicable):
$ rpm -q gcc ccache
gcc-4.4.4-10.fc14.x86_64
ccache-3.0-0.2.pre1.fc14.x86_64

How reproducible:
very

Steps to Reproduce:
1. Point ~/.ccache to an NFSv3 mount.
2. git clone git://git.sv.gnu.org/autoconf.git
3. cd autoconf
4. autoreconf -vfi
5. make
6. make check TESTSUITEFLAGS=-j$(($(nproc) + 2))
7. monitor processor utilization during the exercise

Actual results:
During sequences where multiple processes are trying to use gcc at once (around test 250 or so in the autoconf testsuite), I noticed that processor utilization was severely dropping, and tests were taking forever to complete. Investigating partial test output to date showed that tests were getting stuck on 'gcc --version', and I was able to reproduce this in another console, with 'time gcc --version' showing 30 seconds of elapsed time. Using both strace and ltrace showed that a slow 'gcc --version' was invariably getting stuck on fcntl() call in this portion of the process:

open("/home/remote/eblake/.ccache/stats", O_RDWR) = 4
fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = 0

In other words, the act of trying to lock ~/.ccache/stats is causing lock contention over NFS, which results in long timeouts for things to serialize correctly, and as a result, ccache performance was needlessly suffering.

Expected results:
Testsuite should complete within a few minutes, with nearly 100% processor utilization on all cores during the test. File locking should NOT cause such a severe performance degradation, particularly for something as trivial as 'gcc --version'. Furthermore, using fcntl for file locking is inherently broken:
http://0pointer.de/blog/projects/locking.html

If ccache needs locking, it should use alternatives such as atomic mkdir() or symlink() calls, rather than fcntl() locking, particularly if ~/.ccache is not a local drive.

Additional info:
I was able to work around the issue by relocating ~/.ccache to be a symlink to a local directory, at which point NFS locking speed no longer interferes, and my autoconf testsuite completed faster.
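
For illustration, the symlink() alternative suggested above could look roughly like the sketch below. It is written in Python for brevity (ccache itself is C) and is not ccache's code: the function names, the ".lock" suffix, the timeout and the polling interval are all assumptions made for this example. It relies on symlink creation failing when the name already exists, and it leaves out stale-lock recovery, which a real implementation would need.

```python
import os
import socket
import time


def acquire_symlink_lock(path, timeout=5.0, poll_interval=0.01):
    """Try to lock `path` by creating the symlink `path + ".lock"`.

    Returns True if the lock was taken, False if `timeout` seconds passed
    while another holder kept it. (Hypothetical helper, not a ccache API.)
    """
    lock_path = path + ".lock"
    # The link target is informational only: it records who holds the lock.
    owner = "%s:%d" % (socket.gethostname(), os.getpid())
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Creation fails with FileExistsError if the lock is already held.
            os.symlink(owner, lock_path)
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)


def release_symlink_lock(path):
    """Release a lock taken with acquire_symlink_lock()."""
    os.unlink(path + ".lock")
```

A caller would wrap the read-modify-write of the stats file between acquire_symlink_lock(stats_path) and release_symlink_lock(stats_path), and skip the update if the acquire call returns False.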
ccache 3.1 will contain two changes to tackle the problem:

1. For things like "gcc --version", update the stats file in one of the 16 $CCACHE_DIR/[0-9a-f] subdirectories (selected pseudo-randomly) instead of $CCACHE_DIR/stats. This will reduce lock contention.
2. As suggested, use symlinks for locking instead of POSIX locks.
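
As a rough illustration of change 1, picking one of the 16 per-subdirectory stats files pseudo-randomly might look like the sketch below. It is Python rather than ccache's C, and the function name and the rng parameter are made up for this example; only the $CCACHE_DIR/[0-9a-f]/stats layout comes from the description above.

```python
import os
import random

# The 16 one-character subdirectory names, matching $CCACHE_DIR/[0-9a-f].
SUBDIRS = "0123456789abcdef"


def pick_stats_file(cache_dir, rng=random):
    """Return the path of one pseudo-randomly chosen per-subdirectory stats file.

    Hypothetical helper: spreading updates over 16 files means concurrent
    processes rarely contend for the same lock.
    """
    subdir = rng.choice(SUBDIRS)
    return os.path.join(cache_dir, subdir, "stats")
```

With a uniform choice, two processes updating at the same moment pick the same file only about 1 time in 16, instead of always contending for $CCACHE_DIR/stats.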
Implemented and will be in ccache 3.1.