Обсуждение: headerscheck ccache support
Currently, headerscheck and cpluspluscheck are very slow, and they defeat use of ccache. I have fixed that, and now they are much faster. :-) The problem was (I think) that the test files are created in a randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this directory is named on the compiler command line, which is part of the cache key. My solution is to create the test files in the build directory. For example, for src/include/storage/ipc.h I generate headerscheck_src_include_storage_ipc_h.c (or .cpp) Now ccache works. (And it's also a bit easier to debug everything with this naming.) The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck is from about 1min 20s to only 20s. In local use, the speedups are similar. (I noticed that on Cirrus CI, the first uncached run with this patch was quite a bit slower. I don't know if this was because of the additional cache population that was happening, or if it was a fluke. Maybe others can try it in their environments. Maybe it's not a problem in the long run.)
Вложения
On 2025-Nov-21, Peter Eisentraut wrote: > Currently, headerscheck and cpluspluscheck are very slow, and they defeat > use of ccache. I have fixed that, and now they are much faster. :-) Yeah, I had noticed this too. Thanks for fixing it. > My solution is to create the test files in the build directory. For > example, for src/include/storage/ipc.h I generate > > headerscheck_src_include_storage_ipc_h.c (or .cpp) > > Now ccache works. Sounds reasonable. I notice that you're cleaning this file in a `rm` line in the loop, > @@ -253,10 +249,11 @@ do > if ! $COMPILER $COMPILER_FLAGS -I $builddir -I $srcdir \ > -I $builddir/src/include -I $srcdir/src/include \ > -I $builddir/src/interfaces/libpq -I $srcdir/src/interfaces/libpq \ > - $EXTRAINCLUDES $EXTRAFLAGS -c $tmp/test.$ext -o $tmp/test.o > + $EXTRAINCLUDES $EXTRAFLAGS -c $test_file_name.$ext -o $test_file_name.o > then > exit_status=1 > fi > + rm -f "$test_file_name.$ext" "$test_file_name.o" > done but this means that if the script is interrupted halfway through, one file or two files might remain in place. Would it be possible to have the current file name in a variable, so that the `trap` line can delete them? I've been also wondering about testing whether `parallel` is installed, and use that if so. > # Verify headerscheck / cpluspluscheck succeed > # > - # - Don't use ccache, the files are uncacheable, polluting ccache's > - # cache So how bad is the effect of the cache pollution that's now going to occur? I imagine we don't really care. (I have no idea how to measure this, and probably it's a waste of time just to try.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "E pur si muove" (Galileo Galilei)
Hi, On 2025-11-21 11:48:10 +0100, Peter Eisentraut wrote: > Currently, headerscheck and cpluspluscheck are very slow, and they defeat > use of ccache. I have fixed that, and now they are much faster. :-) > > The problem was (I think) that the test files are created in a > randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this directory > is named on the compiler command line, which is part of the cache key. > > My solution is to create the test files in the build directory. For > example, for src/include/storage/ipc.h I generate > > headerscheck_src_include_storage_ipc_h.c (or .cpp) > > Now ccache works. (And it's also a bit easier to debug everything with this > naming.) I applaud this effort. I think long-term I want this stuff properly integrated into the build system, but this is a good step on its own. > The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck is > from about 1min 20s to only 20s. In local use, the speedups are similar. > > (I noticed that on Cirrus CI, the first uncached run with this patch was > quite a bit slower. I don't know if this was because of the additional > cache population that was happening, or if it was a fluke. Maybe others can > try it in their environments. Maybe it's not a problem in the long run.) That doesn't surprise me - one of the major throttling factors on CI is the number of iops. Filling an empty cache drastically increases iops, due to having to create all the cache files (there are multiple for each cache entry). Hm. I wonder whether we would gain performance if we changed the linux CI image to use data=writeback instead of the default data=ordered, that can really help with metadata heavy workloads. Bilal, any chance you would try modifying the CI image generation to change the default option for the root filesystem using tune2fs -o journal_data_writeback (that's easier IME than changing the mount option, as the root filesystem is mounted very early, in the initramfs, before fstab is known). Or perhaps just do it interactively in a VM (needs a reboot though)? Greetings, Andres Freund
Hi, On 2025-11-21 13:14:18 +0100, Álvaro Herrera wrote: > On 2025-Nov-21, Peter Eisentraut wrote: > > # Verify headerscheck / cpluspluscheck succeed > > # > > - # - Don't use ccache, the files are uncacheable, polluting ccache's > > - # cache > > So how bad is the effect of the cache pollution that's now going to > occur? I imagine we don't really care. (I have no idea how to measure > this, and probably it's a waste of time just to try.) I don't think there's any cache pollution after this change - the pollution the comment was referencing was that ccache's cache would be filled with entries for files that would *never* be used and would therefore be useless. With this change the added use of ccache will just accelerate the CI runs, which I wouldn't consider polluting... Greetings, Andres Freund
On 2025-Nov-21, Andres Freund wrote: > On 2025-11-21 13:14:18 +0100, Álvaro Herrera wrote: > > So how bad is the effect of the cache pollution that's now going to > > occur? > > I don't think there's any cache pollution after this change - the > pollution the comment was referencing was that ccache's cache would be > filled with entries for files that would *never* be used and would > therefore be useless. Ah, that makes sense. > With this change the added use of ccache will just accelerate the CI > runs, which I wouldn't consider polluting... Sure. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "If you have nothing to say, maybe you need just the right tool to help you not say it." (New York Times, about Microsoft PowerPoint)
On Fri, Nov 21, 2025 at 11:48 PM Peter Eisentraut <peter@eisentraut.org> wrote: > Currently, headerscheck and cpluspluscheck are very slow, and they > defeat use of ccache. I have fixed that, and now they are much faster. :-) > > The problem was (I think) that the test files are created in a > randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this > directory is named on the compiler command line, which is part of the > cache key. > > My solution is to create the test files in the build directory. For > example, for src/include/storage/ipc.h I generate > > headerscheck_src_include_storage_ipc_h.c (or .cpp) > > Now ccache works. (And it's also a bit easier to debug everything with > this naming.) > > The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck > is from about 1min 20s to only 20s. In local use, the speedups are similar. +1 I wrote an almost identical patch[1] and then lost it down the back of the sofa. I was wondering about parallelising it next... [1] https://www.postgresql.org/message-id/CA%2BhUKGJjQyZUvcu6udk5OKz5rnaF4a_hm5nb_VtZHYMH%2BvsN0g%40mail.gmail.com
On 22.11.25 09:54, Thomas Munro wrote: > On Fri, Nov 21, 2025 at 11:48 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> Currently, headerscheck and cpluspluscheck are very slow, and they >> defeat use of ccache. I have fixed that, and now they are much faster. :-) >> >> The problem was (I think) that the test files are created in a >> randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this >> directory is named on the compiler command line, which is part of the >> cache key. >> >> My solution is to create the test files in the build directory. For >> example, for src/include/storage/ipc.h I generate >> >> headerscheck_src_include_storage_ipc_h.c (or .cpp) >> >> Now ccache works. (And it's also a bit easier to debug everything with >> this naming.) >> >> The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck >> is from about 1min 20s to only 20s. In local use, the speedups are similar. > > +1 > > I wrote an almost identical patch[1] and then lost it down the back of > the sofa. I was wondering about parallelising it next... > > [1] https://www.postgresql.org/message-id/CA%2BhUKGJjQyZUvcu6udk5OKz5rnaF4a_hm5nb_VtZHYMH%2BvsN0g%40mail.gmail.com Ah yes, that's about the same idea. The difference is that yours requires specifying TMPDIR on the make invocation. So it wouldn't happen by default for local (non-CI) use. I think I would like an implementation that also worked out of the box locally.