Обсуждение: headerscheck ccache support

Поиск
Список
Период
Сортировка

headerscheck ccache support

От
Peter Eisentraut
Дата:
Currently, headerscheck and cpluspluscheck are very slow, and they 
defeat use of ccache.  I have fixed that, and now they are much faster. :-)

The problem was (I think) that the test files are created in a 
randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this 
directory is named on the compiler command line, which is part of the 
cache key.

My solution is to create the test files in the build directory.  For 
example, for src/include/storage/ipc.h I generate

headerscheck_src_include_storage_ipc_h.c (or .cpp)

Now ccache works.  (And it's also a bit easier to debug everything with 
this naming.)

The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck 
is from about 1min 20s to only 20s.  In local use, the speedups are similar.

(I noticed that on Cirrus CI, the first uncached run with this patch was 
quite a bit slower.  I don't know if this was because of the additional 
cache population that was happening, or if it was a fluke.  Maybe others 
can try it in their environments.  Maybe it's not a problem in the long 
run.)

Вложения

Re: headerscheck ccache support

От
Álvaro Herrera
Дата:
On 2025-Nov-21, Peter Eisentraut wrote:

> Currently, headerscheck and cpluspluscheck are very slow, and they defeat
> use of ccache.  I have fixed that, and now they are much faster. :-)

Yeah, I had noticed this too.  Thanks for fixing it.

> My solution is to create the test files in the build directory.  For
> example, for src/include/storage/ipc.h I generate
> 
> headerscheck_src_include_storage_ipc_h.c (or .cpp)
> 
> Now ccache works.

Sounds reasonable.  I notice that you're cleaning this file in a `rm`
line in the loop,

> @@ -253,10 +249,11 @@ do
>      if ! $COMPILER $COMPILER_FLAGS -I $builddir -I $srcdir \
>          -I $builddir/src/include -I $srcdir/src/include \
>          -I $builddir/src/interfaces/libpq -I $srcdir/src/interfaces/libpq \
> -        $EXTRAINCLUDES $EXTRAFLAGS -c $tmp/test.$ext -o $tmp/test.o
> +        $EXTRAINCLUDES $EXTRAFLAGS -c $test_file_name.$ext -o $test_file_name.o
>      then
>          exit_status=1
>      fi
> +    rm -f "$test_file_name.$ext" "$test_file_name.o"
>  done

but this means that if the script is interrupted halfway through, one
file or two files might remain in place.  Would it be possible to have
the current file name in a variable, so that the `trap` line can delete
them?

I've been also wondering about testing whether `parallel` is installed,
and use that if so.

>    # Verify headerscheck / cpluspluscheck succeed
>    #
> -  # - Don't use ccache, the files are uncacheable, polluting ccache's
> -  #   cache

So how bad is the effect of the cache pollution that's now going to
occur?  I imagine we don't really care.  (I have no idea how to measure
this, and probably it's a waste of time just to try.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"E pur si muove" (Galileo Galilei)



Re: headerscheck ccache support

От
Andres Freund
Дата:
Hi,

On 2025-11-21 11:48:10 +0100, Peter Eisentraut wrote:
> Currently, headerscheck and cpluspluscheck are very slow, and they defeat
> use of ccache.  I have fixed that, and now they are much faster. :-)
> 
> The problem was (I think) that the test files are created in a
> randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this directory
> is named on the compiler command line, which is part of the cache key.
> 
> My solution is to create the test files in the build directory.  For
> example, for src/include/storage/ipc.h I generate
> 
> headerscheck_src_include_storage_ipc_h.c (or .cpp)
> 
> Now ccache works.  (And it's also a bit easier to debug everything with this
> naming.)

I applaud this effort.  I think long-term I want this stuff properly
integrated into the build system, but this is a good step on its own.


> The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck is
> from about 1min 20s to only 20s.  In local use, the speedups are similar.
> 
> (I noticed that on Cirrus CI, the first uncached run with this patch was
> quite a bit slower.  I don't know if this was because of the additional
> cache population that was happening, or if it was a fluke.  Maybe others can
> try it in their environments.  Maybe it's not a problem in the long run.)

That doesn't surprise me - one of the major throttling factors on CI is the
number of iops. Filling an empty cache drastically increases iops, due to
having to create all the cache files (there are multiple for each cache
entry).

Hm. I wonder whether we would gain performance if we changed the linux CI
image to use data=writeback instead of the default data=ordered, that can
really help with metadata heavy workloads.

Bilal, any chance you would try modifying the CI image generation to change
the default option for the root filesystem using tune2fs -o
journal_data_writeback (that's easier IME than changing the mount option, as
the root filesystem is mounted very early, in the initramfs, before fstab is
known). Or perhaps just do it interactively in a VM (needs a reboot though)?

Greetings,

Andres Freund



Re: headerscheck ccache support

От
Andres Freund
Дата:
Hi,

On 2025-11-21 13:14:18 +0100, Álvaro Herrera wrote:
> On 2025-Nov-21, Peter Eisentraut wrote:
> >    # Verify headerscheck / cpluspluscheck succeed
> >    #
> > -  # - Don't use ccache, the files are uncacheable, polluting ccache's
> > -  #   cache
> 
> So how bad is the effect of the cache pollution that's now going to
> occur?  I imagine we don't really care.  (I have no idea how to measure
> this, and probably it's a waste of time just to try.)

I don't think there's any cache pollution after this change - the pollution
the comment was referencing was that ccache's cache would be filled with
entries for files that would *never* be used and would therefore be
useless. With this change the added use of ccache will just accelerate the CI
runs, which I wouldn't consider polluting...

Greetings,

Andres Freund



Re: headerscheck ccache support

От
Álvaro Herrera
Дата:
On 2025-Nov-21, Andres Freund wrote:

> On 2025-11-21 13:14:18 +0100, Álvaro Herrera wrote:

> > So how bad is the effect of the cache pollution that's now going to
> > occur?
> 
> I don't think there's any cache pollution after this change - the
> pollution the comment was referencing was that ccache's cache would be
> filled with entries for files that would *never* be used and would
> therefore be useless.

Ah, that makes sense.

> With this change the added use of ccache will just accelerate the CI
> runs, which I wouldn't consider polluting...

Sure.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."                   (New York Times, about Microsoft PowerPoint)



Re: headerscheck ccache support

От
Thomas Munro
Дата:
On Fri, Nov 21, 2025 at 11:48 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> Currently, headerscheck and cpluspluscheck are very slow, and they
> defeat use of ccache.  I have fixed that, and now they are much faster. :-)
>
> The problem was (I think) that the test files are created in a
> randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this
> directory is named on the compiler command line, which is part of the
> cache key.
>
> My solution is to create the test files in the build directory.  For
> example, for src/include/storage/ipc.h I generate
>
> headerscheck_src_include_storage_ipc_h.c (or .cpp)
>
> Now ccache works.  (And it's also a bit easier to debug everything with
> this naming.)
>
> The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck
> is from about 1min 20s to only 20s.  In local use, the speedups are similar.

+1

I wrote an almost identical patch[1] and then lost it down the back of
the sofa.  I was wondering about parallelising it next...

[1] https://www.postgresql.org/message-id/CA%2BhUKGJjQyZUvcu6udk5OKz5rnaF4a_hm5nb_VtZHYMH%2BvsN0g%40mail.gmail.com



Re: headerscheck ccache support

От
Peter Eisentraut
Дата:
On 22.11.25 09:54, Thomas Munro wrote:
> On Fri, Nov 21, 2025 at 11:48 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>> Currently, headerscheck and cpluspluscheck are very slow, and they
>> defeat use of ccache.  I have fixed that, and now they are much faster. :-)
>>
>> The problem was (I think) that the test files are created in a
>> randomly-named directory (`mktemp -d /tmp/$me.XXXXXX`), and this
>> directory is named on the compiler command line, which is part of the
>> cache key.
>>
>> My solution is to create the test files in the build directory.  For
>> example, for src/include/storage/ipc.h I generate
>>
>> headerscheck_src_include_storage_ipc_h.c (or .cpp)
>>
>> Now ccache works.  (And it's also a bit easier to debug everything with
>> this naming.)
>>
>> The observed speedup on Cirrus CI for headerscheck plus cpluspluscheck
>> is from about 1min 20s to only 20s.  In local use, the speedups are similar.
> 
> +1
> 
> I wrote an almost identical patch[1] and then lost it down the back of
> the sofa.  I was wondering about parallelising it next...
> 
> [1] https://www.postgresql.org/message-id/CA%2BhUKGJjQyZUvcu6udk5OKz5rnaF4a_hm5nb_VtZHYMH%2BvsN0g%40mail.gmail.com

Ah yes, that's about the same idea.  The difference is that yours 
requires specifying TMPDIR on the make invocation.  So it wouldn't 
happen by default for local (non-CI) use.  I think I would like an 
implementation that also worked out of the box locally.