Discussion: Test timings are increasing too fast for cfbot
Hi,
I was working on giving at least committers, and perhaps a few more folks,
access to our custom compute setup for running CI, as I had gotten a few
reports about folks running out of CI cycles during busy periods like right
now.
I had assumed that we had enough compute credits to make that possible, which,
prodded by a question from Melanie, I luckily did check.
Unfortunately it seems that our usage has increased substantially enough to
make that a problem, even with just cfbot.
Most of the tests are quite cheap to run, however Windows costs a fair bit in
licensing (we spend 2x as much on Windows licenses as we do on compute, RAM
and storage for all the tests together). There are a few other things we can
improve (e.g. we spend a fair bit on cross-region data transfers, which should
largely be fixable). But we do need to get the increases in Windows runtime
under control.
Here's a look at the test timings. Unfortunately the trends are decidedly not
good, particularly on Windows. This is from a few random cross-branch commits,
looking only at the test times (since build times are affected by caching
etc.):

        CI msvc  CI mingw  CI linux  andres-local
17      11:21    10:38     3:32      1:22
18      16:03    14:22     5:39      1:38
master  18:13    15:00     6:69      1:49
I checked this across a few commits and got similar numbers
everywhere. There's a lot of noise, but the trendline is pretty darn clear.
Due to the number of times we run the main regression tests, they have a
particularly large effect on test resources.
        time   cycles  syscalls
17      37.13  239.2T  1.573M
18      44.27  295.7T  1.715M
master  48.19  323.5T  1.854M
Another interesting aspect is the time spent doing initdb has gone up
noticeably:
        assert  optimize
17      1.934   819.1
18      2.190   827.9
master  2.274   814.5
Which, given the number of times initdb is executed in the tests (even with
the template mechanism in place), probably is a factor in all this. It's
certainly interesting that somehow assertions play a major role, given that
the non-assert time hasn't regressed.
Some increase in test cycles and time is inevitable, but to me the current
rate of increase is not sustainable / faster than the rate at which hardware
speed increases.
Greetings,
Andres Freund
Hi,
On 2026-03-24 17:30:46 -0400, Andres Freund wrote:
> Due to the number of times we run the main regression tests, they have a
> particularly large effect on test resources.
>
>         time   cycles  syscalls
> 17      37.13  239.2T  1.573M
> 18      44.27  295.7T  1.715M
> master  48.19  323.5T  1.854M
A surprisingly large source of this is psql internal queries. I first ran the
regression tests with pg_stat_statements activated [1] and noticed that a lot
of the queries, by exec and/or plan time, are psql queries.
I patched psql to add "-- psql internal\n" to every internal query and then
used log_min_duration_statement=0 to see how much of the time is spent in psql
- ~13% of the reported duration of all queries.
A lot of that is because describeTableDetails() ends up with a sequential scan:
regression[1020428][1]=# EXPLAIN ANALYZE SELECT c.oid,
n.nspname,
c.relname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname OPERATOR(pg_catalog.~) '^(pg_class)$' COLLATE pg_catalog.default
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;
                                                      QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Sort  (cost=324.79..324.80 rows=1 width=132) (actual time=7.192..7.194 rows=1.00 loops=1)
   Sort Key: n.nspname, c.relname
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=234
   ->  Nested Loop Left Join  (cost=0.00..324.78 rows=1 width=132) (actual time=2.489..7.121 rows=1.00 loops=1)
         Join Filter: (n.oid = c.relnamespace)
         Buffers: shared hit=231
         ->  Seq Scan on pg_class c  (cost=0.00..313.79 rows=1 width=72) (actual time=2.473..7.105 rows=1.00 loops=1)
               Filter: ((relname ~ '^(pg_class)$'::text) AND pg_table_is_visible(oid))
               Rows Removed by Filter: 2260
               Buffers: shared hit=229
         ->  Seq Scan on pg_namespace n  (cost=0.00..8.22 rows=222 width=68) (actual time=0.011..0.011 rows=1.00 loops=1)
               Buffers: shared hit=2
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.520 ms
 Execution Time: 7.265 ms
(17 rows)
It seems decidedly not optimal that "\d tablename", without any patterns, ends
up doing a seqscan. That's bad enough in the regression database, but there
are many PG instances with many many entries in pg_class.
I don't think this was always the case?
If I remove the COLLATE pg_catalog.default, a sane plan is chosen. That's
obviously not the right fix, but it seemed interesting enough to mention.
That alone makes a surprising 4.5% improvement in an unoptimized -O0
build.
Another psql query that stands out is \d's query for publications. Not because
of its runtime, but because of its planning cost. Nuking that (obviously not
correct) is worth another 3% of test time (and some test failures).
I've not analyzed the psql queries any further, but I'd bet the rest also
add up.
The most expensive query in the entire regression tests is:
DO $$
BEGIN
  FOR r IN 1..1350 LOOP
    DELETE FROM dedup_unique_test_table;
    INSERT INTO dedup_unique_test_table SELECT 1;
  END LOOP;
END$$;
which makes sense, as that's pretty clearly O(N^2).
There are a surprisingly large number of temporary file creations in the
regression tests. A lot of them are due to shared filesets in the context of
parallelism. I wonder whether that is a partial cause of the slowness on
Windows, which has very slow metadata operations and slower data caching.
The slowest test is stats_ext.sql. Not surprising: it does sequential scans
of tables with ~1000-10000 rows over and over again. I don't see why it has
to do that with as many rows as it does.
Another thing we spend a decent amount of time on, distributed over many
places, is FROM generate_series(x, y), which ends up slow due to
nodeFunctionscan.c always using a tuplestore.
Code wise, the most immediately noticeable things are
1) MEMORY_CONTEXT_CHECKING
2) AssertCheckRanges() (only in the brin test, but there it's a very large
portion of the runtime)
3) verify_compact_attribute(), pretty spread around
Greetings,
Andres Freund
[1] which annoyingly generates a bunch of test failures due to "Query
Identifier" getting added to tests
Hi,

On Tue, Mar 24, 2026 at 11:15:32PM -0400, Andres Freund wrote:
>
> A surprisingly large source of this is psql internal queries. I first ran the
> regression tests with pg_stat_statements activated [1] and noticed that a lot
> of the queries, by exec and/or plan time, are psql queries.
>
> [1] which annoyingly generates a bunch of test failures due to "Query
> Identifier" getting added to tests

FWIW `compute_query_id = regress` is supposed to avoid that problem.
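For reference, the settings involved would look like the following
postgresql.conf fragment (assuming the pg_stat_statements extension is
available; exact placement is up to you):

```
shared_preload_libraries = 'pg_stat_statements'
compute_query_id = regress      # compute query IDs but keep them out of EXPLAIN output
```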
(forked from: Test timings are increasing too fast for cfbot)

On Wed, 25 Mar 2026 at 04:15, Andres Freund <andres@anarazel.de> wrote:
> It seems decidedly not optimal that "\d tablename", without any patterns, ends
> up doing a seqscan. That's bad enough in the regression database, but there
> are many PG instances with many many entries in pg_class.
>
> I don't think this was always the case?
>
> If I remove the COLLATE pg_catalog.default, a sane plan is chosen. That's
> obviously not the right fix, but seemed interesting enough to mention.

Due to a very similar problem I faced in the past[1], I thought I had a good
sense of where roughly the problem was. And I indeed quickly found it.

Attached is a patch that addresses this issue and starts using index scans
again for \d tablename. This should be backpatched to PG18, where the
regression was introduced by 85b7efa1cdd

[1]: https://www.postgresql.org/message-id/flat/CAGECzQRqysy0eJMKR5he3gwtLrT87f9u5CQQua6B_XNwMnUtFA%40mail.gmail.com
On 3/25/26 04:15, Andres Freund wrote:
> ...
>
> The slowest test is stats_ext.sql - Not surprising, it does sequential scans
> of tables with ~1000-10000 rows over and over again. I don't see why it has
> to do that with as many rows as it does.
>

IIRC we needed to use non-trivial amounts of data to ensure building the
right type of statistics (e.g. no MCV). But I understand it can be annoyingly
expensive, so I'll try to make it cheaper.

> ...
> 2) AssertCheckRanges() (only in the brin test, but there a very large portion
> of the runtime)

True. It is a very comprehensive validation of the ranges, and it was very
useful - particularly during development. But I'll try to make it more
targeted at the stuff actually changed / called less often.

Both changes will require time (so that we don't lose test coverage), but I
assume it's OK if that happens sometime after the feature freeze.

regards

--
Tomas Vondra
Hi,

On 2026-03-25 15:38:01 +0100, Tomas Vondra wrote:
> On 3/25/26 04:15, Andres Freund wrote:
> > ...
> >
> > The slowest test is stats_ext.sql - Not surprising, it does sequential scans
> > of tables with ~1000-10000 rows over and over again. I don't see why it has
> > to do that with as many rows as it does.
> >
>
> IIRC we needed to use non-trivial amounts of data to ensure building the
> right type of statistics (e.g. no MCV). But I understand it can be
> annoyingly expensive, so I'll try to make it cheaper.

Great. It might also be worth checking if any of the query conditions can be
made cheaper without actually changing what they do test.

> > ...
> > 2) AssertCheckRanges() (only in the brin test, but there a very large portion
> > of the runtime)
>
> True. It is a very comprehensive validation of the ranges, and it was
> very useful - particularly during development. But I'll try to make it
> more targeted at the stuff actually changed / called less often.

Makes sense. There might also be some changes that make it faster without
losing any coverage. E.g. not using FunctionCall2Coll() - which initializes
stuff on every call - but doing InitFunctionCallInfoData() once and then
updating arguments + FunctionCallInvoke() for each call.

> Both changes will require time (so that we don't lose test coverage),
> but I assume it's OK if that happens sometime after the feature freeze.

Yea. We're not running out of credits tomorrow or something like that.

Greetings,

Andres Freund
On Wed, 25 Mar 2026 at 16:15, Andres Freund <andres@anarazel.de> wrote:
> Code wise, the most immediately noticeable things are
>
> 3) verify_compact_attribute(), pretty spread around

We do now have TupleDescFinalize(), where those could be checked just once
rather than on each call to TupleDescCompactAttr(). That's not quite as
watertight a guarantee, as someone could change the FormData_pg_attribute
after TupleDescFinalize(). Just doing it in TupleDescFinalize() would at
least still catch the places where people forget to call
populate_compact_attribute() before TupleDescFinalize().

I would have expected this to be a little less overhead now since d8a859d22
removed the calls to TupleDescCompactAttr() in the main deforming routine.
Maybe I should just make that change in the other deformers...?

Do you have an idea of which callers of verify_compact_attribute() are
causing the most overhead?

David
On 2026-03-26 11:09:36 +1300, David Rowley wrote:
> On Wed, 25 Mar 2026 at 16:15, Andres Freund <andres@anarazel.de> wrote:
> > Code wise, the most immediately noticeable things are
> >
> > 3) verify_compact_attribute(), pretty spread around
>
> We do now have TupleDescFinalize(), where those could be checked just
> once rather than on each call to TupleDescCompactAttr(). That's not
> quite as watertight a guarantee as someone could change the
> FormData_pg_attribute after TupleDescFinalize(). Just doing it in
> TupleDescFinalize() would at least still catch the places where people
> forget to call populate_compact_attribute() before
> TupleDescFinalize().

Maybe verify_compact_attribute() could just do an assert comparison between
the underlying non-compact attribute and the compact one? Or even just an
assert checking if the compact attribute is initialized, with the full
checking happening in TupleDescFinalize(), as you suggest?

> I would have expected this to be a little less overhead now since
> d8a859d22 removed the calls to TupleDescCompactAttr() in the main
> deforming routine. Maybe I should just make that change in the other
> deformers...?

Might be worth it.

> Do you have an idea of which callers of verify_compact_attribute() are
> causing the most overhead?

I'm not sure how much to believe the profile when the costs are as
distributed as they are here. But according to the profile it's

- heap_form_tuple
- heap_form_minimal_tuple
- index_getattr
- nocachegetattr

Greetings,

Andres Freund
Hi,

On 2026-03-25 12:36:12 +0800, Julien Rouhaud wrote:
> On Tue, Mar 24, 2026 at 11:15:32PM -0400, Andres Freund wrote:
> >
> > A surprisingly large source of this is psql internal queries. I first ran the
> > regression tests with pg_stat_statements activated [1] and noticed that a lot
> > of the queries, by exec and/or plan time, are psql queries.
> >
> > [1] which annoyingly generates a bunch of test failures due to "Query
> > Identifier" getting added to tests
>
> FWIW `compute_query_id = regress` is supposed to avoid that problem.

Hm. I guess that should work. Thanks!

It's pretty annoying that one has to manually specify it to keep the tests
passing. I wonder if we should make pg_regress specify it to avoid that.

Greetings,

Andres Freund