Discussion: A serious change in performance between PG 15 and PG 16, 17, 18.


A serious change in performance between PG 15 and PG 16, 17, 18.

From
Clive Boughton
Date:
To whom it may concern,

The system I'm referring to has been built by my company beginning in 2019/2020 with PG 10.2.

The SQL functionality and related tables contained within the PG database have changed little since initial delivery.
There are very few complex forms of SQL used and the tables are quite simple - the largest being ~400,000 rows based directly on ~50,000 instances of the primary key.
The functions do not update anything in this 'large' table (from which data is only read) but do produce several significantly smaller tables - the largest being ~40 columns x 50-80 rows, about 80% of which are fixed-point numbers (6 decimal places).
Initially (in 2020, before any indexing was added) the time to produce and output the overall result was 30-60 minutes.  After adding quite basic indexing, that time was reduced to 1 or 2 minutes.
 
PG 10.2 remained in place until 2023 when the DB was upgraded to PG 15.2 with no problems. In fact performance increased.

When I recently upgraded to PG 18.1 (without any changes to the DB functions or tables, or indexing) it took 7-10 hours to produce the same result - a 300-600 times decrease in performance.

My first step was to narrow the version jump from 15-to-18 down to 15-to-16.  PG 16 produced the same very poor performance as PG 18.1.

To try and save time I employed the help of Claude Pro (an excellent AI tool for software development).  After implementing all sorts of suggestions (indexing, configuring etc.) and undertaking all sorts of checks (using available PG functions), the conclusion is that some significant regression was introduced in PG 16 and hasn't been removed in (at least) PG 18 (I didn't try PG 17).  Of course, I do understand that bringing Claude Pro into the fray may have its own issues, but Claude's suggestions and explanations do make sense and at the very least may help a PG expert to more easily home in on the path to take to fix the issue.

Sticking with PG 15 in the short term is fine for us.  However, the current performance of PG 16 - PG 18 is totally unacceptable in terms of expectations.

As an aside, we did increase the number of elements by 30% within the largest table (mentioned above), so that instead of ~400,000 rows there were ~550,000 rows to read from.  The number of instances of the primary key was not changed.
On PG 15 this only added an extra 20-30 seconds to the production of the result - more or less as expected.  When the same test was run on PG 18, we simply killed the process after several hours.

BTW, the 'tests' I did were carried out in both a macOS (Sequoia) and a Linux (Rocky Linux 9.6) environment.

For reasons of security I can't provide all the SQL code, but I can (if it is necessary) provide an outline of what the code does.

Regards.

Clive.

Dr Clive Boughton

Software Improvements

Street address:
97 Bankers Road, NSW 2621
Australia

Mobile Phone: +61 (0)410 632 055
Telephone: +61 (0)2 6230 3195




Re: A serious change in performance between PG 15 and PG 16, 17, 18.

From
Tom Lane
Date:
Clive Boughton <clive.boughton@softimp.com.au> writes:
> When I recently upgraded to PG 18.1 (without any changes to the DB functions or tables, or indexing) it took 7-10
> hours to produce the same result - a 300-600 times decrease in performance.

This complaint is completely un-actionable for lack of relevant
details.  There are some hints here about describing performance
problems usefully:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

            regards, tom lane



Re: A serious change in performance between PG 15 and PG 16, 17, 18.

From
David Rowley
Date:
On Mon, 16 Feb 2026 at 16:49, Clive Boughton
<clive.boughton@softimp.com.au> wrote:
> When I recently upgraded to PG 18.1 (without any changes to the DB functions or tables, or indexing) it took 7-10
> hours to produce the same result - a 300-600 times decrease in performance.

> For reasons of security I can't provide all the SQL code, but I can (if it is necessary) provide an outline of what
> the code does.

It will be necessary to provide more than you have done so far.
There's just no way anyone could guess what might be happening based
on what you've provided.

I suggest looking at the EXPLAIN output and seeing if there's a plan
change between the versions. Perhaps the newer version is considering
a plan shape that previously wasn't considered, and that turns out not
to be great due to some estimation problem. I suspect if you're seeing
300-600x slowdown, then it's pretty unlikely to be the same plan.

It'd be good if you could share the EXPLAIN (ANALYZE, BUFFERS) from
before and after the upgrade.

Perhaps the issue is just that you didn't run ANALYZE after a pg_upgrade.
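A minimal sketch of both suggestions in psql; `large_table` and `key_col` are placeholder names, not from Clive's actual schema:

```sql
-- Capture the plan with timing and buffer statistics, on both the
-- v15 and the v16/v18 instance, for comparison:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table WHERE key_col = 42;

-- After a pg_upgrade the planner statistics may be missing or stale
-- until ANALYZE has run, which can produce very different plans:
ANALYZE large_table;
-- or refresh the whole database, with progress output:
ANALYZE VERBOSE;
```

Comparing the two EXPLAIN outputs node by node (estimated vs. actual row counts in particular) is usually the quickest way to spot where the plans diverge.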

David



Re: A serious change in performance between PG 15 and PG 16, 17, 18.

From
"David G. Johnston"
Date:
On Sunday, February 15, 2026, Clive Boughton <clive.boughton@softimp.com.au> wrote:

For reasons of security I can't provide all the SQL code, but I can (if it is necessary) provide an outline of what the code does.

If you cannot provide at least explain analyze, before and after, the odds of making progress on this are greatly diminished.  A minimal reproducer you can verify and share is exponentially better though.

It’s probably worth ensuring JIT is disabled even going off this minimal information.
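For reference, a quick way to check and disable JIT (the `ALTER SYSTEM` form persists across sessions and can be undone with `ALTER SYSTEM RESET jit`):

```sql
-- Is JIT enabled in the current session?
SHOW jit;

-- Turn it off for this session only, to test:
SET jit = off;

-- Or persist the change server-wide:
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();
```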

David J.

Re: A serious change in performance between PG 15 and PG 16, 17, 18.

From
Andrei Lepikhov
Date:
On 16/2/26 04:49, Clive Boughton wrote:
> For reasons of security I can't provide all the SQL code, but I can (if 
> it is necessary) provide an outline of what the code does.
In principle, you could provide the EXPLAIN output and schema in a private email, so 
that we can construct a synthetic example for public discussion based on your 
problem (I wonder whether it is related to one of the typical Postgres issues, 
such as index picking or SubPlan transformation).
I have had such an experience with a company from Sydney before, and it 
worked well.

-- 
regards, Andrei Lepikhov,
pgEdge



Re: A serious change in performance between PG 15 and PG 16, 17, 18.

From
Joe Conway
Date:
On 2/15/26 22:49, Clive Boughton wrote:
> After adding quite basic indexing, that time was reduced to 1 or 2
> minutes. PG 10.2 remained in place until 2023 when the DB was
> upgraded to PG 15.2 with no problems. In fact performance increased.

> When I recently upgraded to PG 18.1 (without any changes to the DB 
> functions or tables, or indexing) it took 7-10 hours to produce the same 
> result - a *300-600 times decrease in performance*.

> My first step was to reduce the jump in versioning of PG from 15 to 18 
> down to 15 - 16.  This action produced the same very poor performance 
> result as when using PG 18.1


Are some of your indexes on collatable columns? If so, what version of 
glibc is on each of your systems (i.e. did you change OS major versions)?
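One way to check for a collation-version mismatch from within the database; the `datcollversion` column assumes PG 15 or later:

```sql
-- Collation version recorded when each database was created:
SELECT datname, datcollate, datcollversion
FROM pg_database;

-- Recorded vs. currently-reported version for each collation:
SELECT collname, collversion,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collversion IS NOT NULL;

-- If recorded and actual versions differ, indexes on collatable
-- columns should be rebuilt, e.g.:
-- REINDEX INDEX my_text_index;   -- hypothetical index name
```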


-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com



Re: A serious change in performance between PG 15 and PG 16, 17, 18.

From
David Rowley
Date:
On Tue, 17 Feb 2026 at 04:42, Joe Conway <mail@joeconway.com> wrote:
> Are some of your indexes on collatable columns? If so, what version of
> glibc is on each of your systems (i.e. did you change OS major versions)?

For the record, Clive did send me the query and EXPLAIN output
offline. I asked him if he'd anonymise it and post it here, but no
response yet.

I'll just describe what I saw so as not to share details that Clive
isn't happy making public. The query is against a single table, no
joins. The filter is equality on an INT column. The v15 plan uses a Seq Scan
and runs in 175ms. v18 is using a Bitmap Heap Scan using the table's
primary key index. The PK must be a composite key and the WHERE clause
must be on the first column of that key. That query runs in 305ms, so
more like x2 rather than the reported 300-600x. v18 estimates slightly
fewer rows than v15, so that might be why it switched plans to the
bitmap heap scan. That could be down to random sampling from ANALYZE
finding fewer rows that match the query's WHERE clause in v18.

I'd recommend Clive try running ANALYZE on the table and seeing if
the plan changes. Repeat that a few times to see if the Seq Scan plan
is ever picked. I'd also check if the random_page_cost and
seq_page_cost settings are the same on both instances and ensure
enable_seqscan is on.
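Sketched out, with `large_table`/`key_col` again standing in for the real names:

```sql
-- Confirm the cost settings and scan toggles match across instances,
-- and where each value came from (default, config file, ALTER SYSTEM):
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('random_page_cost', 'seq_page_cost', 'enable_seqscan');

-- As a diagnostic only: discourage the bitmap scan for one session
-- and see whether the planner falls back to the v15-style Seq Scan:
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table WHERE key_col = 42;
RESET enable_bitmapscan;
```

If the Seq Scan plan returns and is much faster, that points at a costing/estimation problem rather than an executor regression.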

David



Re: A serious change in performance between PG 15 and PG 16, 17, 18.

From
Clive Boughton
Date:
David,

I've handed over further communication with you (and others) to Ji Zhang, my (better) technical colleague.
He is able to determine any issues surrounding open/closed publication.

Clive.

Dr Clive Boughton

Software Improvements

Street address:
97 Bankers Road, NSW 2621
Australia

Mobile Phone: +61 (0)410 632 055
Telephone: +61 (0)2 6230 3195



