Thread: unstable query plan on pg 16,17,18


unstable query plan on pg 16,17,18

From
Attila Soki
Date:
Hi there,

When upgrading from PostgreSQL 14.4, I noticed that one of my somewhat complex analytical queries sometimes gets an inefficient plan under PostgreSQL 16, 17, and 18.
Under 14.4, the query runs with a stable plan and completes in 19 to 22 seconds. In newer versions, the plan seems to be unstable: sometimes the query completes in 17 to 20 seconds, sometimes it runs for 5 to 18 minutes with the inefficient plan.
This also happens even if the data has not changed significantly.

When I was preparing this email last week, I saw another thread on a similar topic. This email is unrelated to that
thread.

After some experiments, I now have four cloned VMs, one for each pg version (14.4, 16.11, 17.6, 18.2), and I can now more or less reproduce the plan flip. I use the vacuumdb command to trigger a flip, but under production-like test conditions it seems to flip randomly.
To test, I created a bash script that runs the analytical query via psql in an endless loop. The script prints the runtime, or after 28 seconds the query is aborted.
I also tried to get an inefficient plan on pg 14.4; for that, I must set default_statistics_target to 9 or less.
While pg 14.4 (so far) always chooses the inefficient plan with default_statistics_target < 10, pg 18.2 seems to be able to randomly produce the efficient plan even with default_statistics_target = 1.
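For reference, the loop harness described above might look roughly like this (a sketch, not the original script; the query file name, database name, and exact psql flags are assumptions):

```shell
#!/bin/bash
# Sketch of the endless-loop harness: run the query repeatedly, print
# the runtime, abort any run that exceeds the cutoff.
# Assumptions (invented here): query.sql holds the query, db is db1.

run_once() {
    # Run one command under a time limit; print the elapsed seconds,
    # or "aborted" when the limit is hit (GNU timeout exits non-zero).
    local cmd="$1" limit="$2" start end
    start=$(date +%s)
    if timeout "$limit" bash -c "$cmd" >/dev/null 2>&1; then
        end=$(date +%s)
        echo "runtime: $((end - start))s"
    else
        echo "aborted after ${limit}s"
    fi
}

# Endless loop, as in the original test (uncomment to run):
# while true; do
#     run_once "psql -X -d db1 -f query.sql" 28
# done
```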

My questions so far: where do I begin, how can I find out why the plan changes, and how can I find out what I must change to stabilize the plan?
Any advice is welcome. I can provide more info if needed; I can also recompile pg with debug, change settings, etc. if that helps.

The environment:
PostgreSQL runs in a development VM with 6 cores and 16 GB RAM.
OS is Debian 13: Linux d13pg18 6.12.63+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.63-1 (2025-12-30) x86_64 GNU/Linux
PostgreSQL 18.2 compiled from source with the same settings as V14.4 (tried so far: 16.11, 17.6, 18.0, 18.1)
I am the only user in the VM and on the host. During the test, only one of the VMs is running.
All tables required by the query are cached. (atop: RAM Total 15.6G, Free: 12.0G, Cache: 2.7G)
All tables use the default statistics.
The database gets no changes and no other requests.
The analytical query makes no changes to the db (it only reads data; no temp tables, no materialized views, no temp files).

Preparation:
- clone VM
- compile and install pg
- initdb
- create database, create role
- set default_statistics_target = 170 in postgresql.conf (I tried 1, 5, 9, 10, 50, 100, 170, 180, 190, 200 with no difference; at 500 I could not get an OK plan, or did not try long enough)
- (set other params in conf, mostly like in V14.4, or only for this test. see below)
- restart postgres
- psql < dump.sql
- vacuumdb -Upostgres -avz -j5
- restart VM


After that, when I execute "vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1", the plan sometimes flips from inefficient to OK or back.
Once flipped, it stays stable for multiple runs.
To make it flip again I must execute the above command or "vacuumdb -Upostgres -vZa -j5".

The table "schema1.tbl_used_in_query" is one of the tables used in query. To trigger the flip the vacuumed table must
beone of the tables used by the query. 

For example, a real case:

After the VM restart, the query is running in an endless loop. The data is not changing.

## 1x run, the first, aborted after 28 sec. (the first run after a reboot is always aborted because of uncached data)
## PLAN OK (5x runs)

vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1
vacuumdb: vacuuming database "db1"
INFO:  analyzing "schema1.tbl_used_in_query"
INFO:  "tbl_used_in_query": scanned 51000 of 171971 pages, containing 977840 live rows and 0 dead rows; 51000 rows in
sample,3297257 estimated total rows 
INFO:  finished analyzing table "db1.schema1.tbl_used_in_query"
avg read rate: 58.830 MB/s, avg write rate: 0.232 MB/s
buffer usage: 26748 hits, 27651 reads, 109 dirtied
WAL usage: 507 records, 101 full page images, 662069 bytes, 0 buffers full
system usage: CPU: user: 3.55 s, system: 0.09 s, elapsed: 3.67 s

## PLAN WRONG (5x runs)

vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1
vacuumdb: vacuuming database "db1"
INFO:  analyzing "schema1.tbl_used_in_query"
INFO:  "tbl_used_in_query": scanned 51000 of 171971 pages, containing 977837 live rows and 0 dead rows; 51000 rows in
sample,3297247 estimated total rows 
INFO:  finished analyzing table "db1.schema1.tbl_used_in_query"
avg read rate: 57.663 MB/s, avg write rate: 0.234 MB/s
buffer usage: 26319 hits, 28077 reads, 114 dirtied
WAL usage: 502 records, 101 full page images, 654978 bytes, 0 buffers full
system usage: CPU: user: 3.64 s, system: 0.14 s, elapsed: 3.80 s

## PLAN WRONG (5x runs)

vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1
vacuumdb: vacuuming database "db1"
INFO:  analyzing "schema1.tbl_used_in_query"
INFO:  "tbl_used_in_query": scanned 51000 of 171971 pages, containing 977527 live rows and 0 dead rows; 51000 rows in
sample,3296202 estimated total rows 
INFO:  finished analyzing table "db1.schema1.tbl_used_in_query"
avg read rate: 61.615 MB/s, avg write rate: 0.253 MB/s
buffer usage: 26604 hits, 27785 reads, 114 dirtied
WAL usage: 416 records, 10 full page images, 205744 bytes, 0 buffers full
system usage: CPU: user: 3.38 s, system: 0.12 s, elapsed: 3.52 s

## PLAN OK (22x runs)

vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1
vacuumdb: vacuuming database "db1"
INFO:  analyzing "schema1.tbl_used_in_query"
INFO:  "tbl_used_in_query": scanned 51000 of 171971 pages, containing 978434 live rows and 0 dead rows; 51000 rows in
sample,3299260 estimated total rows 
INFO:  finished analyzing table "db1.schema1.tbl_used_in_query"
avg read rate: 55.409 MB/s, avg write rate: 0.229 MB/s
buffer usage: 26670 hits, 27639 reads, 114 dirtied
WAL usage: 477 records, 102 full page images, 655391 bytes, 0 buffers full
system usage: CPU: user: 3.75 s, system: 0.13 s, elapsed: 3.89 s

## PLAN WRONG (43x runs)

vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1
vacuumdb: vacuuming database "db1"
INFO:  analyzing "schema1.tbl_used_in_query"
INFO:  "tbl_used_in_query": scanned 51000 of 171971 pages, containing 978347 live rows and 0 dead rows; 51000 rows in
sample,3298967 estimated total rows 
INFO:  finished analyzing table "db1.schema1.tbl_used_in_query"
avg read rate: 57.883 MB/s, avg write rate: 0.248 MB/s
buffer usage: 26731 hits, 27584 reads, 118 dirtied
WAL usage: 484 records, 110 full page images, 719407 bytes, 0 buffers full
system usage: CPU: user: 3.59 s, system: 0.11 s, elapsed: 3.72 s

## PLAN OK (55x runs)


# postgresql.conf of this VM
escape_string_warning = off
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'

track_activities = on

logging_collector = on
log_directory = '/usr/local/foo/logs/pg_logs'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_messages = info
log_min_error_statement = warning
log_temp_files = 1

huge_pages=try
shared_buffers = 768MB
temp_buffers = 512MB
effective_cache_size = 4GB
work_mem = 768MB
hash_mem_multiplier = 2.5

maintenance_work_mem = 1GB
max_stack_depth = 4MB
max_locks_per_transaction=256

wal_buffers = -1

jit = off
max_worker_processes = 4
max_parallel_workers_per_gather = 4
max_parallel_workers = 4
effective_io_concurrency = 16
checkpoint_completion_target = 0.9

seq_page_cost = 1.0
random_page_cost = 1.2
default_statistics_target = 170

vacuum_cost_limit = 2000


Thanks

Regards,
Attila Soki





Re: unstable query plan on pg 16,17,18

From
Laurenz Albe
Date:
On Mon, 2026-02-23 at 10:37 +0100, Attila Soki wrote:
> When upgrading from PostgreSQL 14.4, I noticed that one of my somewhat complex
> analytical queries sometimes gets an inefficient plan under PostgreSQL 16, 17, and 18.
> Under 14.4, the query runs with a stable plan and completes in 19 to 22 seconds.
> In newer versions, the plan seems to be unstable, sometimes the query completes
> in 17 to 20 seconds, sometimes it runs for 5 to 18 minutes with the inefficient plan.
> This also happens even if the data is not significantly changed.

This is very likely owing to a bad estimate.

Could you turn on "track_io_timing" and send us the EXPLAIN (ANALYZE, BUFFERS) output
for both the good and the bad plan?

Yours,
Laurenz Albe



Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
> On 23 Feb 2026, at 10:41, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Mon, 2026-02-23 at 10:37 +0100, Attila Soki wrote:
>> When upgrading from PostgreSQL 14.4, I noticed that one of my somewhat complex
>> analytical queries sometimes gets an inefficient plan under PostgreSQL 16, 17, and 18.
>> Under 14.4, the query runs with a stable plan and completes in 19 to 22 seconds.
>> In newer versions, the plan seems to be unstable, sometimes the query completes
>> in 17 to 20 seconds, sometimes it runs for 5 to 18 minutes with the inefficient plan.
>> This also happens even if the data is not significantly changed.
>
> This is very likely owing to a bad estimate.
>
> Could you turn on "track_io_timing" and send us the EXPLAIN (ANALYZE, BUFFERS) output
> for both the good and the bad plan?

Hi Laurenz,

Thank you for your reply. Here are the two explains.
In order to be able to publish the plans here, I have obfuscated the table and field names, but this is reversible, so I can provide more info if needed.

plan-ok:
https://explain.depesz.com/s/hQvM

plan-wrong:
https://explain.depesz.com/s/uLvl

Best regards,
Attila




Re: unstable query plan on pg 16,17,18

From
Andrei Lepikhov
Date:
On 23/2/26 10:41, Laurenz Albe wrote:
> On Mon, 2026-02-23 at 10:37 +0100, Attila Soki wrote:
>> When upgrading from PostgreSQL 14.4, I noticed that one of my somewhat complex
>> analytical queries sometimes gets an inefficient plan under PostgreSQL 16, 17, and 18.
>> Under 14.4, the query runs with a stable plan and completes in 19 to 22 seconds.
>> In newer versions, the plan seems to be unstable, sometimes the query completes
>> in 17 to 20 seconds, sometimes it runs for 5 to 18 minutes with the inefficient plan.
>> This also happens even if the data is not significantly changed.
> 
> This is very likely owing to a bad estimate.
> 
> Could you turn on "track_io_timing" and send us the EXPLAIN (ANALYZE, BUFFERS) output
> for both the good and the bad plan?

Since PG16, the optimiser has changed a lot, so there are plenty of things that might have happened (a table statistics update, for example). We need at least EXPLAIN ANALYSE for the 'good' and 'bad' cases to begin a discussion.

-- 
regards, Andrei Lepikhov,
pgEdge



Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
On 23 Feb 2026, at 16:54, Andrei Lepikhov <lepihov@gmail.com> wrote:

On 23/2/26 10:41, Laurenz Albe wrote:
On Mon, 2026-02-23 at 10:37 +0100, Attila Soki wrote:
When upgrading from PostgreSQL 14.4, I noticed that one of my somewhat complex
analytical queries sometimes gets an inefficient plan under PostgreSQL 16, 17, and 18.
Under 14.4, the query runs with a stable plan and completes in 19 to 22 seconds.
In newer versions, the plan seems to be unstable, sometimes the query completes
in 17 to 20 seconds, sometimes it runs for 5 to 18 minutes with the inefficient plan.
This also happens even if the data is not significantly changed.
This is very likely owing to a bad estimate.
Could you turn on "track_io_timing" and send us the EXPLAIN (ANALYZE, BUFFERS) output
for both the good and the bad plan?

Since PG16, the optimiser changed a lot. So, there are plenty of possibilities that might happen - table statistics update, for example. So, we need at least EXPLAIN ANALYSE for 'good' and 'bad' cases to begin a discussion.

Hi Andrei,

see my previous answer:

but here are the plans again:
In order to be able to publish the plans here, I have obfuscated the table and field names, but this is reversible, so I can provide more info if needed.

plan-ok:
https://explain.depesz.com/s/hQvM

plan-wrong:
https://explain.depesz.com/s/uLvl


regards,
Attila

Re: unstable query plan on pg 16,17,18

From
Laurenz Albe
Date:
On Mon, 2026-02-23 at 16:10 +0100, Attila Soki wrote:
> > On 23 Feb 2026, at 10:41, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> >
> > On Mon, 2026-02-23 at 10:37 +0100, Attila Soki wrote:
> > > When upgrading from PostgreSQL 14.4, I noticed that one of my somewhat complex
> > > analytical queries sometimes gets an inefficient plan under PostgreSQL 16, 17, and 18.
> > > Under 14.4, the query runs with a stable plan and completes in 19 to 22 seconds.
> > > In newer versions, the plan seems to be unstable, sometimes the query completes
> > > in 17 to 20 seconds, sometimes it runs for 5 to 18 minutes with the inefficient plan.
> > > This also happens even if the data is not significantly changed.
> >
> > This is very likely owing to a bad estimate.
> >
> > Could you turn on "track_io_timing" and send us the EXPLAIN (ANALYZE, BUFFERS) output
> > for both the good and the bad plan?
>
> Thank you for your reply. Here are the two explains.
> In order to be able to publish the plans here, I have obfuscated the table and field names, but this is reversible, so I can provide more info if needed.
>
> plan-ok:
> https://explain.depesz.com/s/hQvM
>
> plan-wrong:
> https://explain.depesz.com/s/uLvl

Thanks.

The difference in the plans is under the "Subquery Scan on odg", starting with
plan node 50 (everything under the "Sort").  I suspect that the mis-estimate
that is at the root of the problem is here:

->  Index Scan using table_k_late_spec_dp_end_dat_key on schema1.table_k kal  (... rows=196053 ...) (... rows=471.00 ...)
      Index Cond: (kal.dp_end_dat < ('now'::cstring)::date)
      Index Searches: 1
      Buffers: shared hit=230 read=49
      I/O Timings: shared read=0.142

PostgreSQL overestimates the row count by a factor of over 400.
Try to fix that estimate and see if that gets PostgreSQL to do the right thing.

Perhaps a simple ANALYZE on the table can do the trick.

The right side of the comparison looks awkward, as if you wrote 'now'::text::date
My experiments show that PostgreSQL v18 estimates well even with such a weird
condition, but perhaps if you write "current_date" instead, you'd get better results.

I'd play just with a query like

  EXPLAIN (ANALYZE)
  SELECT * FROM schema1.table_k AS kal
  WHERE dp_end_dat < current_date;

until I get a good estimate.
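One way to compare what each ANALYZE run captured for that column is to snapshot its entry in the standard pg_stats view after every vacuumdb call (a sketch; table and column names taken from the plans above):

```sql
-- Compare these values between a "good" and a "bad" ANALYZE run;
-- a shifted histogram upper bound could explain the flipping estimate
-- for the dp_end_dat < current_date condition.
SELECT n_distinct, null_frac, histogram_bounds
FROM pg_stats
WHERE schemaname = 'schema1'
  AND tablename  = 'table_k'
  AND attname    = 'dp_end_dat';
```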

Yours,
Laurenz Albe



Re: unstable query plan on pg 16,17,18

From
Andrei Lepikhov
Date:
On 23/2/26 18:03, Attila Soki wrote:
>> On 23 Feb 2026, at 16:54, Andrei Lepikhov <lepihov@gmail.com> wrote:
> see my previous answer:
> https://www.postgresql.org/message-id/1695A676-062B-47C5-B302-91E2357DC874%40gmx.net
> 
> but here are the plans again:
> In order to be able to publish the plans here, I have obfuscated the 
> table and field names, but this is reversible, so I can provide more 
> info if needed.
> 
> plan-ok:
> https://explain.depesz.com/s/hQvM
> 
> plan-wrong:
> https://explain.depesz.com/s/uLvl

Thanks. But I meant your 'good' plan built by the PG14. I think a new 
feature invented later has added some problems.

Current conjecture is the following. As I see, the main problem is with 
Right Hash Join:
-> Hash Right Join (cost=210369.25..210370.30 rows=8 width=99)
Its inner side (the hash table) is rebuilt multiple times (around 1k) due to an external parameter (gauf_1.id) in the subtree. It looks like a disaster; until now I thought we didn't build hash tables over parameterised query trees at all.

So, let me discover a little more, but your PG14 explain could add more 
details here.

-- 
regards, Andrei Lepikhov,
pgEdge



Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
On 23 Feb 2026, at 20:59, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2026-02-23 at 16:10 +0100, Attila Soki wrote:
On 23 Feb 2026, at 10:41, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2026-02-23 at 10:37 +0100, Attila Soki wrote:
When upgrading from PostgreSQL 14.4, I noticed that one of my somewhat complex
analytical queries sometimes gets an inefficient plan under PostgreSQL 16, 17, and 18.
Under 14.4, the query runs with a stable plan and completes in 19 to 22 seconds.
In newer versions, the plan seems to be unstable, sometimes the query completes
in 17 to 20 seconds, sometimes it runs for 5 to 18 minutes with the inefficient plan.
This also happens even if the data is not significantly changed.

This is very likely owing to a bad estimate.

Could you turn on "track_io_timing" and send us the EXPLAIN (ANALYZE, BUFFERS) output
for both the good and the bad plan?

Thank you for your reply. Here are the two explains.
In order to be able to publish the plans here, I have obfuscated the table and field names, but this is reversible, so I can provide more info if needed.

plan-ok:
https://explain.depesz.com/s/hQvM

plan-wrong:
https://explain.depesz.com/s/uLvl

Thanks.

The difference in the plans is under the "Subquery Scan on odg", starting with
plan node 50 (everything under the "Sort").  I suspect that the mis-estimate
that is at the root of the problem is here:

->  Index Scan using table_k_late_spec_dp_end_dat_key on schema1.table_k kal  (... rows=196053 ...) (... rows=471.00 ...)
     Index Cond: (kal.dp_end_dat < ('now'::cstring)::date)
     Index Searches: 1
     Buffers: shared hit=230 read=49
     I/O Timings: shared read=0.142

PostgreSQL overestimates the row count by a factor of over 400.
Try to fix that estimate and see if that gets PostgreSQL to do the right thing.

Perhaps a simple ANALYZE on the table can do the trick.

In the examples I used table_k to flip the plan with
vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1
in the explain output schema1.tbl_used_in_query is table_k

The right side of the comparison looks awkward, as if you wrote 'now'::text::date
My experiments show that PostgreSQL v18 estimates well even with such a weird
condition, but perhaps if you write "current_date" instead, you'd get better results.

I didn't realize that made a difference. I will replace all occurrences. It also looks cleaner with current_date.


I'd play just with a query like

 EXPLAIN (ANALYZE)
 SELECT * FROM schema1.table_k AS kal
 WHERE dp_end_dat < current_date;

until I get a good estimate.

I will try to set custom statistics for dp_end_dat and the fields used by the table_k_late_spec_dp_end_dat_key index.
Let’s see if that helps.

I am on UTC+1. I will try all of this tomorrow and get back to you with the results later.

Thank you

regards,
Attila


Re: unstable query plan on pg 16,17,18

From
Laurenz Albe
Date:
On Mon, 2026-02-23 at 21:42 +0100, Attila Soki wrote:
> > > plan-ok:
> > > https://explain.depesz.com/s/hQvM
> > >
> > > plan-wrong:
> > > https://explain.depesz.com/s/uLvl
> >
> > Thanks.
> >
> > The difference in the plans is under the "Subquery Scan on odg", starting with
> > plan node 50 (everything under the "Sort").  I suspect that the mis-estimate
> > that is at the root of the problem is here:
> >
> > ->  Index Scan using table_k_late_spec_dp_end_dat_key on schema1.table_k kal  (... rows=196053 ...) (... rows=471.00 ...)
> >      Index Cond: (kal.dp_end_dat < ('now'::cstring)::date)
> >      Index Searches: 1
> >      Buffers: shared hit=230 read=49
> >      I/O Timings: shared read=0.142
> >
> > PostgreSQL overestimates the row count by a factor of over 400.
> > Try to fix that estimate and see if that gets PostgreSQL to do the right thing.
> >
> > Perhaps a simple ANALYZE on the table can do the trick.
>
>
> In the examples I used table_k to flip the plan with
> vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1
> in the explain output schema1.tbl_used_in_query is table_k

I cannot understand that.

> > The right side of the comparison looks awkward, as if you wrote 'now'::text::date
> > My experiments show that PostgreSQL v18 estimates well even with such a weird
> > condition, but perhaps if you write "current_date" instead, you'd get better results.
>
> I didn't realize that made a difference. I will replace all occurrences. It also looks more clean with current_date.

It *didn't* make a difference when I played with that...

> > I'd play just with a query like
> >
> >  EXPLAIN (ANALYZE)
> >  SELECT * FROM schema1.table_k AS kal
> >  WHERE dp_end_dat < current_date;
> >
> > until I get a good estimate.
>
> I will try to set custom statistics for dp_end_dat and the fields used by the table_k_late_spec_dp_end_dat_key index.
> Let’s see if that helps.

For a simple condition like that, extended statistics won't help.

That's why I suggested a plain ANALYZE.
I am not sure why that is estimated so badly.

Yours,
Laurenz Albe



Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
> On 23 Feb 2026, at 21:54, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>>> The difference in the plans is under the "Subquery Scan on odg", starting with
>>> plan node 50 (everything under the "Sort").  I suspect that the mis-estimate
>>> that is at the root of the problem is here:
>>>
>>> ->  Index Scan using table_k_late_spec_dp_end_dat_key on schema1.table_k kal  (... rows=196053 ...) (... rows=471.00 ...)
>>>      Index Cond: (kal.dp_end_dat < ('now'::cstring)::date)
>>>      Index Searches: 1
>>>      Buffers: shared hit=230 read=49
>>>      I/O Timings: shared read=0.142
>>>
>>> PostgreSQL overestimates the row count by a factor of over 400.
>>> Try to fix that estimate and see if that gets PostgreSQL to do the right thing.
>>>
>>> Perhaps a simple ANALYZE on the table can do the trick.
>>
>>
>> In the examples I used table_k to flip the plan with
>> vacuumdb -Upostgres -vZ -t schema1.tbl_used_in_query db1
>> in the explain output schema1.tbl_used_in_query is table_k
>
> I cannot understand that.


Sorry for being unclear.
To "switch" between the ok and wrong plans, I execute
vacuumdb -Upostgres -vZ -t schema1.table_k db1
multiple times (1x-3x) until the plan flips. So I thought the table should already have been analyzed.
In my first email I referred to table_k as "tbl_used_in_query".
Or have I misunderstood something about the analyze?

Thanks

Regards,
Attila


Re: unstable query plan on pg 16,17,18

From
Laurenz Albe
Date:
On Mon, 2026-02-23 at 22:19 +0100, Attila Soki wrote:
> To "switch" between the ok and wrong plans I execute
> vacuumdb -Upostgres -vZ -t schema1.table_k db1
> multiple times (1x-3x) until the plan flips. So I thought the table should already have been analyzed.
> In my first email I called table_k as the "tbl_used_in_query"

Ok, I see.  Then a simple ANALYZE won't do.

Perhaps the column needs more detailed statistics?

  ALTER TABLE schema1.table_k ALTER dp_end_dat SET STATISTICS 1000;
  ANALYZE schema1.table_k (dp_end_dat);

> or have I misunderstood that with the analyze?

No; I missed that point.

Yours,
Laurenz Albe



Re: unstable query plan on pg 16,17,18

From
Andrei Lepikhov
Date:
On 23/2/26 21:25, Andrei Lepikhov wrote:
> On 23/2/26 18:03, Attila Soki wrote:
> So, let me discover a little more, but your PG14 explain could add more 
> details here.
It seems much more interesting than just a trivial accumulation of cost 
estimation errors. Look:

...
->  Hash  (cost=86.59..86.59 rows=8 width=67)
        (actual time=0.136..0.136 rows=44.56 loops=21798)
     Buckets: 2048 (originally 1024)  Batches: 1 (originally 1) ...
     ->  Nested Loop  (cost=1.12..86.59 rows=8 width=67)
    (actual time=0.017..0.126 rows=44.56 loops=21798)
     ...

This hash table has been rescanned multiple times, and on each rescan it was rebuilt as well (the number of loops in the underlying join was also 21798). It is the first time I have seen such a query plan. And looking at how a rescan is reckoned in the cost model (this hash table rebuilding == subtree rescanning), you may find the following:

cost_rescan():

case T_HashJoin:
   /*
    * If it's a single-batch join, we don't need to rebuild the hash
    * table during a rescan.
    */
    if (((HashPath *) path)->num_batches == 1)
    {
      /* Startup cost is exactly the cost of hash table building */
      *rescan_startup_cost = 0;
      *rescan_total_cost = path->total_cost - path->startup_cost;
    }
    ...

That means (if I read the code correctly) we don't take into account the 
cost=86.59 of subtree rescanning and htab rebuilding at all!
So, it looks like a rare cost model bug.
To learn more, I still need your PG14 EXPLAIN. Can you also share your 
SQL so we can understand which combination of SQL structures led to this 
unusual query plan?

-- 
regards, Andrei Lepikhov,
pgEdge



Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
On 23 Feb 2026, at 21:25, Andrei Lepikhov <lepihov@gmail.com> wrote:
Thanks. But I meant your 'good' plan built by the PG14. I think a new feature invented later has added some problems.

Current conjecture is the following. As I see, the main problem is with Right Hash Join:
-> Hash Right Join (cost=210369.25..210370.30 rows=8 width=99)
Its inner side (Hash table) is rebuilt multiple times (around 1k) due to an external parameter (gauf_1.id) in the subtree. It looks like a disaster, and before I thought we don't build hash tables over parameterised query trees at all.

So, let me discover a little more, but your PG14 explain could add more details here.

Sorry, I misunderstood that.
Here is the pg 14.4 explain:


Regards,
Attila

Re: unstable query plan on pg 16,17,18

From
Andrei Lepikhov
Date:
On 24/2/26 10:16, Attila Soki wrote:
> On 23 Feb 2026, at 21:25, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Thanks. But I meant your 'good' plan built by the PG14. I think a new 
>> feature invented later has added some problems.
>>
>> Current conjecture is the following. As I see, the main problem is 
>> with Right Hash Join:
>> -> Hash Right Join (cost=210369.25..210370.30 rows=8 width=99)
>> Its inner side (Hash table) is rebuilt multiple times (around 1k) due 
>> to an external parameter (gauf_1.id) in the subtree. It looks like a 
>> disaster, and before I thought we don't build hash tables over 
>> parameterised query trees at all.
>>
>> So, let me discover a little more, but your PG14 explain could add 
>> more details here.
> 
> Sorry, I misunderstood that.
> here is the pg14.4 explain

This update gives us more useful details. In PG14, the join search 
problem involved at most 9 relations. In PG19, the maximum is now 18 
joins. Do you know what your join_collapse_limit is set to? It looks 
like subplan pull-ups have made things more complex.
First, we should look into any possible 'rescan cost' issues on our side 
as developers.
On your end, please check the join_collapse_limit setting. If needed, 
try increasing it to around 20. This might help.
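For what it's worth, this can be tried per session without touching postgresql.conf (a sketch; the value 20 is just the suggested starting point):

```sql
SET join_collapse_limit = 20;   -- session-local, no restart needed
SET from_collapse_limit = 20;   -- often raised together; optional here
-- re-run the query under EXPLAIN (ANALYZE, BUFFERS) at this point
RESET join_collapse_limit;
RESET from_collapse_limit;
```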

-- 
regards, Andrei Lepikhov,
pgEdge



Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
On 24 Feb 2026, at 12:09, Andrei Lepikhov <lepihov@gmail.com> wrote:
> This update gives us more useful details. In PG14, the join search problem involved at most 9 relations. In PG19, the maximum is now 18 joins. Do you know what your join_collapse_limit is set to? It looks like subplan pull-ups have made things more complex.
> First, we should look into any possible 'rescan cost' issues on our side as developers.
> On your end, please check the join_collapse_limit setting. If needed, try increasing it to around 20. This might
help.

join_collapse_limit is not set, so it is the default, 8.
I will see whether something around 20 helps.

thanks

btw: I am working on providing more details about the query

regards,
Attila


Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
On 24 Feb 2026, at 12:37, Attila Soki <atiware@gmx.net> wrote:
> On 24 Feb 2026, at 12:09, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> This update gives us more useful details. In PG14, the join search problem involved at most 9 relations. In PG19, the maximum is now 18 joins. Do you know what your join_collapse_limit is set to? It looks like subplan pull-ups have made things more complex.
>> First, we should look into any possible 'rescan cost' issues on our side as developers.
>> On your end, please check the join_collapse_limit setting. If needed, try increasing it to around 20. This might
help.
>
> join_collapse_limit is not set, so it is the default, 8
> I will try if something around 20 helps.

I tried join_collapse_limit 10, 15, 20 with no success, but with join_collapse_limit 1, 5, 7 I get the good plan.

Then I set join_collapse_limit back to the default and flipped the plan again with vacuumdb until I got the good plan with the default join_collapse_limit.
Then I tried to increase join_collapse_limit until the query (or planning) ran longer than 28 sec.
I tried 1, 5, 7, 8, 9, 10, 15, 20, 40, 100, but the query ran stably with a 17-20 sec runtime, so there is still something weird with the statistics.

Now join_collapse_limit=7 works for me and I am not able to flip the plan. Does that make sense?

Should I still test with increased statistics on table_k.dp_end_dat as Laurenz suggested?

I could now share some general info about the query, if you are still interested.

thanks.

regards
Attila




Re: unstable query plan on pg 16,17,18

From
Andrei Lepikhov
Date:
On 24/2/26 16:50, Attila Soki wrote:
> Now with join_collapse_limit=7 works for me and I am not able to flip the plan. makes that sense?

It is almost a game of chance. But if it solves your problem - why not ;)?

> 
> should I still test with increased statistic on table_k.dp_end_dat as Laurenz suggested?

Yes, it may provide us additional info for developing.

> 
> I could now share some general infos about the query, if you still interested.
Yes, we still have a problem. So, any additional info that can let us 
build a repro is appreciated.

-- 
regards, Andrei Lepikhov,
pgEdge



Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
> On 24 Feb 2026, at 16:57, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 24/2/26 16:50, Attila Soki wrote:
>> Now with join_collapse_limit=7 works for me and I am not able to flip the plan. makes that sense?
>
> It is almost a game of chance. But if it solves your problem - why not ;)?
>
>> should I still test with increased statistic on table_k.dp_end_dat as Laurenz suggested?
>
> Yes, it may provide us additional info for developing.
Ok, I will try that and come back with more info.
Should I set join_collapse_limit back to default for this test?

>
>> I could now share some general infos about the query, if you still interested.
> Yes, we still have a problem. So, any additional info that can let us build a repro is appreciated.


The query involves many tables, views, functions and data. Maybe we can build something simpler to reproduce the
problem.

I am still checking whether it is okay to share the SQL. Until then, here are some general details.
It is basically a per-order report on the availability of the inventory for a rental service.
Stripped down, it builds an in-memory calendar and calculates some sums for overlapping time periods, where some of the time periods have no end (delayed return: late).

There are projects (table_e); a project has several orders (table_a). An order has a renting period, measured in days between start and end.
The period is not limited, but it is somewhere between 1 and 365 days, usually 5-15 days.
An order can have multiple quotes (auf_oos), but only one of them is active at a time.
A quote lists multiple items (table_k), usually 10-500 items per quote, where an item can have one or more accessories tied to it.
This list is the customer's order.

There is also a helper table (table_a_dtg) that stores one row for each day of the order: table_a.id and a date. In
this table, a 5-day rental order has 5 rows. (This is stored on disk and maintained via trigger, because using
generate_series was too slow.)
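A minimal sketch of how such a trigger-maintained day table could look. This is purely illustrative: the column names rental_start and rental_end and the trigger body are assumptions, not the actual schema from this thread.

```sql
-- Hypothetical sketch: keep one row per (order, day) in table_a_dtg.
-- Column names rental_start / rental_end are assumed for illustration.
CREATE OR REPLACE FUNCTION refresh_table_a_dtg() RETURNS trigger AS $$
BEGIN
    -- rebuild this order's rows whenever the rental period changes
    DELETE FROM table_a_dtg WHERE au_id = NEW.id;
    INSERT INTO table_a_dtg (au_id, datum)
    SELECT NEW.id, d::date
    FROM generate_series(NEW.rental_start::timestamp,
                         NEW.rental_end::timestamp,
                         interval '1 day') AS d;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_a_dtg_sync
    AFTER INSERT OR UPDATE OF rental_start, rental_end ON table_a
    FOR EACH ROW EXECUTE FUNCTION refresh_table_a_dtg();
```

The point of materializing these rows is that generate_series then runs once per order change instead of once per query execution.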

There is a list of possible products/items (table_s), which stores the company's own portion of the inventory. The
other portion of the inventory can be temporarily held (rented long-term) and is stored in al_ast.
Another portion of the inventory may come from other suppliers as needed. These stock items are rented for the order
and are tied to it. The rent-order is located in al_zm, and the rent-order's item list is in al_zm_kal.

In addition, there are items that were not returned on time, were lost, or are temporarily blocked due to repairs. This
renders the stock level of a future day X unpredictable.

The items required for a quote are manually allocated from one of the three possible sources. This allocation is stored
in table_k.
The info that the rented items are fully or partly returned is also stored in table_k.

The primary function of this query is to list one row per quote, with multiple flags indicating the various possible
summarized states of the quote. It also computes the total volume and weight of the order.
Some of the flags indicate various working phases or possible conflict states.
For example: when inventory levels are insufficient, when items are configured as "fulfilled from stock" but there are
not enough items in stock, when items are configured as "fulfilled from rental" but no rental order has been placed or
the rental order does not have the right number of items, or when the required items are not returned as planned, are
returned damaged, or are lost.
Because of resource limits, the flags are built only for quotes whose renting period includes a date between now
and now + 3 months.

To achieve this, the query builds a stock availability report for all items of the quotes on the fly for each item/day.

gauf_1 is table_a in a view where an item stored in table_k is combined with the rental period, so it gets one
row for each day of the rental period. Each such row represents the required stock quantity per day for that item and order.
These rows form the base of this query and are combined to build the daily overall demand and state.
The view is simple:

select
...
from table_a_dtg gdt -- one row per order_rental_date, eg. 5 rows for a 5 days rental
left join table_a gauf on gauf.id=gdt.au_id -- one row per order
left join auf_oos goftr on goftr.au_id=gauf.id -- one row per quote
left join table_k gkal on gkal.oo_id=goftr.id -- one row per quote item
where gdt.datum >= ('now'::text::date - '7 days'::interval)::date;

e.g. a 3-day rental order with 2×1 items gets 6 rows:
order1, item1, day1, 1 pcs
order1, item1, day2, 1 pcs
order1, item1, day3, 1 pcs
order1, item2, day1, 1 pcs
order1, item2, day2, 1 pcs
order1, item2, day3, 1 pcs

I can provide more details on other parts of the query too, if that helps.

regards,
Attila




Re: unstable query plan on pg 16,17,18

From
Andrei Lepikhov
Date:
On 24/2/26 17:48, Attila Soki wrote:
>> On 24 Feb 2026, at 16:57, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> On 24/2/26 16:50, Attila Soki wrote:
> I can provide more details on other parts of the query too, if that helps.

Only query and reproduction make sense for me to discover more deeply at
the moment.
It looks like we have managed to reproduce the potential 'Hash over
parameterised subtree' issue. Please check the attachment: there are two
plans. One plan has a longer execution time and more blocks hit, but its
cost estimate is four times lower. The EXPLAIN output does not show any
obvious estimation errors. This suggests there may be a bug in the cost
model.


--
regards, Andrei Lepikhov,
pgEdge

Attachments

Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
On 24 Feb 2026, at 17:48, Attila Soki <atiware@gmx.net> wrote:
>
>> On 24 Feb 2026, at 16:57, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>
>> On 24/2/26 16:50, Attila Soki wrote:
>>> Now with join_collapse_limit=7 works for me and I am not able to flip the plan. makes that sense?
>>
>> It is almost a game of chance. But if it solves your problem - why not ;)?
>>
>>> should I still test with increased statistic on table_k.dp_end_dat as Laurenz suggested?
>>
>> Yes, it may provide us additional info for developing.
> ok i try that and come with more info.
> Should I set join_collapse_limit back to default for this test?
>

Increasing the statistics on dp_end_dat did not help. With a statistics target of 1000, I was not able to get a good plan without
setting join_collapse_limit=7.

regards,
Attila


Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
On 24 Feb 2026, at 20:20, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 24/2/26 17:48, Attila Soki wrote:
>>> On 24 Feb 2026, at 16:57, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> On 24/2/26 16:50, Attila Soki wrote:
>> I can provide more details on other parts of the query too, if that helps.
>
> Only query and reproduction make sense for me to discover more deeply at
> the moment.
> It looks like we have managed to reproduce the potential 'Hash over
> parameterised subtree' issue. Please check the attachment: there are two
> plans. One plan has a longer execution time and more blocks hit, but its
> cost estimate is four times lower. The EXPLAIN output does not show any
> obvious estimation errors. This suggests there may be a bug in the cost
> model.
>

I looked at your repro and tried to find the corresponding part in my query.
If that is not the right place, please point me to the part in the explain output, so I can compare your repro with that part of my
query.

As far as I can identify, there are two candidates:
The first one, because of the "lateral", but I think this is not the problematic part:
    select
    from
    left join lateral ()

    somewhere around this line:
    Output: dim_kal.oo_id, dim_stamm.dmn_gew, dim_kal.art_vk, dim_stamm.dmn_anz, dim_kal.art_bl, dim_kal.art_dp,
dim_stamm.dmn_vol,dim_ext_dd.table_d_id, dim_ext_dd_dpe.enabled, dim_kal.rti_id, dim_stamm.ist_divers_rti,
dim_stamm.ist_psa_rti


The second one, because of your prior comment about "odg" being "rebuilt multiple times (around 1k) due to an external
parameter (gauf_1.id)":
    gauf_1 refers to a view, and this view is used multiple times in the query. See my previous mail for more details
about gauf_1.

    select
    ...
    from (
        with (
            select
            ...
            from table_k kal
            where ...
                AND not ( exists (
                    select oo_id from "view_gauf_1" gdt_2
                    where gdt_2.rti_id = kal.rti_id ... AND gdt_2.datum >= ('now'::cstring)::date) .. and gauf_2....
                ))
        ) spaet
        select
        ...
        from view
        left join spaet on spaet.rti_id::text = akd.dp_rti_id::text
        left join lateral ( select from where )
        left join lateral ( select from where )
        ...

        somewhere around this line:
        "Filter: ((ext_dd.table_d_id IS NULL) OR ((ext_dd.table_d_id)::text = 'schema1'::text) OR (NOT
COALESCE(ext_dd_dpe.enabled,false)))" 

    ) table_k_dly


Thank you.

Regards,
Attila




Re: unstable query plan on pg 16,17,18

From
Laurenz Albe
Date:
On Thu, 2026-02-26 at 17:15 +0100, Attila Soki wrote:
> Increasing the statistics of dp_end_dat did not helped. With statistic 1000
> I was not able to get a good plan without setting join_collapse_limit=7

Reducing "join_collapse_limit" dumbs down the optimizer, so you are getting a good plan
by accident.  I mean, you can try to rewrite the query so that the tables are written
in the order in which they should be joined in the good plan, then set "join_collapse_limit"
to 1.  That may be a solution if you cannot find a better one.
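The rewrite-and-pin approach Laurenz describes could be sketched like this (t1, t2, t3 are placeholder names, not the thread's actual tables; the idea is to write the FROM clause in the join order of the known-good plan):

```sql
-- Schematic only: t1..t3 stand in for the real tables, written in the
-- join order of the known-good plan.
BEGIN;
SET LOCAL join_collapse_limit = 1;  -- planner keeps the textual join order
SELECT t1.id, t3.val
FROM   t1
JOIN   t2 ON t2.t1_id = t1.id
JOIN   t3 ON t3.t2_id = t2.id;
COMMIT;
```

SET LOCAL confines the setting to this transaction, so other queries in the session still get the full join search.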

Still, the bad estimate that I indicated in [1] is worrisome, and I don't quite
understand it.  Could you show the result of the simplified query that I suggested?

  EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
  SELECT * FROM schema1.table_k AS kal
  WHERE dp_end_dat < current_date;

If I were you, I'd focus on getting PostgreSQL to estimate that correctly.

Yours,
Laurenz Albe

 [1]: https://postgr.es/m/b2e372392b8a022da81b95b7c823a5729d7fd70f.camel%40cybertec.at



Re: unstable query plan on pg 16,17,18

From
Andrei Lepikhov
Date:
On 26/2/26 17:22, Attila Soki wrote:
> On 24 Feb 2026, at 20:20, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>
>> On 24/2/26 17:48, Attila Soki wrote:
>>>> On 24 Feb 2026, at 16:57, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>>> On 24/2/26 16:50, Attila Soki wrote:
>>> I can provide more details on other parts of the query too, if that helps.
>>
>> Only query and reproduction make sense for me to discover more deeply at
>> the moment.
>> It looks like we have managed to reproduce the potential 'Hash over
>> parameterised subtree' issue. Please check the attachment: there are two
>> plans. One plan has a longer execution time and more blocks hit, but its
>> cost estimate is four times lower. The EXPLAIN output does not show any
>> obvious estimation errors. This suggests there may be a bug in the cost
>> model.
>>
> 
> I looked your repro, and I tried to find the corresponding part in my query.
> If that not the right place is, please point me to the part in explain, so I can compare your repro and that part of
> my query.
 
No problem. Issues with your query plan start at exactly the following 
line:

->  Hash Right Join  (cost=210369.25..210370.30 rows=8 width=99)
    (actual time=150.790..150.853 rows=44.56 loops=21798)

Schema of this part of the query tree is as the following:

Hash Right Join  (loops=21798)
   │
   ├─ [Left/Probe] GroupAggregate (loops=14426)
   │    └─ Merge Right Anti Join
   │         └─ Merge Join
   │              └─ Index Only Scan on table_k gkal_2  (loops=14426)
   │
   └─ [Right/Build = Hash] Nested Loop (loops=21798)
        ├─ Index Scan on table_o goftr_1 (loops=21798)
        │    Index Cond: goftr_1.au_id = gauf_1.id
        └─ Index Scan on table_k gkal_1
             Index Cond: gkal_1.oo_id = goftr_1.id

So, the hash table is rebuilt on each rescan based on the changed 
'gauf_1.id' external parameter.
Without the query, it is hard to say exactly what the trigger of this 
problem is. Having a reproduction, we could use planner advising 
extensions and see how additional knowledge of true cardinalities 
rebuilds the query plan. Sometimes, additional LATERAL restriction, 
added by the planner to pull-up subplan, restricts the join search scope 
badly, but I doubt if we have this type of problem here.

-- 
regards, Andrei Lepikhov,
pgEdge



Re: unstable query plan on pg 16,17,18

From
Attila Soki
Date:
On 27 Feb 2026, at 09:15, Andrei Lepikhov <lepihov@gmail.com> wrote:

->  Hash Right Join  (cost=210369.25..210370.30 rows=8 width=99)
(actual time=150.790..150.853 rows=44.56 loops=21798)

Schema of this part of the query tree is as the following:

Hash Right Join  (loops=21798)
 │
 ├─ [Left/Probe] GroupAggregate (loops=14426)
 │    └─ Merge Right Anti Join
 │         └─ Merge Join
 │              └─ Index Only Scan on table_k gkal_2  (loops=14426)
 │
 └─ [Right/Build = Hash] Nested Loop (loops=21798)
      ├─ Index Scan on table_o goftr_1 (loops=21798)
      │    Index Cond: goftr_1.au_id = gauf_1.id
      └─ Index Scan on table_k gkal_1
           Index Cond: gkal_1.oo_id = goftr_1.id

So, the hash table is rebuilt each rescan based on the changed 'gauf_1.id' external parameter.
Without the query, it is hard to say exactly what the trigger of this problem is. Having a reproduction, we could use planner advising extensions and see how additional knowledge of true cardinalities rebuilds the query plan. Sometimes, additional LATERAL restriction, added by the planner to pull-up subplan, restricts the join search scope badly, but I doubt if we have this type of problem here.

I searched for the condition kal.dp_end_dat < current_date, then realized that this part of the explain is misleading.

Index Scan using table_k_late_spec_dp_end_dat_key on schema1.table_k kal  (cost=0.28..122468.46 rows=196053 width=24) (actual time=0.039..0.614 rows=471.00 loops=1)
   Output: kal.dp_rti_id, kal.art_dp_res, kal.oo_id
   Index Cond: (kal.dp_end_dat < ('now'::cstring)::date)
   Index Searches: 1
   Buffers: shared hit=230 read=49
   I/O Timings: shared read=0.142

The definiton of the index table_k_late_spec_dp_end_dat_key is:
CREATE INDEX table_k_late_spec_dp_end_dat_key
  ON schema1.table_k
  USING btree
  (dp_end_dat)
  WHERE dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));

This is because the WHERE clause of the index matches the WHERE clause of the query, so the simplified query is:
SELECT * FROM schema1.table_k AS kal
WHERE dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));


The surrounding query part of the view is below, where the part with "dp_end_dat < current_date" is in the "with late as ()":

WITH late AS (
    SELECT kal.dp_rti_id AS rti_id,
        sum(COALESCE(kal.art_dp_res, 0.0000)) AS sum_art_dp_late
    FROM schema1.table_k kal
    WHERE kal.dp_status IS NOT NULL AND kal.dp_status > 0 AND COALESCE(kal.art_rtd, 0.0000) < (COALESCE(kal.art_grt, 0.0000) + COALESCE(kal.art_grt_j2j, 0.0000)) AND kal.dp_st_dat IS NOT NULL AND kal.dp_end_dat IS NOT NULL AND kal.dp_end_dat < 'now'::text::date AND kal.oo_id IS NOT NULL
    AND NOT (EXISTS (
        SELECT akdt_late.oo_id
        FROM schema1.table_k_dtg akdt_late -- this is a view
        WHERE akdt_late.dp_rti_id::text = kal.dp_rti_id::text AND akdt_late.oo_id IS NOT NULL
        AND (akdt_late.art_prov_res > 0.0000 OR akdt_late.dp_status > 0 AND akdt_late.art_dp_res > 0.0000)
        AND akdt_late.datum >= 'now'::text::date
        AND (akdt_late.a_status::text = ANY (ARRAY['d'::character varying::text, 'v'::character varying::text, 'i'::character varying::text]))
        AND akdt_late.ih_flag AND kal.oo_id::text = akdt_late.oo_id::text
    ))
    GROUP BY kal.dp_rti_id
)
SELECT akd.oo_id,
    akd.dp_rti_id AS rti_id,
    akd.datum,
    akd.lgaagng AS auf_lgaagng,
    akd.rueday_def,
    akd.rettag_def,
    COALESCE(min(COALESCE(sum_ast_per_day.sum_per_day, 0.0000)), 0.0000) AS sum_ast_per_day,
    COALESCE(max(COALESCE(sum_red_per_day.sum_per_day, 0.0000)), 0.0000) AS sum_red_per_day,
    CASE
        WHEN akd.datum > 'now'::text::date THEN COALESCE(late.sum_art_dp_late, 0.0000)
        ELSE 0.0000
    END AS sum_art_dp_late
FROM schema1.table_k_future_dt akd -- this is a view
LEFT JOIN schema1.dd_ext ext_dd ON ext_dd.id::text = akd.ext::text
LEFT JOIN schema1.dp_epkt ext_dd_dpe ON ext_dd_dpe.id::text = ext_dd.table_d_id::text
LEFT JOIN late ON late.rti_id::text = akd.dp_rti_id::text
LEFT JOIN LATERAL (
    SELECT COALESCE(sum(COALESCE(stk.anz, 0.0000)), 0.0000) AS sum_per_day
    FROM schema1.al_ast stk
    WHERE stk.rti_id::text = akd.dp_rti_id::text AND stk.von <= akd.datum AND stk.bis >= akd.datum
    GROUP BY akd.datum
) sum_ast_per_day ON (
        EXISTS (
            SELECT al_ast.rti_id
            FROM schema1.al_ast
            WHERE al_ast.rti_id::text = akd.dp_rti_id::text
        )
    )
LEFT JOIN LATERAL (
    SELECT COALESCE(sum(COALESCE(alred.anz, 0.0000)), 0.0000) AS sum_per_day
    FROM schema1.al_red alred
    WHERE alred.rti_id::text = akd.dp_rti_id::text AND alred.von <= akd.datum AND alred.bis >= akd.datum
    GROUP BY akd.datum
) sum_red_per_day ON (
        EXISTS (
            SELECT al_red.rti_id
            FROM schema1.al_red
            WHERE al_red.rti_id::text = akd.dp_rti_id::text
        )
    )
WHERE (ext_dd.table_d_id IS NULL OR ext_dd.table_d_id::text = 'schema1'::text OR NOT COALESCE(ext_dd_dpe.enabled, false))
AND akd.rti_id::text !~~ 'P%'::text AND akd.dp_rti_id::text !~~ 'P%'::text
AND (akd.art_dp_res > 0.0000 OR akd.art_prov_res > 0.0000 OR akd.art_dp_zm > 0.0000)
AND (akd.lgaagng IS NULL OR akd.lgaagng::date >= 'now'::text::date AND akd.lgaagng::date <= ('now'::text::date + '3 mons'::interval)::date)
GROUP BY akd.oo_id, akd.dp_rti_id, akd.datum, akd.lgaagng, akd.rueday_def, akd.rettag_def, late.sum_art_dp_late


"gauf" is in one of the table_k_* views, and looks like below. There are multiple variants, they differ mostly in "where" part.

SELECT gdt.datum,
    gkal.rti_id,
    gdt.au_id,
    gkal.oo_id,
    gkal.id AS kal_id,
    gauf.status AS a_status,
    goftr_1.token AS ih_flag,
    gdt.prov,
    gdt.def,
    gkal.dp_status,
    gkal.ext,
    gdt.rueday_def,
    gdt.rettag_def,
    gdt.rueday_prov,
    gdt.rettag_prov,
    gauf.lgaagng,
    gauf.lgaein,
    gkal.art_dp_res,
    gkal.art_prov_res,
    gkal.art_dp_zm,
    gkal.rti,
    gkal.art_dp_extern,
    gkal.dp_rti_id,
    gkal.art_dp_lga,
    gkal.set_fix_vkpt
   FROM schema1.table_a_dtg gdt
     LEFT JOIN schema1.table_a gauf ON gauf.id::text = gdt.au_id::text
     LEFT JOIN schema1.auf_oos goftr_1 ON goftr_1.au_id::text = gauf.id::text
     LEFT JOIN schema1.table_k gkal ON gkal.oo_id::text = goftr_1.id::text
  WHERE gdt.datum >= ('now'::text::date - '7 days'::interval)::date;

I tried changing the statistics target of dp_end_dat, and also of all the fields in the WHERE clause, from 10 to 1500 in increments of 10,
one field at a time and then all fields together. The estimate did not get better: actual rows 471, planned rows somewhere between 180,000 and 195,000.

Then I checked the same query on pg 14; the estimate is the same as on pg 18.

ALTER TABLE schema1.table_k ALTER dp_end_dat SET STATISTICS 140;
ALTER TABLE
ALTER TABLE schema1.table_k ALTER dp_status SET STATISTICS 140;
ALTER TABLE
ALTER TABLE schema1.table_k ALTER oo_id SET STATISTICS 140;
ALTER TABLE
ALTER TABLE schema1.table_k ALTER art_rtd SET STATISTICS 140;
ALTER TABLE
ALTER TABLE schema1.table_k ALTER art_grt SET STATISTICS 140;
ALTER TABLE
ALTER TABLE schema1.table_k ALTER art_grt_j2j SET STATISTICS 140;
ALTER TABLE
ANALYZE schema1.table_k;

pg14 at statistics 140:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM schema1.table_k AS kal WHERE dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));


 Index Scan using table_k_late_spec_dp_end_dat_key on table_k kal  (cost=0.28..122750.89 rows=193091 width=614) (actual time=0.010..0.261 rows=471 loops=1)
   Index Cond: (dp_end_dat < CURRENT_DATE)
   Buffers: shared hit=279
 Settings: hash_mem_multiplier = '2.5', jit = 'off', max_parallel_workers = '4', max_parallel_workers_per_gather = '4', random_page_cost = '1.2', temp_buffers = '512MB', work_mem = '768MB'
 Planning:
   Buffers: shared hit=1459
 Planning Time: 3.101 ms
 Execution Time: 0.325 ms


 pg18 at statistics 140:
 EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM schema1.table_k AS kal WHERE dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));


 Index Scan using table_k_late_spec_dp_end_dat_key on table_k kal  (cost=0.28..122561.69 rows=195550 width=624) (actual time=0.021..0.514 rows=471.00 loops=1)
   Index Cond: (dp_end_dat < CURRENT_DATE)
   Index Searches: 1
   Buffers: shared hit=279
 Settings: temp_buffers = '512MB', work_mem = '768MB', hash_mem_multiplier = '2.5', jit = 'off', max_parallel_workers_per_gather = '4', max_parallel_workers = '4', random_page_cost = '1.2'
 Planning:
   Buffers: shared hit=1508
 Planning Time: 3.123 ms
 Execution Time: 0.639 ms
(9 rows)


I hope I have selected the correct parts of the query, as it is not really possible to share the entire query with all its dependencies.

The first version of this query was written for PostgreSQL 8.3; since then, it has been refactored and optimized a few times when necessary. I will check if it is possible to reorder the query without rewriting everything.

I don't know the inner workings of ANALYZE. Is it normal that executing ANALYZE on unchanged data can flip the plan? Does ANALYZE select a random set of rows?

Thanks.

regards,
Attila



Re: unstable query plan on pg 16,17,18

From
Andrei Lepikhov
Date:
On 27/2/26 16:00, Attila Soki wrote:
> On 27 Feb 2026, at 09:15, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I dont know the inner workings of analyze, is that normal that executing 
> analyze on unchanged data can flip the plan? Does analyze select a 
> random set of rows?

Yes, this is completely normal, because ANALYZE samples rows randomly.
You can try using the Join-Order-Benchmark to see how running the 
ANALYZE command can change the execution time of the same query, 
sometimes by as much as ten times.
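One way to observe this sampling effect directly (a sketch, not from the thread): re-run ANALYZE and watch the estimate for the same predicate move between runs. Each ANALYZE draws a fresh random sample of roughly 300 × the statistics target rows per column, so the stored histogram boundaries shift slightly even on unchanged data.

```sql
-- Each ANALYZE takes a new random sample, so the histogram boundaries
-- in pg_stats (and hence the row estimate below) can shift slightly
-- between runs even when the data is unchanged.
ANALYZE schema1.table_k;
EXPLAIN SELECT * FROM schema1.table_k WHERE dp_end_dat < current_date;
-- repeat both statements; the estimated rows= value may differ each run
```

When an estimate sits near the tipping point between two join orders, that small wobble is enough to flip the plan.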

-- 
regards, Andrei Lepikhov,
pgEdge