Thread: BUG #19076: Generic query plan is extremely slow


BUG #19076: Generic query plan is extremely slow

From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      19076
Logged by:          Premal Patel
Email address:      premal.patel@veeva.com
PostgreSQL version: 17.6
Operating system:   macOS 15.6.1 (also observed on Linux-based OSes)
Description:

I have an application which uses a database driver that creates prepared
statements. I am noticing that for some queries, the first 5 executions
after the database connection is created are performant. After this, the
prepared statement uses the generic query plan and is incredibly slow. This
is even if I do not change the parameters and simply re-run the same query
several times with the same arguments. I have tried re-running ANALYZE on
the tables and setting STATISTICS to 1000 on the columns in question with no
luck. Here is what my query looks like:

SELECT accounts_contacts.account_id, count(contacts.id) AS count
    FROM accounts_contacts JOIN contacts ON contacts.id =
accounts_contacts.contact_id
    WHERE accounts_contacts.account_id IN (...) AND contacts.tenant_id =
$1::UUID AND contacts.status = $2::VARCHAR
GROUP BY accounts_contacts.account_id

Here is the EXPLAIN output from the first 5 executions using the custom
query plan:

Finalize HashAggregate  (cost=47034.52..47131.24 rows=9672 width=24) (actual time=103.173..103.822 rows=1000 loops=1)
      Group Key: accounts_contacts.account_id
      Batches: 1  Memory Usage: 465kB
      ->  Gather  (cost=44906.68..46937.80 rows=19344 width=24) (actual time=102.801..103.506 rows=2326 loops=1)
            Workers Planned: 2
            Workers Launched: 2
            ->  Partial HashAggregate  (cost=43906.68..44003.40 rows=9672 width=24) (actual time=93.682..93.757 rows=775 loops=3)
                  Group Key: accounts_contacts.account_id
                  Batches: 1  Memory Usage: 465kB
                  Worker 0:  Batches: 1  Memory Usage: 465kB
                  Worker 1:  Batches: 1  Memory Usage: 465kB
                  ->  Hash Join  (cost=6066.79..43819.26 rows=17484 width=32) (actual time=12.902..92.674 rows=11509 loops=3)
                        Hash Cond: (contacts.id = accounts_contacts.contact_id)
                        ->  Parallel Seq Scan on contacts  (cost=0.00..36177.50 rows=373366 width=16) (actual time=0.435..55.023 rows=288122 loops=3)
                              Filter: ((tenant_id = '0199729c-3f1e-712f-8b71-f6545a96b225'::uuid) AND ((status)::text = 'ACTIVE'::text))
                              Rows Removed by Filter: 162518
                        ->  Hash  (cost=5275.46..5275.46 rows=63307 width=32) (actual time=12.053..12.054 rows=37282 loops=3)
                              Buckets: 65536  Batches: 1  Memory Usage: 2843kB
                              ->  Index Only Scan using uq_accounts_contacts_account_id_contact_id on accounts_contacts  (cost=0.42..5275.46 rows=63307 width=32) (actual time=0.335..5.220 rows=37282 loops=3)
                                    Index Cond: (account_id = ANY (...

Here is the EXPLAIN output from executions 6+ using the generic query plan:
GroupAggregate  (cost=5821.21..5821.56 rows=20 width=24) (actual time=7178.328..7179.934 rows=1000 loops=1)
      Group Key: accounts_contacts.account_id
      ->  Sort  (cost=5821.21..5821.26 rows=20 width=32) (actual time=7178.314..7178.943 rows=34526 loops=1)
            Sort Key: accounts_contacts.account_id
            Sort Method: quicksort  Memory: 2984kB
            ->  Nested Loop  (cost=10.26..5820.77 rows=20 width=32) (actual time=34.187..7174.937 rows=34526 loops=1)
                  ->  Bitmap Heap Scan on contacts  (cost=9.83..2490.09 rows=354 width=16) (actual time=34.071..193.216 rows=864365 loops=1)
                        Recheck Cond: (tenant_id = $1)
                        Filter: ((status)::text = ($2)::text)
                        Rows Removed by Filter: 102519
                        Heap Blocks: exact=26858
                        ->  Bitmap Index Scan on ix_contacts_tenant_id  (cost=0.00..9.74 rows=709 width=0) (actual time=32.114..32.114 rows=966884 loops=1)
                              Index Cond: (tenant_id = $1)
                  ->  Index Scan using ix_accounts_contacts_contact_id on accounts_contacts  (cost=0.42..9.40 rows=1 width=32) (actual time=0.008..0.008 rows=0 loops=864365)
                        Index Cond: (contact_id = contacts.id)
                        Filter: (account_id = ANY (ARRAY[...]))
                        Rows Removed by Filter: 1

(please note I omitted the account_ids list, but it is of length 1000 in
both cases)

I have seen others report the same issue, as seen in this forum:
https://dba.stackexchange.com/questions/289130/postgresql-changes-same-query-plan-at-runtime
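[Editor's note: the behaviour above can be reproduced directly in psql; a sketch, with the account_id list left elided as in the report and the statement name "q" chosen arbitrarily:]

```sql
-- With the default plan_cache_mode = auto, the server plans the first five
-- executions of a prepared statement using the actual parameter values, then
-- switches to a cached generic plan if its estimated cost is no worse than
-- the average estimated cost of the custom plans.
PREPARE q(uuid, varchar) AS
SELECT accounts_contacts.account_id, count(contacts.id) AS count
FROM accounts_contacts
JOIN contacts ON contacts.id = accounts_contacts.contact_id
WHERE accounts_contacts.account_id IN (...)  -- list of 1000 ids, elided
  AND contacts.tenant_id = $1
  AND contacts.status = $2
GROUP BY accounts_contacts.account_id;

-- Run this six times and compare the plans; from execution 6 the generic
-- plan (with $1/$2 placeholders) may appear.
EXPLAIN (ANALYZE)
EXECUTE q('0199729c-3f1e-712f-8b71-f6545a96b225', 'ACTIVE');
```

On PostgreSQL 16 and later, `EXPLAIN (GENERIC_PLAN)` on the parameterized SELECT itself can also show the generic plan without executing it.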


Re: BUG #19076: Generic query plan is extremely slow

From: Laurenz Albe
On Tue, 2025-10-07 at 20:32 +0000, PG Bug reporting form wrote:
> PostgreSQL version: 17.6
> Operating system:   macOS 15.6.1 (observed on other Linux based OS)
>
> I have an application which uses a database driver that creates prepared
> statements. I am noticing that for some queries, the first 5 executions
> after the database connection is created are performant. After this, the
> prepared statement uses the generic query plan and is incredibly slow. This
> is even if I do not change the parameters and simple re-run the same query
> several times with the same arguments. I have tried re-running ANALYZE on
> the tables and setting STATISTICS to 1000 on the rows in question with no
> luck. Here is what my query looks like:
>
> SELECT accounts_contacts.account_id, count(contacts.id) AS count
>     FROM accounts_contacts JOIN contacts ON contacts.id =
> accounts_contacts.contact_id
>     WHERE accounts_contacts.account_id IN (...) AND contacts.tenant_id =
> $1::UUID AND contacts.status = $2::VARCHAR
> GROUP BY accounts_contacts.account_id
>
> Here is the EXPLAIN output from the first 5 executions using the custom
> query plan:
>
> [hash join]
>
> Here is the EXPLAIN output from executions 6+ using the generic query plan:
>
> [nested loop join with over 800000 iterations]

I don't think that is a bug.  The logic that decides whether to use a
generic plan or not is a heuristic, and heuristics are bound to get it
wrong occasionally.

I'd guess that the first five executions happened to use values that
had few "contacts".

Either don't use a prepared statement for this statement, or make sure
that "plan_cache_mode" is set to "force_custom_plan" for that single
prepared statement.
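[Editor's note: a sketch of scoping that advice, assuming the statement was prepared as "q" with the tenant/status values from the report; plan_cache_mode and force_custom_plan are the real setting names:]

```sql
-- Limit the override to one transaction with SET LOCAL, so other prepared
-- statements on the same connection keep the default plan_cache_mode = auto.
BEGIN;
SET LOCAL plan_cache_mode = force_custom_plan;
EXECUTE q('0199729c-3f1e-712f-8b71-f6545a96b225', 'ACTIVE');
COMMIT;

-- Or switch it for the remainder of the session:
SET plan_cache_mode = force_custom_plan;
```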

Yours,
Laurenz Albe



Re: BUG #19076: Generic query plan is extremely slow

From: David Rowley
On Wed, 8 Oct 2025 at 21:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Either don't use a prepared statement for this statement, or make sure
> that "plan_cache_mode" is set to "force_custom_plan" for that single
> prepared statement.

That might be the best idea.  Going by the following fragment, I'm
suspecting that it's a multi-tenanted system and the query is running
on a larger than average tenant. At best when planning the generic
plan the planner can only assume there's going to be <estimated rows
in table> divided by <number of distinct tenant_ids in table> matching
"tenant_id = $1".

                        ->  Bitmap Index Scan on ix_contacts_tenant_id  (cost=0.00..9.74 rows=709 width=0) (actual time=32.114..32.114 rows=966884 loops=1)
                              Index Cond: (tenant_id = $1)

In this case that estimates to be 709, but in reality, for this
tenant, it's 966884. That results in the parameterized nested loop
having to do over 1000x more inner scans than planned.

Another option would be to check if pg_stats reports that n_distinct
is roughly accurate for this table. (see if it's close to select
count(distinct tenant_id) from contacts;). If that's set too high then
lowering it might help. Artificially lowering it could also be an
option, but that could lead to bad plans in other areas, so the
plan_cache_mode idea might be better.
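[Editor's note: a sketch of that check and of the per-column override; the override value 40 below is purely illustrative:]

```sql
-- 1. What the planner believes: n_distinct > 0 is an absolute count,
--    n_distinct < 0 is minus a fraction of the table's row count.
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'contacts' AND attname = 'tenant_id';

-- 2. The reality (can be slow on a large table):
SELECT count(DISTINCT tenant_id) FROM contacts;

-- 3. If the estimate is far off, the value can be pinned per column:
ALTER TABLE contacts ALTER COLUMN tenant_id SET (n_distinct = 40);
ANALYZE contacts;  -- the override only takes effect at the next ANALYZE
```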

David



Re: BUG #19076: Generic query plan is extremely slow

From: Premal Patel
I see, thank you both. This gives me a better understanding of where this originates from.

I just have one follow-up question: I have noticed similar issues in a few other queries made in my application. Would
it be unsafe to set “plan_cache_mode” to “force_custom_plan” for the entirety of the connection?

> On Oct 8, 2025, at 5:14 AM, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 8 Oct 2025 at 21:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>> Either don't use a prepared statement for this statement, or make sure
>> that "plan_cache_mode" is set to "force_custom_plan" for that single
>> prepared statement.
>
> That might be the best idea.  Going by the following fragment, I'm
> suspecting that it's a multi-tenanted system and the query is running
> on a larger than average tenant. At best when planning the generic
> plan the planner can only assume there's going to be <estimated rows
> in table> divided by <number of distinct tenant_ids in table> matching
> "tenant_id = $1".
>
>                        ->  Bitmap Index Scan on ix_contacts_tenant_id
> (cost=0.00..9.74 rows=709 width=0) (actual time=32.114..32.114
> rows=966884 loops=1)
>                              Index Cond: (tenant_id = $1)
>
> In this case that estimates to be 709, but in reality, for this
> tenant, it's 966884. That results in the parameterized nested loop
> having to do over 1000x more inner scans than planned.
>
> Another option would be to check if pg_stats reports that n_distinct
> is roughly accurate for this table. (see if it's close to select
> count(distinct tenant_id) from contacts;). If that's set too high then
> lowering it might help. Artificially lowering it could also be an
> option, but that could lead to bad plans in other areas, so the
> plan_cache_mode idea might be better.
>
> David



Re: BUG #19076: Generic query plan is extremely slow

From: Laurenz Albe
On Wed, 2025-10-08 at 07:17 -0400, Premal Patel wrote:
> Would it be unsafe to set “plan_cache_mode” to “force_custom_plan”
> for the entirety of the connection?

No, that's not unsafe at all, but you'd lose the performance advantage
of reusing plans, which can be substantial for short statements.
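[Editor's note: if the setting should apply more broadly than one session, it can be attached to the application's role or database instead of being set on every connection; the role and database names here are hypothetical:]

```sql
-- New connections by this role pick the setting up automatically.
ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;

-- Or for every connection to one database:
ALTER DATABASE app_db SET plan_cache_mode = force_custom_plan;
```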

Yours,
Laurenz Albe



Re: BUG #19076: Generic query plan is extremely slow

From: Andrei Lepikhov
On 8/10/2025 11:13, David Rowley wrote:
> On Wed, 8 Oct 2025 at 21:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>> Either don't use a prepared statement for this statement, or make sure
>> that "plan_cache_mode" is set to "force_custom_plan" for that single
>> prepared statement.

We already have a couple of proposals that would alleviate this
recurring issue.
The first one [1] introduces 'referenced' planning for generic plans.
There, the optimiser may use the current values of the incoming
parameters to derive its estimates, rather than magic constants like
0.3 and 0.005, which are too conservative in many cases. To my
knowledge, SQL Server, Oracle, and some other systems employ this
approach. Although not the ideal solution, it is optional and may help
people choose a proper planning strategy.

The second feature [2] is an addition to the first one. It slightly 
changes the behaviour of plan_cache_mode: it prioritises the strategy 
declared in the CachedPlanSource::cursor_options over plan_cache_mode. 
It provides users with the option to select a specific query and set a 
generic/custom/auto strategy to mitigate severe performance degradation.

With later changes in the master branch, these features open the door 
for extension developers to introduce 'execution-based' metrics and 
choose a plan type for a specific plan cache entry.

[1] 
https://www.postgresql.org/message-id/19919494-92a8-4905-a250-6cf17b89f7c3@gmail.com
[2] 
https://www.postgresql.org/message-id/458ace73-4827-43e1-8a30-734a93d4720f%40gmail.com

-- 
regards,
Andrei Lepikhov,
pgEdge