Discussion: BUG #19076: Generic query plan is extremely slow
The following bug has been logged on the website:

Bug reference:      19076
Logged by:          Premal Patel
Email address:      premal.patel@veeva.com
PostgreSQL version: 17.6
Operating system:   macOS 15.6.1 (observed on other Linux-based OSes)
Description:

I have an application which uses a database driver that creates prepared
statements. I am noticing that for some queries, the first 5 executions
after the database connection is created are performant. After this, the
prepared statement uses the generic query plan and is incredibly slow.
This happens even if I do not change the parameters and simply re-run the
same query several times with the same arguments. I have tried re-running
ANALYZE on the tables and setting STATISTICS to 1000 on the columns in
question, with no luck.

Here is what my query looks like:

SELECT accounts_contacts.account_id, count(contacts.id) AS count
FROM accounts_contacts
JOIN contacts ON contacts.id = accounts_contacts.contact_id
WHERE accounts_contacts.account_id IN (...)
  AND contacts.tenant_id = $1::UUID
  AND contacts.status = $2::VARCHAR
GROUP BY accounts_contacts.account_id

Here is the EXPLAIN output from the first 5 executions, using the custom
query plan:

Finalize HashAggregate  (cost=47034.52..47131.24 rows=9672 width=24) (actual time=103.173..103.822 rows=1000 loops=1)
  Group Key: accounts_contacts.account_id
  Batches: 1  Memory Usage: 465kB
  ->  Gather  (cost=44906.68..46937.80 rows=19344 width=24) (actual time=102.801..103.506 rows=2326 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial HashAggregate  (cost=43906.68..44003.40 rows=9672 width=24) (actual time=93.682..93.757 rows=775 loops=3)
              Group Key: accounts_contacts.account_id
              Batches: 1  Memory Usage: 465kB
              Worker 0:  Batches: 1  Memory Usage: 465kB
              Worker 1:  Batches: 1  Memory Usage: 465kB
              ->  Hash Join  (cost=6066.79..43819.26 rows=17484 width=32) (actual time=12.902..92.674 rows=11509 loops=3)
                    Hash Cond: (contacts.id = accounts_contacts.contact_id)
                    ->  Parallel Seq Scan on contacts  (cost=0.00..36177.50 rows=373366 width=16) (actual time=0.435..55.023 rows=288122 loops=3)
                          Filter: ((tenant_id = '0199729c-3f1e-712f-8b71-f6545a96b225'::uuid) AND ((status)::text = 'ACTIVE'::text))
                          Rows Removed by Filter: 162518
                    ->  Hash  (cost=5275.46..5275.46 rows=63307 width=32) (actual time=12.053..12.054 rows=37282 loops=3)
                          Buckets: 65536  Batches: 1  Memory Usage: 2843kB
                          ->  Index Only Scan using uq_accounts_contacts_account_id_contact_id on accounts_contacts  (cost=0.42..5275.46 rows=63307 width=32) (actual time=0.335..5.220 rows=37282 loops=3)
                                Index Cond: (account_id = ANY (...
Here is the EXPLAIN output from executions 6+, using the generic query plan:

GroupAggregate  (cost=5821.21..5821.56 rows=20 width=24) (actual time=7178.328..7179.934 rows=1000 loops=1)
  Group Key: accounts_contacts.account_id
  ->  Sort  (cost=5821.21..5821.26 rows=20 width=32) (actual time=7178.314..7178.943 rows=34526 loops=1)
        Sort Key: accounts_contacts.account_id
        Sort Method: quicksort  Memory: 2984kB
        ->  Nested Loop  (cost=10.26..5820.77 rows=20 width=32) (actual time=34.187..7174.937 rows=34526 loops=1)
              ->  Bitmap Heap Scan on contacts  (cost=9.83..2490.09 rows=354 width=16) (actual time=34.071..193.216 rows=864365 loops=1)
                    Recheck Cond: (tenant_id = $1)
                    Filter: ((status)::text = ($2)::text)
                    Rows Removed by Filter: 102519
                    Heap Blocks: exact=26858
                    ->  Bitmap Index Scan on ix_contacts_tenant_id  (cost=0.00..9.74 rows=709 width=0) (actual time=32.114..32.114 rows=966884 loops=1)
                          Index Cond: (tenant_id = $1)
              ->  Index Scan using ix_accounts_contacts_contact_id on accounts_contacts  (cost=0.42..9.40 rows=1 width=32) (actual time=0.008..0.008 rows=0 loops=864365)
                    Index Cond: (contact_id = contacts.id)
                    Filter: (account_id = ANY (ARRAY[...]))
                    Rows Removed by Filter: 1

(Please note I omitted the account_ids list, but it is of length 1000 in
both cases.)

I have seen others report the same issue, as seen in this forum:
https://dba.stackexchange.com/questions/289130/postgresql-changes-same-query-plan-at-runtime
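[The reported switch can be reproduced at the SQL level, independent of any
driver. A minimal sketch against the contacts table from the report; the
statement name and the simplified query are illustrative only:]

PREPARE tenant_count(uuid, varchar) AS
  SELECT count(*) FROM contacts
  WHERE tenant_id = $1 AND status = $2;

-- Executions 1-5 are planned with the actual values (custom plans);
-- EXPLAIN shows the literal parameter values in the filter conditions:
EXPLAIN EXECUTE tenant_count('0199729c-3f1e-712f-8b71-f6545a96b225', 'ACTIVE');

-- Repeat the EXPLAIN EXECUTE. From the sixth execution on, with the default
-- plan_cache_mode = auto, the planner may switch to the cached generic plan;
-- the conditions then show $1/$2 placeholders instead of literal values.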
On Tue, 2025-10-07 at 20:32 +0000, PG Bug reporting form wrote:
> PostgreSQL version: 17.6
> Operating system: macOS 15.6.1 (observed on other Linux-based OSes)
>
> I have an application which uses a database driver that creates prepared
> statements. I am noticing that for some queries, the first 5 executions
> after the database connection is created are performant. After this, the
> prepared statement uses the generic query plan and is incredibly slow.
> This happens even if I do not change the parameters and simply re-run the
> same query several times with the same arguments. I have tried re-running
> ANALYZE on the tables and setting STATISTICS to 1000 on the columns in
> question, with no luck.
>
> Here is what my query looks like:
>
> SELECT accounts_contacts.account_id, count(contacts.id) AS count
> FROM accounts_contacts
> JOIN contacts ON contacts.id = accounts_contacts.contact_id
> WHERE accounts_contacts.account_id IN (...)
>   AND contacts.tenant_id = $1::UUID
>   AND contacts.status = $2::VARCHAR
> GROUP BY accounts_contacts.account_id
>
> Here is the EXPLAIN output from the first 5 executions using the custom
> query plan:
>
> [hash join]
>
> Here is the EXPLAIN output from executions 6+ using the generic query plan:
>
> [nested loop join with over 800000 iterations]

I don't think that is a bug. The logic that decides whether to use a
generic plan or not is a heuristic, and heuristics are bound to get it
wrong occasionally. I'd guess that the first five executions happened
to use values that had few "contacts".

Either don't use a prepared statement for this statement, or make sure
that "plan_cache_mode" is set to "force_custom_plan" for that single
prepared statement.

Yours,
Laurenz Albe
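[Since plan_cache_mode is an ordinary run-time setting, it can be scoped to
a single statement by changing it just around the EXECUTE. A minimal sketch,
assuming a prepared statement named acct_counts, a hypothetical name:]

SET plan_cache_mode = force_custom_plan;
EXECUTE acct_counts('0199729c-3f1e-712f-8b71-f6545a96b225', 'ACTIVE');
RESET plan_cache_mode;

-- Or, within a transaction, SET LOCAL restores the previous value
-- automatically at commit or rollback:
BEGIN;
SET LOCAL plan_cache_mode = force_custom_plan;
EXECUTE acct_counts('0199729c-3f1e-712f-8b71-f6545a96b225', 'ACTIVE');
COMMIT;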
On Wed, 8 Oct 2025 at 21:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Either don't use a prepared statement for this statement, or make sure
> that "plan_cache_mode" is set to "force_custom_plan" for that single
> prepared statement.

That might be the best idea. Going by the following fragment, I'm
suspecting that it's a multi-tenanted system and the query is running
on a larger than average tenant. At best, when planning the generic
plan, the planner can only assume there are going to be <estimated rows
in table> divided by <number of distinct tenant_ids in table> rows
matching "tenant_id = $1".

  ->  Bitmap Index Scan on ix_contacts_tenant_id
      (cost=0.00..9.74 rows=709 width=0) (actual time=32.114..32.114
      rows=966884 loops=1)
        Index Cond: (tenant_id = $1)

In this case that estimates to be 709, but in reality, for this
tenant, it's 966884. That results in the parameterized nested loop
having to do over 1000x more inner scans than planned.

Another option would be to check if pg_stats reports that n_distinct
is roughly accurate for this table (see if it's close to "select
count(distinct tenant_id) from contacts;"). If that's set too high then
lowering it might help. Artificially lowering it could also be an
option, but that could lead to bad plans in other areas, so the
plan_cache_mode idea might be better.

David
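[A minimal sketch of the checks described above; the override value of 50
is purely illustrative, not a recommendation:]

-- The planner's idea of the number of distinct tenants:
SELECT n_distinct FROM pg_stats
WHERE tablename = 'contacts' AND attname = 'tenant_id';

-- Ground truth to compare against:
SELECT count(DISTINCT tenant_id) FROM contacts;

-- If the estimate is far off, it can be overridden per column; a negative
-- value means "this fraction of the row count":
ALTER TABLE contacts ALTER COLUMN tenant_id SET (n_distinct = 50);
ANALYZE contacts;  -- the override takes effect at the next ANALYZE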
I see, thank you both. This gives me a better understanding of where this
originates from. I just have one follow-up question: I have noticed similar
issues in a few other queries made in my application. Would it be unsafe to
set "plan_cache_mode" to "force_custom_plan" for the entirety of the
connection?

> On Oct 8, 2025, at 5:14 AM, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 8 Oct 2025 at 21:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>> Either don't use a prepared statement for this statement, or make sure
>> that "plan_cache_mode" is set to "force_custom_plan" for that single
>> prepared statement.
>
> That might be the best idea. Going by the following fragment, I'm
> suspecting that it's a multi-tenanted system and the query is running
> on a larger than average tenant. At best, when planning the generic
> plan, the planner can only assume there are going to be <estimated rows
> in table> divided by <number of distinct tenant_ids in table> rows
> matching "tenant_id = $1".
>
>   ->  Bitmap Index Scan on ix_contacts_tenant_id
>       (cost=0.00..9.74 rows=709 width=0) (actual time=32.114..32.114
>       rows=966884 loops=1)
>         Index Cond: (tenant_id = $1)
>
> In this case that estimates to be 709, but in reality, for this
> tenant, it's 966884. That results in the parameterized nested loop
> having to do over 1000x more inner scans than planned.
>
> Another option would be to check if pg_stats reports that n_distinct
> is roughly accurate for this table (see if it's close to "select
> count(distinct tenant_id) from contacts;"). If that's set too high then
> lowering it might help. Artificially lowering it could also be an
> option, but that could lead to bad plans in other areas, so the
> plan_cache_mode idea might be better.
>
> David
On Wed, 2025-10-08 at 07:17 -0400, Premal Patel wrote:
> Would it be unsafe to set "plan_cache_mode" to "force_custom_plan"
> for the entirety of the connection?

No, that's not unsafe at all, but you'd lose the performance advantage
of reusing plans, which can be substantial for short statements.

Yours,
Laurenz Albe
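[For completeness, a minimal sketch of the scopes at which the setting can
be applied; the role and database names are hypothetical:]

-- Current session only:
SET plan_cache_mode = force_custom_plan;

-- Default for all future sessions of one application role:
ALTER ROLE app_user SET plan_cache_mode = 'force_custom_plan';

-- Default for all future connections to one database:
ALTER DATABASE app_db SET plan_cache_mode = 'force_custom_plan';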
On 8/10/2025 11:13, David Rowley wrote:
> On Wed, 8 Oct 2025 at 21:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>> Either don't use a prepared statement for this statement, or make sure
>> that "plan_cache_mode" is set to "force_custom_plan" for that single
>> prepared statement.

We already have a couple of proposals that would alleviate this
recurring issue.

The first one [1] introduces 'referenced' planning for generic plans.
There, the optimiser may use current values for incoming parameters to
compute its estimates, rather than using magic constants like 0.3 and
0.005, which seem too conservative in many cases. To my knowledge, SQL
Server, Oracle, and some other systems employ this approach. Although
not the ideal solution, it is optional and may help people choose a
proper planning strategy.

The second feature [2] is an addition to the first one. It slightly
changes the behaviour of plan_cache_mode: it prioritises the strategy
declared in CachedPlanSource::cursor_options over plan_cache_mode. It
provides users with the option to select a specific query and set a
generic/custom/auto strategy to mitigate severe performance degradation.

With later changes in the master branch, these features open the door
for extension developers to introduce 'execution-based' metrics and
choose a plan type for a specific plan cache entry.

[1] https://www.postgresql.org/message-id/19919494-92a8-4905-a250-6cf17b89f7c3@gmail.com
[2] https://www.postgresql.org/message-id/458ace73-4827-43e1-8a30-734a93d4720f%40gmail.com

--
regards, Andrei Lepikhov, pgEdge