[PATCH] Add a guc parameter to control limit clause adjust path cost.
| От | Haiyang Li |
|---|---|
| Тема | [PATCH] Add a guc parameter to control limit clause adjust path cost. |
| Дата | |
| Msg-id | b64fb4cc-1fbe-4796-a79b-001b362138dc.mohen.lhy@alibaba-inc.com обсуждение исходный текст |
| Ответы |
Re: [PATCH] Add a guc parameter to control limit clause adjust path cost.
Re: [PATCH] Add a guc parameter to control limit clause adjust path cost. |
| Список | pgsql-hackers |
Hi all,
[1] https://pganalyze.com/blog/5mins-postgres-planner-order-by-limit
When a query contains LIMIT/OFFSET clauses, the optimizer uses this
information to influence path generation. For example, it may consider
the path with the lowest startup cost and adjust the cost of paths
accordingly.
In most cases, these behaviors are beneficial and result in a more optimal
path. However, there are some scenarios where these behaviors can lead to
a bad plan. One typical scenario is when a query contains both
ORDER BY and LIMIT clauses, and the optimizer chooses an incorrect index scan.
PostgreSQL makes very optimistic predictions about the path that uses an
ordered index to satisfy the row count requirement. When the actual scenario
differs from the prediction, PostgreSQL may generate a bad plan. One case
provided by Lukas [1] hits this issue. And another case i met recently is:
```
-- bad plan
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.00..611.58 rows=200 width=419) (actual time=3848.321..3867.253 rows=200 loops=1)
Buffers: shared hit=5957080 (main=5957080 vm=0 fsm=0) read=1 (main=1 vm=0 fsm=0)
I/O Timings: shared/local read=0.518
Read Timings: total=0.534 ms buffer alloc=0.010 read io=0.519 page replay=0.002
-> Nested Loop (cost=1.00..1556085.69 rows=509704 width=419) (actual time=3848.320..3867.231 rows=200 loops=1)
Buffers: shared hit=5957080 (main=5957080 vm=0 fsm=0) read=1 (main=1 vm=0 fsm=0)
I/O Timings: shared/local read=0.518
Read Timings: total=0.534 ms buffer alloc=0.010 read io=0.519 page replay=0.002
-> Index Scan using "idx_outDetail_channel_id" on pmc_outcome_detail d (cost=0.43..369798.26 rows=641998 width=411) (actual time=0.042..758.725 rows=1172313 loops=1)
Index Cond: ((channel_id = ANY ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
Filter: ((voucher_file_name IS NULL) AND (payment_type = ANY ('{7,8}'::numeric[])) AND (status = '3'::numeric))
Rows Removed by Filter: 586208
Buffers: shared hit=95516 (main=95516 vm=0 fsm=0)
-> Index Scan using pmc_outcome_pkey on pmc_outcome o (cost=0.56..1.85 rows=1 width=18) (actual time=0.002..0.002 rows=0 loops=1172313)
Index Cond: (id = d.outcome_id)
Filter: (user_type = '1'::numeric)
Rows Removed by Filter: 1
Buffers: shared hit=5861564 (main=5861564 vm=0 fsm=0) read=1 (main=1 vm=0 fsm=0)
I/O Timings: shared/local read=0.518
Read Timings: total=0.534 ms buffer alloc=0.010 read io=0.519 page replay=0.002
Planning:
Buffers: shared hit=20 (main=16 vm=4 fsm=0)
Planning Time: 0.269 ms
Execution Time: 3867.322 ms
(24 rows)
-- good plan
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=15964.44..595103.67 rows=200 width=419) (actual time=809.243..818.407 rows=200 loops=1)
Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
I/O Timings: shared/local read=1.219
Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
-> Gather (cost=15964.44..595103.67 rows=509704 width=419) (actual time=809.242..818.381 rows=200 loops=1)
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
I/O Timings: shared/local read=1.219
Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
-> Nested Loop (cost=14964.44..543133.27 rows=84951 width=419) (actual time=801.757..803.199 rows=31 loops=7)
Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
I/O Timings: shared/local read=1.219
Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
-> Parallel Bitmap Heap Scan on pmc_outcome_detail d (cost=14963.87..345418.09 rows=107000 width=411) (actual time=107.837..210.957 rows=167477 loops=7)
Recheck Cond: ((channel_id = ANY ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
Filter: ((voucher_file_name IS NULL) AND (payment_type = ANY ('{7,8}'::numeric[])) AND (status = '3'::numeric))
Rows Removed by Filter: 83745
Heap Blocks: exact=13247
Buffers: shared hit=95541 (main=95541 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
I/O Timings: shared/local read=1.219
Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
-> Bitmap Index Scan on "idx_outDetail_channel_id" (cost=0.00..14803.38 rows=1072436 width=0) (actual time=101.764..101.764 rows=1820498 loops=1)
Index Cond: ((channel_id = ANY ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
Buffers: shared hit=6016 (main=6016 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
I/O Timings: shared/local read=1.219
Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
-> Index Scan using pmc_outcome_pkey on pmc_outcome o (cost=0.56..1.85 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1172342)
Index Cond: (id = d.outcome_id)
Filter: (user_type = '1'::numeric)
Rows Removed by Filter: 1
Buffers: shared hit=5861716 (main=5861716 vm=0 fsm=0)
Planning:
Buffers: shared hit=20 (main=16 vm=4 fsm=0)
Planning Time: 0.299 ms
Execution Time: 818.499 ms
(36 rows)
```
In such cases, users do not have a good way to force the optimizer to behave differently.
To address this, I have added a GUC parameter to control the
optimizer's use of LIMIT clauses for adjusting the cost of paths,
providing a means to force intervention. We have also considered similar
scenarios, such as MIN/MAX. Note that we still retain the optimizer's
use of LIMIT clauses to adjust the number of rows, as this is always
reasonable. The patch is provided in attachment(not add test case yet).
Any thoughts?
Regards,
Haiyang Li
Вложения
В списке pgsql-hackers по дате отправления: