Re: Improvement discussion of custom and generic plans

Поиск
Список
Период
Сортировка
От Quan Zongliang
Тема Re: Improvement discussion of custom and generic plans
Дата
Msg-id f50cadde-6cf2-4dee-a2a9-75f274d4c0a0@yeah.net
обсуждение исходный текст
Ответ на Improvement discussion of custom and generic plans  (Quan Zongliang <quanzongliang@yeah.net>)
Ответы Re: Improvement discussion of custom and generic plans  (Quan Zongliang <quanzongliang@yeah.net>)
Список pgsql-hackers

On 2023/11/3 15:27, Quan Zongliang wrote:
> Hi
> 
> We have one such problem. A table field has skewed data. Statistics:
> n_distinct             | -0.4481973
> most_common_vals       | {5f006ca25b52ed78e457b150ee95a30c}
> most_common_freqs      | {0.5518474}
> 
> 
> Data generation:
> 
> CREATE TABLE s_user (
>       user_id varchar(32) NOT NULL,
>       corp_id varchar(32),
>       status int NOT NULL
>   );
> 
> insert into s_user
> select md5('user_id ' || a), md5('corp_id ' || a),
>       case random()<0.877675 when true then 1 else -1 end
>    FROM generate_series(1,10031) a;
> 
> insert into s_user
> select md5('user_id ' || a), md5('corp_id 10032'),
>       case random()<0.877675 when true then 1 else -1 end
>    FROM generate_series(10031,22383) a;
> 
> CREATE INDEX s_user_corp_id_idx ON s_user USING btree (corp_id);
> 
> analyze s_user;
> 
> 
> 1. First, define a PREPARE statement
> prepare stmt as select count(*) from s_user where status=1 and corp_id = 
> $1;
> 
> 2. Run it five times. Choose the custom plan.
> explain (analyze,buffers) execute stmt('5f006ca25b52ed78e457b150ee95a30c');
> 
> Here's the plan:
>   Aggregate  (cost=639.84..639.85 rows=1 width=8) (actual 
> time=4.653..4.654 rows=1 loops=1)
>     Buffers: shared hit=277
>     ->  Seq Scan on s_user  (cost=0.00..612.76 rows=10830 width=0) 
> (actual time=1.402..3.747 rows=10836 loops=1)
>           Filter: ((status = 1) AND ((corp_id)::text = 
> '5f006ca25b52ed78e457b150ee95a30c'::text))
>           Rows Removed by Filter: 11548
>           Buffers: shared hit=277
>   Planning Time: 0.100 ms
>   Execution Time: 4.674 ms
> (8 rows)
> 
> 3.From the sixth time. Choose generic plan.
> We can see that there is a huge deviation between the estimate and the 
> actual value:
>   Aggregate  (cost=11.83..11.84 rows=1 width=8) (actual 
> time=4.424..4.425 rows=1 loops=1)
>     Buffers: shared hit=154 read=13
>     ->  Bitmap Heap Scan on s_user  (cost=4.30..11.82 rows=2 width=0) 
> (actual time=0.664..3.371 rows=10836 loops=1)
>           Recheck Cond: ((corp_id)::text = $1)
>           Filter: (status = 1)
>           Rows Removed by Filter: 1517
>           Heap Blocks: exact=154
>           Buffers: shared hit=154 read=13
>           ->  Bitmap Index Scan on s_user_corp_id_idx  (cost=0.00..4.30 
> rows=2 width=0) (actual time=0.635..0.635 rows=12353 loops=1)
>                 Index Cond: ((corp_id)::text = $1)
>                 Buffers: shared read=13
>   Planning Time: 0.246 ms
>   Execution Time: 4.490 ms
> (13 rows)
> 
> This is because in the choose_custom_plan function, the generic plan is 
> attempted after executing the custom plan five times.
> 
>      if (plansource->num_custom_plans < 5)
>          return true;
> 
> The generic plan uses var_eq_non_const to estimate the average selectivity.
> 
> These are facts that many people already know. So a brief introduction.
> 
> 
> Our users actually use such parameter conditions in very complex PREPARE 
> statements. Once they use the generic plan for the sixth time. The 
> execution time will change from 5 milliseconds to 5 minutes.
> 
> 
> To improve this problem. The following approaches can be considered:
> 
> 1. Determine whether data skew exists in the PREPARE statement parameter 
> conditions based on the statistics.
> However, there is no way to know if the user will use the skewed parameter.
> 
> 2.When comparing the cost of the generic plan with the average cost of 
> the custom plan(function choose_custom_plan). Consider whether the 
> maximum cost of a custom plan executed is an order of magnitude 
> different from the cost of a generic plan.
> If the first five use a small selectivity condition. And after the sixth 
> use a high selectivity condition. Problems will still arise.
> 
> 3.Trace the execution time of the PREPARE statement. When an execution 
> time is found to be much longer than the average execution time, the 
> custom plan is forced to run.
> 
> 
> Is there any better idea?
> 
I tried to do a demo. Add a member paramid to Const. When Const is 
generated by Param, the Const is identified as coming from Param. Then 
check in var_eq_const to see if the field in the condition using this 
parameter is skewed. If so, choose_custom_plan returns true every time, 
forcing custom_plan to be used.
Only conditional expressions such as var eq param or param eq var can be 
supported.
If it makes sense. Continue to improve this patch.

> -- 
> Quan Zongliang
> 
>
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Kuzmenkov
Дата:
Сообщение: Re: Incorrect cost for MergeAppend
Следующее
От: "Andrey M. Borodin"
Дата:
Сообщение: Re: UUID v7