Re: Improve choose_custom_plan for initial partition prune case

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: Improve choose_custom_plan for initial partition prune case
Дата
Msg-id CA+HiwqHHuESUk7veyfvcpqpBZ=LN_VVx=dO3o76nm+O+mjC7VQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improve choose_custom_plan for initial partition prune case  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Ответы Re: Improve choose_custom_plan for initial partition prune case  (Andy Fan <zhihui.fan1213@gmail.com>)
Список pgsql-hackers
Hi Ashutosh,

On Wed, Oct 7, 2020 at 8:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Wed, Oct 7, 2020 at 11:20 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > On Mon, Oct 5, 2020 at 9:27 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> >> What happens when we
> >> execute plans with values that have estimates similar to the generic
> >> plan later when we moderate generic plan costs based on the custom
> >> plans?
> >>
> >
> > The example at the beginning of this thread,  I used the exact same values
> > every time, the custom plan will be chosen all the time, which is bad,
> > The main reason is the custom plan knows the exact value in Execute
> > message, so it run plan time partition prune, then the total cost is low, however
> > for the generic plan the partition prune happens at Runtime initial_partition prune
> > stage,  so the cost of the partitions which can be pruned at that stage is still
> > included the total cost,  so generic plans can't be chosen. that would be the
> > thing I want to fix.
>
> Something is wrong in the generic plan costing then.

That's right.

> IIUC, the
> selectivity estimate for only a single partition should come out >= 1.
> For all the other partitions, it should be 1 and that too because we
> clamp the row counts. So the final costs for generic and custom plans
> shouldn't be far off unless there's large deviation in the selectivity
> of a partition key. I am assuming that there's an equality condition
> on a partition key. That's what I meant by the paragraph below.
> >
> >> If the table has good distribution of a partition key, which also
> >> results in good distribution of data across partitions, generic plan
> >> cost will be similar to the custom plan costs. If not that's something
> >> we want to fix.
>
> Can you please investigate on these lines?

The planner currently costs a generic plan assuming that *all*
partitions will be scanned, which is not the case, because run-time
pruning will ensure that scan nodes of all but one partition will be
discarded, assuming a simple query like `select * from parted_table
where a = $1`.  With N partitions, a generic plan for such a query is
N times as expensive as a custom plan, so will never win.

For example:

create table bar (a int, b int, c int) partition by range (a);
create table bar_1 partition of bar for values from (1) to (100001);
create table bar_2 partition of bar for values from (100001) to (200001);
create table bar_3 partition of bar for values from (200001) to (300001);
create table bar_4 partition of bar for values from (300001) to (400001);
create table bar_5 partition of bar for values from (400001) to (500001);
create table bar_6 partition of bar for values from (500001) to (600001);
create table bar_7 partition of bar for values from (600001) to (700001);
create table bar_8 partition of bar for values from (700001) to (800001);
create table bar_9 partition of bar for values from (800001) to (900001);
create table bar_10 partition of bar for values from (900001) to (1000001);
insert into bar select generate_series(1, 1000000);
create index on bar (a);
analyze bar;

set plan_cache_mode to force_custom_plan ;
prepare q as select * from bar where a = $1;
explain execute q (1);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using bar_1_a_idx on bar_1 bar  (cost=0.29..8.31 rows=1 width=12)
   Index Cond: (a = 1)
(2 rows)

deallocate q;
set plan_cache_mode to force_generic_plan ;
prepare q as select * from bar where a = $1;
explain execute q (1);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Append  (cost=0.29..83.15 rows=10 width=12)
   Subplans Removed: 9
   ->  Index Scan using bar_1_a_idx on bar_1  (cost=0.29..8.31 rows=1 width=12)
         Index Cond: (a = $1)
(4 rows)

As you can see, the cost of the generic plan is 83.13 which is 8.31 *
10, even though only one partition is actually scanned, so the actual
cost is the same as a custom plan.  The planner fails to consider that
the those 9 subplans will be removed during execution.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: pg_dump bug for extension owned tables