Re: optimizer picks smaller table to drive nested loops?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: optimizer picks smaller table to drive nested loops?
Дата
Msg-id 10251.1058205875@sss.pgh.pa.us
обсуждение исходный текст
Ответ на optimizer picks smaller table to drive nested loops?  (Greg Stark <gsstark@mit.edu>)
Ответы Re: optimizer picks smaller table to drive nested loops?  (Greg Stark <gsstark@mit.edu>)
Список pgsql-performance
Greg Stark <gsstark@mit.edu> writes:
> slo=> explain analyze select * from region, (select 1 union all select 2) as x;
>                                                        QUERY PLAN
  
>
------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..11162.00 rows=5534 width=108) (actual time=0.13..541.19 rows=5534 loops=1)
>    ->  Subquery Scan x  (cost=0.00..2.00 rows=2 width=0) (actual time=0.03..0.08 rows=2 loops=1)
>          ->  Append  (cost=0.00..2.00 rows=2 width=0) (actual time=0.02..0.05 rows=2 loops=1)
>                ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1.00 rows=1 width=0) (actual time=0.01..0.02 rows=1
loops=1)
>                      ->  Result  (cost=0.00..1.00 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
>                ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1.00 rows=1 width=0) (actual time=0.01..0.02 rows=1
loops=1)
>                      ->  Result  (cost=0.00..1.00 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
>    ->  Seq Scan on region  (cost=0.00..2813.00 rows=2767 width=104) (actual time=0.03..123.44 rows=2767 loops=2)
>  Total runtime: 566.24 msec
> (9 rows)

> Wouldn't it be faster to drive the nested loop the other way around?

You seem to be using a rather wacko value of cpu_tuple_cost; those
Result nodes ought to be costed at 0.01 not 1.00.  With the default
cost settings I get an other-way-around plan for a similar test.
(I used tenk1 from the regression database as the outer table.)

However, it looks to me like the subquery-scan-outside plan probably
is the faster one, on both my machine and yours.  I get

regression=# explain analyze select * from tenk1, (select 1 union all select 2) as x;
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..858.00 rows=20000 width=248) (actual time=0.42..3648.61 rows=20000 loops=1)
   ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.23..199.97 rows=10000 loops=1)
   ->  Subquery Scan x  (cost=0.00..0.02 rows=2 width=0) (actual time=0.07..0.24 rows=2 loops=10000)
         ->  Append  (cost=0.00..0.02 rows=2 width=0) (actual time=0.05..0.17 rows=2 loops=10000)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.01 rows=1 width=0) (actual time=0.03..0.06 rows=1
loops=10000)
                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.02 rows=1 loops=10000)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..0.01 rows=1 width=0) (actual time=0.03..0.06 rows=1
loops=10000)
                     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.02 rows=1 loops=10000)
 Total runtime: 3807.39 msec
(9 rows)

regression=# set cpu_tuple_cost = 1;
SET
regression=# explain analyze select * from tenk1, (select 1 union all select 2) as x;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..40718.00 rows=20000 width=248) (actual time=0.39..1214.42 rows=20000 loops=1)
   ->  Subquery Scan x  (cost=0.00..2.00 rows=2 width=0) (actual time=0.10..0.31 rows=2 loops=1)
         ->  Append  (cost=0.00..2.00 rows=2 width=0) (actual time=0.06..0.22 rows=2 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1.00 rows=1 width=0) (actual time=0.05..0.08 rows=1 loops=1)
                     ->  Result  (cost=0.00..1.00 rows=1 width=0) (actual time=0.03..0.04 rows=1 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..1.00 rows=1 width=0) (actual time=0.05..0.08 rows=1 loops=1)
                     ->  Result  (cost=0.00..1.00 rows=1 width=0) (actual time=0.02..0.03 rows=1 loops=1)
   ->  Seq Scan on tenk1  (cost=0.00..10358.00 rows=10000 width=244) (actual time=0.17..188.37 rows=10000 loops=2)
 Total runtime: 1371.17 msec
(9 rows)

The flipover point between the two plans is cpu_tuple_cost = 0.04 in
my tests.

It looks to me like we've neglected to charge any cost associated with
Subquery Scan or Append nodes.  Certainly Subquery Scan ought to charge
at least a cpu_tuple_cost per row.  Perhaps Append ought to as well ---
although since it doesn't do selection or projection, I'm not quite sure
where the time is going in that case.  (Hmmm... time to get out the
profiler...)

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Nick Fankhauser"
Дата:
Сообщение: Re: Tunning FreeeBSD and PostgreSQL
Следующее
От: Greg Stark
Дата:
Сообщение: Re: optimizer picks smaller table to drive nested loops?