Re: Optimizing "exists"

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Optimizing "exists"
Дата
Msg-id 7267.1190135371@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Optimizing "exists"  (Steve Crawford <scrawford@pinpointresearch.com>)
Список pgsql-general
Steve Crawford <scrawford@pinpointresearch.com> writes:
> If the sub-select returns a large result set, will there be any benefit
> to adding "limit 1" to the sub-select or does the query planner
> automatically deduce that "limit 1" is the correct interpretation?

It does, although poking at it I notice a bit of a bug:

regression=# explain select * from tenk1 a join tenk1 b on a.unique1 = b.ten;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Merge Join  (cost=2287.89..2438.58 rows=10000 width=488)
   Merge Cond: (a.unique1 = b.ten)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..1702.22 rows=10000 width=244)
   ->  Materialize  (cost=2287.89..2412.89 rows=10000 width=244)
         ->  Sort  (cost=2287.89..2312.89 rows=10000 width=244)
               Sort Key: b.ten
               ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=10000 width=244)
(7 rows)

regression=# explain select exists(select * from tenk1 a join tenk1 b on a.unique1 = b.ten);
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Result  (cost=4822.00..4822.01 rows=1 width=0)
   InitPlan
     ->  Nested Loop  (cost=0.00..4822.00 rows=10000 width=488)
           ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=10000 width=244)
           ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..0.42 rows=1 width=244)
                 Index Cond: (a.unique1 = b.ten)
(6 rows)

The second case is correctly choosing a fast-start plan, but it's
reporting the sub-select's cost up to the next plan level as being the
full runtime instead of the expected partial runtime.  That has no
bad effect here, but might in a more complex situation where the
estimated subselect cost affected upper join order or some such.
If you were up against such a situation, an explicit LIMIT 1 would
probably help:

regression=# explain select exists(select * from tenk1 a join tenk1 b on a.unique1 = b.ten limit 1);
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Result  (cost=0.48..0.49 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.48 rows=1 width=488)
           ->  Nested Loop  (cost=0.00..4822.00 rows=10000 width=488)
                 ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=10000 width=244)
                 ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..0.42 rows=1 width=244)
                       Index Cond: (a.unique1 = b.ten)
(7 rows)

Same subplan, but a saner cost estimate at the upper level...

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: update command question?
Следующее
От: Christian Schröder
Дата:
Сообщение: Performance Issues (was: "like" vs "substring" again)