EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

Поиск
Список
Период
Сортировка
От Merlin Moncure
Тема EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
Дата
Msg-id CAHyXU0yTy1Q7Qe-rozHfejkTYg0LHWkHT7k7A+3H=xmjbs6PmQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hello hackers,

Observe the following test case (apologies if this is a well
understood problem):

create temp table foo as select generate_series(1,1000000) id;
create index on foo(id);

create temp table bar as select id, id % 100000 = 0 as good from
generate_series(1,1000000) id;
create index on bar(good);

analyze foo;
analyze bar;

explain analyze select * from foo where false or exists (select 1 from
bar where good and foo.id = bar.id);  -- A
explain analyze select * from foo where exists (select 1 from bar
where good and foo.id = bar.id);  -- B

These queries are trivially verified as identical but give very different plans.
A gives                                                         QUERY PLAN

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────Seq
Scanon foo  (cost=0.00..4459425.00 rows=500000 width=4) (actual
 
time=13.299..130.271 rows=10 loops=1)  Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)  Rows Removed by Filter:
999990 SubPlan 1    ->  Index Scan using bar_good_idx on bar  (cost=0.42..4.45 rows=1
 
width=0) (never executed)          Index Cond: (good = true)          Filter: (good AND (foo.id = id))  SubPlan 2    ->
Index Scan using bar_good_idx on bar bar_1  (cost=0.42..4.44
 
rows=1 width=4) (actual time=0.024..0.055 rows=10 loops=1)          Index Cond: (good = true)          Filter:
goodPlanningtime: 0.103 msExecution time: 130.312 ms
 

B gives                                                         QUERY PLAN

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────Nested
Loop (cost=4.87..12.91 rows=1 width=4) (actual
 
time=0.075..0.161 rows=10 loops=1)  ->  HashAggregate  (cost=4.45..4.46 rows=1 width=4) (actual
time=0.058..0.060 rows=10 loops=1)        Group Key: bar.id        ->  Index Scan using bar_good_idx on bar
(cost=0.42..4.44
rows=1 width=4) (actual time=0.018..0.045 rows=10 loops=1)              Index Cond: (good = true)              Filter:
good ->  Index Only Scan using foo_id_idx on foo  (cost=0.42..8.44
 
rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=10)        Index Cond: (id = bar.id)        Heap Fetches:
10Planningtime: 0.193 msExecution time: 0.187 ms
 

This is a general problem to OR expressions while AND expressions will
generally pass the optimization through.   The 'old school'
optimization approach is to rewrite the OR expressions to UNION ALL
but this can have unpleasant downstream effects on the query in real
world scenarios.  The question is: can the one time filter logic be
expanded such the first query can be functionally be written into the
second one?  This type of query happens a lot when trying to mix
multiple different filtering expressions (a 'filter mode' if you will)
in a single query based on a user supplied switch.  Food for thought.

merlin

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: MultiXactId error after upgrade to 9.3.4
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Reviewing freeze map code