Pushing down join clauses into subqueries

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Pushing down join clauses into subqueries
Дата
Msg-id 7105c8f3-e16f-ee54-9ce9-c4ac2384791e@iki.fi
обсуждение исходный текст
Список pgsql-hackers
It would be nice, if we could push down join quals into subqueries. For 
example:

create table small_table (i int);
create table big_table (i int, j int);

insert into small_table values (1), (2); -- two rows
insert into big_table select g/10, g from generate_series(1, 100000) g; 
-- million rows

create index on big_table(i);

select *
from small_table,
  (select i, count(*) from big_table group by i) as sq
where small_table.i = sq.i;

Currently, we will fully materialize the subquery, and then filter the 
rows find the rows that match the outer query:

                                     QUERY PLAN 

-----------------------------------------------------------------------------------
  Hash Join  (cost=1949.50..2017.08 rows=2550 width=16)
    Hash Cond: (small_table.i = big_table.i)
    ->  Seq Scan on small_table  (cost=0.00..35.50 rows=2550 width=4)
    ->  Hash  (cost=1947.00..1947.00 rows=200 width=12)
          ->  HashAggregate  (cost=1943.00..1945.00 rows=200 width=12)
                Group Key: big_table.i
                ->  Seq Scan on big_table  (cost=0.00..1443.00 
rows=100000 width=4)
(7 rows)

For this query, a nested loop join, with an index scan on big_table, 
would be a better plan. Doing that would require creating parameterized 
paths, planning the subquery multiple times, with and without the 
pushed-down join quals.

However, there's one special case: What if the subquery is LATERAL? In 
that case, the subquery is already parameterized. We could push down 
join quals, which refer the same "other" relations that are already 
laterally referenced, without making it any more lateral.

Attached is a patch to do that (the first patch). I'm not too familiar 
with this code; does it look sane?

One thing that I wasn't clear on, is the intended behavior of 
ReplaceVarsFromTargetList(). In the patch, when a join qual is pushed 
down, the Vars in the join clause that refer to the subquery's output, 
are replaced with the subquery's target list entries. That's the same 
thing we do when pushing down regular, non-join quals, too. But when 
pushing down a join qual, the qual will also include Vars for other 
relations on the same level, not just Vars for the subquery. When the 
qual is pushed down to the subquery, the Vars referring to other 
relations need to have their varlevelsup incremented. In the patch, I 
solved this by first calling IncrementVarSublevelsUp() on the qual, to 
bump up varlevelsup for all Vars, and then ReplaceVarsFromTargetList(), 
with sublevels_up = 1. But that way, when ReplaceVarsFromTargetList() 
replaces Vars with the target list entries, it also adjusts varlevelsup 
in any Vars in the target list entry. I changed it to not do that 
anymore. All the existing callers call ReplaceVarsFromTargetList() with 
sublevels_up == 0, so they shouldn't be affected, but I wonder what the 
original intention here was?


Attached is also a second, much more work-in-progress patch, that 
expands the push-down support to non-LATERAL subqueries. Yes, that means 
that the subquery is planned multiple times. I think that needs some 
further heuristics, or perhaps we could pass the list of "potential" 
join clauses to the subquery when it's planned for the first time, and 
have subquery_planner() tell which of them might be useful, e.g. because 
there are indexes to back them.

- Heikki

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: I'd like to discuss scaleout at PGCon
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: buildfarm vs code