Обсуждение: join ordering

Поиск
Список
Период
Сортировка

join ordering

От
Robert Haas
Дата:
I have a query that performs poorly which can be simplified to the
following test case (v8.3.6).

CREATE TABLE foo (id integer, primary key (id));
INSERT INTO foo SELECT generate_series(1,10);
CREATE TABLE bar (id integer, foo_id integer not null references foo (id),   PRIMARY KEY (id));
INSERT INTO bar SELECT g, g % 10 + 1 FROM generate_series(1,10000) g;

ANALYZE;

EXPLAIN ANALYZE
SELECT v.id FROM (VALUES (1, 1)) v (id, bar_id)   LEFT JOIN (bar JOIN foo ON bar.foo_id = foo.id) ON v.bar_id =
bar.id;
                                                    QUERY PLAN

--------------------------------------------------------------------------------------------------------------------Nested
LoopLeft Join  (cost=1.23..408.74 rows=1 width=4) (actual
 
time=0.405..63.585 rows=1 loops=1)  Join Filter: ("*VALUES*".column2 = bar.id)  ->  Values Scan on "*VALUES*"
(cost=0.00..0.01rows=1 width=8)
 
(actual time=0.015..0.017 rows=1 loops=1)  ->  Hash Join  (cost=1.23..283.73 rows=10000 width=4) (actual
time=0.367..49.029 rows=10000 loops=1)        Hash Cond: (bar.foo_id = foo.id)        ->  Seq Scan on bar
(cost=0.00..145.00rows=10000 width=8)
 
(actual time=0.042..15.562 rows=10000 loops=1)        ->  Hash  (cost=1.10..1.10 rows=10 width=4) (actual
time=0.143..0.143 rows=10 loops=1)              ->  Seq Scan on foo  (cost=0.00..1.10 rows=10 width=4)
(actual time=0.086..0.105 rows=10 loops=1)Total runtime: 63.893 ms
(9 rows)

This isn't a very good plan.  What we should do is first join the
values expression against bar, and then join the resulting rows
against foo.  The optimizer doesn't want to do that, and I think the
reason is because it knows that the left join might introduce null
values into the result of (VALUES (...) LEFT JOIN bar) which would
then cause the join against foo to produce different results.  But in
practice, since foo.id is not null and = is strict, it's equivalent to
the following, which the planner handles much better.

EXPLAIN ANALYZE
SELECT v.id FROM (VALUES (1, 1)) v (id, bar_id)   LEFT JOIN (bar LEFT JOIN foo ON bar.foo_id = foo.id) ON v.bar_id =
bar.id;                                                      QUERY PLAN
 

--------------------------------------------------------------------------------------------------------------------------Nested
LoopLeft Join  (cost=0.00..8.57 rows=1 width=4) (actual
 
time=0.079..0.150 rows=1 loops=1)  ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=8) (actual
time=0.058..0.120 rows=1 loops=1)        ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1
width=8) (actual time=0.006..0.008 rows=1 loops=1)        ->  Index Scan using bar_pkey on bar  (cost=0.00..8.27
rows=1
width=8) (actual time=0.039..0.044 rows=1 loops=1)              Index Cond: ("*VALUES*".column2 = bar.id)  ->  Index
Scanusing foo_pkey on foo  (cost=0.00..0.27 rows=1
 
width=4) (actual time=0.012..0.015 rows=1 loops=1)        Index Cond: (bar.foo_id = foo.id)Total runtime: 0.312 ms
(8 rows)

...Robert


Re: join ordering

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> This isn't a very good plan.  What we should do is first join the
> values expression against bar, and then join the resulting rows
> against foo.  The optimizer doesn't want to do that, and I think the
> reason is because it knows that the left join might introduce null
> values into the result of (VALUES (...) LEFT JOIN bar) which would
> then cause the join against foo to produce different results.

Exactly.  Inner and outer joins don't commute in general.

> But in
> practice, since foo.id is not null and = is strict, it's equivalent to
> the following, which the planner handles much better.

Nonsense; those conditions are not sufficient to prove what you wish.
I think it is actually true given that the foreign key relationship
together with the not null on foo_id (NOT foo.id) implies that every row
of bar must have a join partner in foo; but not without that.

If we had any FK analysis in the optimizer (which we don't at present)
I think the deduction you'd really want is that foo can be removed from
the query altogether, because actually every row of bar must have
*exactly* one join partner in foo, and we don't care about the values of
foo otherwise.
        regards, tom lane


Re: join ordering

От
Robert Haas
Дата:
On Mon, Apr 13, 2009 at 7:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This isn't a very good plan.  What we should do is first join the
>> values expression against bar, and then join the resulting rows
>> against foo.  The optimizer doesn't want to do that, and I think the
>> reason is because it knows that the left join might introduce null
>> values into the result of (VALUES (...) LEFT JOIN bar) which would
>> then cause the join against foo to produce different results.
>
> Exactly.  Inner and outer joins don't commute in general.
>
>> But in
>> practice, since foo.id is not null and = is strict, it's equivalent to
>> the following, which the planner handles much better.
>
> Nonsense; those conditions are not sufficient to prove what you wish.
> I think it is actually true given that the foreign key relationship
> together with the not null on foo_id (NOT foo.id) implies that every row
> of bar must have a join partner in foo; but not without that.

Yeah, good point.

> If we had any FK analysis in the optimizer (which we don't at present)
> I think the deduction you'd really want is that foo can be removed from
> the query altogether, because actually every row of bar must have
> *exactly* one join partner in foo, and we don't care about the values of
> foo otherwise.

The way I set up this particular example, that's true, but suppose foo
had another column which the SELECT pulled into the output.  In that
case, the FK analysis wouldn't permit removing the join altogether,
but it would permit reordering it.  I think that:

A inner join B on Pab = A leftjoin B on Pab

...given that Pab is a set of equality constraints setting columns of
A equal to the columns of B to which they are mapped by a foreign key
constraint, and given further that at least one of these columns is
NOT NULL in A.

In some cases this can be a big win, because it means that this join
can commute with either inner joins or left joins (but once we commute
it with a left join it turns into a plain left join, and we can't go
back to handling it as an inner join).

...Robert