Re: Removing unneeded self joins

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: Removing unneeded self joins
Дата	14 марта 2019 г. 11:21:29
Msg-id	CAKJS1f8VV4YvsC_YSNHVtKfmqTAnrfcDoH21gVgAJNAFmwK6zA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Removing unneeded self joins (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Ответы	Re: Removing unneeded self joins
Список	pgsql-hackers

Дерево обсуждения

On Fri, 1 Mar 2019 at 03:09, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
> New version is attached.

I started looking over v11 and I'm wondering why you really need to
know which unique index proved the join unique?

I removed that check and I see it causes the following to remove the self-join:

create unique index on t1(a);
create unique index on t1(b);
explain select * from t1 inner join t1 t2 on t1.a=t2.b;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t1 t2  (cost=0.00..38.25 rows=11 width=16)
   Filter: (a = b)
(2 rows)

I had thought that the loop in is_unique_self_join() would have
rejected this, but looking more closely it's only checking the list of
join quals matched to each index match each other. Shouldn't it be
testing that the expressions on either side of the OpExprs are the
same after aligning the varnos? At the moment you're testing
innerrel_is_unique() for either side of the join, if you made this
change then couldn't you just test one rel and look for a unique index
on the join quals that match both sides of the join at the same time?
Probably relation_has_unique_index_for() is not the right tool for
that job, so you might need to write another function that does the
additional checks. Maybe it would be possible to split out some of the
code into helper functions so you don't have to copy the bulk of it.

The reason I mention this is that rel_is_distinct_for() also
considered subqueries. This allows the LEFT JOIN removal code to
remove joins like:

SELECT t1.* FROM t1 LEFT JOIN (SELECT DISTINCT a FROM t2) t2 on t1.a=t2.a;

In this case, the DISTINCT clause was used for unique proofs, not a
unique index.

It does not seem unreasonable that someone one day might want to
extend the self join removal code to do the same, e.g:

SELECT * FROM t1 INNER JOIN (SELECT DISTINCT a FROM t1) t2 on t1.a = t2.a;

and you're not setting it up to be very easy to do that because you're
insisting the proofs are unique indexes only.

What do you think?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Removing unneeded self joins