Re: Optimization idea: merging multiple EXISTS(...) with constraint-based join removal

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Optimization idea: merging multiple EXISTS(...) with constraint-based join removal
Дата
Msg-id CAKJS1f9iUXKnkUU6fv5QLgN74r33_cTpo+QCTxzHJ=CMkBqvdg@mail.gmail.com
обсуждение исходный текст
Ответ на Optimization idea: merging multiple EXISTS(...) with constraint-based join removal  (Frédéric TERRAZZONI <frederic.terrazzoni@gmail.com>)
Список pgsql-hackers
On 28 July 2015 at 09:37, Frédéric TERRAZZONI <frederic.terrazzoni@gmail.com> wrote:

    SELECT * FROM t1
    WHERE EXISTS(
        SELECT 1 FROM t2, t3, t4
        WHERE t2.id = t1.t2_id
        AND t3.id = t2.t3_id
        AND t4.id = t3.t4_id
        AND t4.val = 'XYZ'
    ) AND EXISTS(
        SELECT 1 FROM t2, t3, t5
        WHERE t2.id = t1.t2_id
        AND t3.id = t2.t3_id
        AND t5.id = t3.t5_id
        AND t5.val = 'Blablabla'
    ) AND EXISTS(
        SELECT 1 FROM t6
        WHERE t6.id = t1.t6_id
        AND t6.val = 'Hello'
    )

...
 

The resulting query is:

    SELECT * FROM t1
        WHERE EXISTS(
            SELECT 1 FROM t2 t2_a, t3 t3_a, t4 t4_a, t2 t2_b, t3 t3_b, t5, t6
            WHERE t2_a.id = t1.t2_id
            AND t3_a.id = t2_a.t3_id
            AND t4_a.id = t3_a.t4_id
            AND t4_a.val = 'XYZ'
            AND t2_b.id = t1.t2_id
            AND t3_b.id = t2_b.t3_id
            AND t5.id = t3_b.t5_id
            AND t5.val = 'Blablabla'
            AND t6.id = t1.t6_id
            AND t6.val = 'Hello'
        )

My questions are:
- Does PostgreSQL already supports this optimization ? Maybe it's not enabled in my case only?

No, there's nothing which supports that currently.
 
- If yes, is my reasoning incorrect ? Can you point me my mistake ?

It sounds reasonable to me.
 
- Otherwise is there any plan to add this optimization to PostgreSQL ?


I did propose the idea here http://www.postgresql.org/message-id/CAApHDvopmWq4i2BCf0VqU4mYmxzHCYwfnUMat9TWuKzdvo7=8w@mail.gmail.com but I didn't focus just with semi-joins. Without re-reading, I think I was talking about any join that could be proved to not duplicate rows could be "consolidated".

I do believe that this optimisation would be useful in more cases than most people might think, for example:

UPDATE t1 SET col1 = (SELECT col1 FROM t2 WHERE t1.id=t2.id), col2 = (SELECT col2 FROM t2 WHERE t1.id=t2.id), ...;

Of course, this query could have been written using UPDATE/FROM, (non-standard), or UPDATE t1 SET (col1,col2) = (SELECT ...), which was only added recently.

There's also the case of the view which just has 1 column missing, so the consumer joins a table that's already been joined to in the view.

I think it would be quite nice to have this, and I don't think it would be all that expensive for the planner to detect this.

I think the planner would have to do something like:

1. Scan simple_rte_array looking for relids which are the same as another entry's.
2. If found, is the join condition the same as the other one?
3. Is there a unique index to prove that joining to this does not duplicate rows, or is it a semi-join?
4. Remove relation and replace all Vars from the removed relation with the one from the other table, mark relation as REL_DEAD.

I think 1 is quite cheap to perform, so normal queries wouldn't suffer much of a slow-down from these extra checks, as most queries won't have self joins.

Are you thinking of working on it?

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: WIP: Make timestamptz_out less slow.
Следующее
От: Jim Nasby
Дата:
Сообщение: Re: Autonomous Transaction is back