Обсуждение: Join optimization

Поиск
Список
Период
Сортировка

Join optimization

От
luis.roberto@siscobra.com.br
Дата:
Hi!

Recently on a blogpost [1] I saw that Oracle was able to "optimize" a join strategy by completely eliminating access to a table.

Heres the execution on Oracle 18c [2] and PostgreSQL 13 (beta) [3].

Is there a fundamental reason why PG can't do the same? 

Thanks!
 

Re: Join optimization

От
Fabrízio de Royes Mello
Дата:


Em sáb, 11 de jul de 2020 às 14:20, <luis.roberto@siscobra.com.br> escreveu:
Hi!

Recently on a blogpost [1] I saw that Oracle was able to "optimize" a join strategy by completely eliminating access to a table.

Heres the execution on Oracle 18c [2] and PostgreSQL 13 (beta) [3].

Is there a fundamental reason why PG can't do the same? 

It does... did you see the “never executed” notice on the Postgres explain output?

Regards,


--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: Join optimization

От
luis.roberto@siscobra.com.br
Дата:
I'm sorry for the bad example.


I don't understand oracle's execution plan very well, but it doesn't seem to be hitting the Users table...


De: "Fabrízio de Royes Mello" <fabrizio@timbira.com.br>
Para: "luis.roberto" <luis.roberto@siscobra.com.br>
Cc: "pgsql-general" <pgsql-general@lists.postgresql.org>
Enviadas: Sábado, 11 de julho de 2020 15:24:04
Assunto: Re: Join optimization


Em sáb, 11 de jul de 2020 às 14:20, <luis.roberto@siscobra.com.br> escreveu:
Hi!

Recently on a blogpost [1] I saw that Oracle was able to "optimize" a join strategy by completely eliminating access to a table.

Heres the execution on Oracle 18c [2] and PostgreSQL 13 (beta) [3].

Is there a fundamental reason why PG can't do the same? 

It does... did you see the “never executed” notice on the Postgres explain output?

Regards,


--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: Join optimization

От
David Rowley
Дата:
On Sun, 12 Jul 2020 at 06:59, <luis.roberto@siscobra.com.br> wrote:
>
> I'm sorry for the bad example.
>
> Here is another, with some data on PG:
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=ccfd1c4fa291e74a6db9db1772e2b5ac and Oracle:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=21a98f499065ad4e2c35ff4bd1487e14.

I believe what you're talking about is join removals.  It appears as
though Oracle is able to remove the inner join to the users table as
the join only serves to check the user record exists. No columns are
being selected.  The record must exist due to the foreign key
referencing users.

PostgreSQL currently can only remove left joins. Likely what you could
do here is just change your inner join into a left join. If you're
happy enough that the user record will always exist then that should
allow it to work.

The reason PostgreSQL does not currently support inner join is that by
default, foreign key constraints are only triggered at the end of the
query, (or if deferred, at the end of the transaction). WIth
PostgreSQL, it's possible for a SELECT query to see a violated foreign
key constraint.  This can happen if your SELECT query calls a function
which updates a referenced record.  The cascade of the foreign key
won't occur until the end of the statement, so the select may stumble
upon a violated foreign key.

Here's a quick example of this case:
drop table t1,t2;
create table t1 (a int primary key);
create table t2 (a int references t1 on update cascade);

insert into t1 values(1),(2);
insert into t2 values(2),(2);
create or replace function update_t1 (p_a int) returns int as $$ begin
update t1 set a = a + 1 where a = p_a; return p_a; end; $$ language
plpgsql volatile;

-- in theory, this should never return any rows as we're asking for
all rows that
-- don't exist in the referenced table. However, we do get a result
since the function
-- call updates t1 setting the row with a=2 to a=3. The cascade to t2
does not occur
-- until the end of the statement.
select update_t1(a+1),t1.a from t1 where not exists(select 1 from t2
where t1.a=t2.a);
 update_t1 | a
-----------+---
         2 | 1
(1 row)

If you're happy that you don't have any functions like that which
could momentarily cause the foreign key to appear violated, then there
shouldn't be any harm in changing the INNER JOIN on users to a LEFT
JOIN.  PostgreSQL will be able to remove the join in that case.

David



Re: Join optimization

От
luis.roberto@siscobra.com.br
Дата:
----- Mensagem original -----
De: "David Rowley" <dgrowleyml@gmail.com>
Para: "luis.roberto" <luis.roberto@siscobra.com.br>
Cc: "Fabrízio de Royes Mello" <fabrizio@timbira.com.br>, "pgsql-general" <pgsql-general@lists.postgresql.org>
Enviadas: Domingo, 12 de julho de 2020 5:29:08
Assunto: Re: Join optimization

On Sun, 12 Jul 2020 at 06:59, <luis.roberto@siscobra.com.br> wrote:
>
> I'm sorry for the bad example.
>
> Here is another, with some data on PG:
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=ccfd1c4fa291e74a6db9db1772e2b5ac and Oracle:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=21a98f499065ad4e2c35ff4bd1487e14.

I believe what you're talking about is join removals.  It appears as
though Oracle is able to remove the inner join to the users table as
the join only serves to check the user record exists. No columns are
being selected.  The record must exist due to the foreign key
referencing users.

PostgreSQL currently can only remove left joins. Likely what you could
do here is just change your inner join into a left join. If you're
happy enough that the user record will always exist then that should
allow it to work.

The reason PostgreSQL does not currently support inner join is that by
default, foreign key constraints are only triggered at the end of the
query, (or if deferred, at the end of the transaction). WIth
PostgreSQL, it's possible for a SELECT query to see a violated foreign
key constraint.  This can happen if your SELECT query calls a function
which updates a referenced record.  The cascade of the foreign key
won't occur until the end of the statement, so the select may stumble
upon a violated foreign key.

Here's a quick example of this case:
drop table t1,t2;
create table t1 (a int primary key);
create table t2 (a int references t1 on update cascade);

insert into t1 values(1),(2);
insert into t2 values(2),(2);
create or replace function update_t1 (p_a int) returns int as $$ begin
update t1 set a = a + 1 where a = p_a; return p_a; end; $$ language
plpgsql volatile;

-- in theory, this should never return any rows as we're asking for
all rows that
-- don't exist in the referenced table. However, we do get a result
since the function
-- call updates t1 setting the row with a=2 to a=3. The cascade to t2
does not occur
-- until the end of the statement.
select update_t1(a+1),t1.a from t1 where not exists(select 1 from t2
where t1.a=t2.a);
 update_t1 | a
-----------+---
         2 | 1
(1 row)

If you're happy that you don't have any functions like that which
could momentarily cause the foreign key to appear violated, then there
shouldn't be any harm in changing the INNER JOIN on users to a LEFT
JOIN.  PostgreSQL will be able to remove the join in that case.

David

-----

Thanks for the reply David! I understand it better now.