Re: Removing unneeded self joins

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: Removing unneeded self joins
Дата	23 сентября 2020 г. 07:23:27
Msg-id	CAApHDvpggnFMC4yP-jUO7PKN=fXeErW5bOxisvJ0HvkHQEY=Ww@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Removing unneeded self joins (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Ответы	Re: Removing unneeded self joins
Список	pgsql-hackers

Дерево обсуждения

On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> v.23 in attachment:

Hi,

This is only a partial review as I see the patch still does not
incorporate the self join detection method I mentioned in March 2019
and the one the patch only partially works.

Here's the partial review which contains the details:

1. Change to aset.c not relevant to this patch.


--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1485,7 +1485,6 @@ AllocSetCheck(MemoryContext context)

  chsize = chunk->size; /* aligned chunk size */
  dsize = chunk->requested_size; /* real data */
-
  /*
  * Check chunk size
  */


2. Why GUC_NOT_IN_SAMPLE?

+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE

3. I'd expect this to remove the join.

create table t1 (
  a int not null unique,
  b int not null unique,
  c int not null unique,
  d int not null,
  e int not null
);

explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.b=t2.c;
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join  (cost=52.50..88.42 rows=1 width=40)
   Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.c))
   ->  Seq Scan on t1  (cost=0.00..27.00 rows=1700 width=20)
   ->  Hash  (cost=27.00..27.00 rows=1700 width=20)
         ->  Seq Scan on t1 t2  (cost=0.00..27.00 rows=1700 width=20)
(5 rows)

This one seems to work.

This one *does* seem to work.

explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.d=t2.e;

You should likely implement the method I wrote in the final paragraph in [1]

The basic idea here is you first process the join quals:

t1.a=t2.a and t1.b=t2.c

and keep just the ones that use the same varattno on either side
(t1.a=t2.a). Perhaps it's not worth thinking about Exprs for now, or
if you think it is you can normalise the varnos to 1 and equal()

Then just pass the remaining quals to relation_has_unique_index_for().
If it returns true then there's no need to perform the opposite check
for the other relation. It would just return the same thing.

This will allow you to get rid of using the following as proofs:

/* We must have the same unique index for both relations. */
if (outerinfo->index->indexoid != innerinfo->index->indexoid)
return false;

... as I've shown above.  This only works in some cases and that's not
really good enough.

Doing thing the way I describe will allow you to get rid of all the
UniqueRelInfo stuff.

4. Should be ok for partitioned tables though:

/*
* This optimization won't work for tables that have inheritance
* children.
*/
if (rte->inh)
continue;

David

[1] https://www.postgresql.org/message-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg%2Bo1MGZk4mHBn_Rg%40mail.gmail.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Thomas Munro
Дата: 23 сентября 2020 г., 07:11:42
Сообщение: Re: Syncing pg_multixact directories

Следующее

От: "k.jamison@fujitsu.com"
Дата: 23 сентября 2020 г., 07:23:29
Сообщение: RE: [Patch] Optimize dropping of relation buffers using dlist

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Removing unneeded self joins

Предыдущее

Следующее