Re: Removing unneeded self joins

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Removing unneeded self joins
Дата
Msg-id CAApHDvpggnFMC4yP-jUO7PKN=fXeErW5bOxisvJ0HvkHQEY=Ww@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Removing unneeded self joins  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Ответы Re: Removing unneeded self joins
Список pgsql-hackers
On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> v.23 in attachment:

Hi,

This is only a partial review as I see the patch still does not
incorporate the self join detection method I mentioned in March 2019
and the one the patch only partially works.

Here's the partial review which contains the details:

1. Change to aset.c not relevant to this patch.


--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1485,7 +1485,6 @@ AllocSetCheck(MemoryContext context)

  chsize = chunk->size; /* aligned chunk size */
  dsize = chunk->requested_size; /* real data */
-
  /*
  * Check chunk size
  */


2. Why GUC_NOT_IN_SAMPLE?

+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE

3. I'd expect this to remove the join.

create table t1 (
  a int not null unique,
  b int not null unique,
  c int not null unique,
  d int not null,
  e int not null
);

explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.b=t2.c;
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join  (cost=52.50..88.42 rows=1 width=40)
   Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.c))
   ->  Seq Scan on t1  (cost=0.00..27.00 rows=1700 width=20)
   ->  Hash  (cost=27.00..27.00 rows=1700 width=20)
         ->  Seq Scan on t1 t2  (cost=0.00..27.00 rows=1700 width=20)
(5 rows)

This one seems to work.

This one *does* seem to work.

explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.d=t2.e;

You should likely implement the method I wrote in the final paragraph in [1]

The basic idea here is you first process the join quals:

t1.a=t2.a and t1.b=t2.c

and keep just the ones that use the same varattno on either side
(t1.a=t2.a). Perhaps it's not worth thinking about Exprs for now, or
if you think it is you can normalise the varnos to 1 and equal()

Then just pass the remaining quals to relation_has_unique_index_for().
If it returns true then there's no need to perform the opposite check
for the other relation. It would just return the same thing.

This will allow you to get rid of using the following as proofs:

/* We must have the same unique index for both relations. */
if (outerinfo->index->indexoid != innerinfo->index->indexoid)
return false;

... as I've shown above.  This only works in some cases and that's not
really good enough.

Doing thing the way I describe will allow you to get rid of all the
UniqueRelInfo stuff.

4. Should be ok for partitioned tables though:

/*
* This optimization won't work for tables that have inheritance
* children.
*/
if (rte->inh)
continue;

David

[1] https://www.postgresql.org/message-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg%2Bo1MGZk4mHBn_Rg%40mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: Syncing pg_multixact directories
Следующее
От: "k.jamison@fujitsu.com"
Дата:
Сообщение: RE: [Patch] Optimize dropping of relation buffers using dlist