Re: Removing unneeded self joins

Поиск
Список
Период
Сортировка
От Alexander Kuzmenkov
Тема Re: Removing unneeded self joins
Дата
Msg-id 23f6cf6d-32ca-faa2-f4ec-da690ba42482@postgrespro.ru
обсуждение исходный текст
Ответ на Re: Removing unneeded self joins  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Removing unneeded self joins  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Список pgsql-hackers
El 08/11/18 a las 08:59, David Rowley escribió:
On 19 October 2018 at 01:47, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Here is a version that compiles.
I had a quick read through this and I think its missing about a 1-page
comment section detailing when we can and when we cannot remove these
self joins, and what measures we must take when we do remove them.

I added some explanation to the comment for remove_useless_joins. This is probably still not clear enough, so if you have any particular questions I'll cover them too. While improving the comments, I found some bugs around the handling of join clauses and broken ECs, so I fixed them and added the tests.


Apart from that, I noted the following during my read:

1. I don't think this is the right way to do this. There are other
places where we alter the varnoold. For example:
search_indexed_tlist_for_var(). So you should likely be doing that too
rather than working around it.

Fixed.


2. Surely the following loop is incorrect:

for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
{
int attno = i - toKeep->min_attr;
toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
toRemove->attr_needed[attno]);
}

What if toRemove has a lower min_attr or higher max_attr?

This shouldn't happen because this is always the same relation, and max_attr is its physical number of attributes. There is an assertion about this in remove_self_joins_one_group:
            /* A sanity check: the relations have the same Oid. */
            Assert(root->simple_rte_array[relids[i]]->relid == root->simple_rte_array[relids[o]]->relid);



3. "wind" -> "find"

+ * When we wind such a join, we mark one of the participating relation as

Fixed.


4. I think the following shouldn't be happening:

+------------------------------------------------ Result   One-Time Filter: false
-(2 rows)
+   ->  Index Scan using parent_pkey on parent x
+         Index Cond: (k = 1)
+(4 rows)

This happens because for join rels, we make some effort to prove that they are empty and not make any paths for them, and we don't do this for base rels. When we remove the join, this difference is exposed. Compare to this query:

postgres=# explain select * from parent where k = 1 and k = 2;
                                   QUERY PLAN                                  
────────────────────────────────────────────────────────────────────────────────
 Result  (cost=0.15..8.17 rows=1 width=8)
   One-Time Filter: false
   ->  Index Scan using parent_pkey on parent  (cost=0.15..8.17 rows=1 width=8)
         Index Cond: (k = 1)
(4 rows)


5. I'd have thought the opposite. Surely there are more chances of
this being useful with more joins?

+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;

That is true, but we also have to think about the overhead when we don't find any joins to remove. Without this cutoff, we have to examine every pair of self-joins, so the run time grows quadratically with the number of such joins in the query. I don't have a better idea on how to control this.


6. In remove_self_joins_one_level() I think you should collect the
removed relations in a Relids rather than a list.

Done.

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Banck
Дата:
Сообщение: Re: Online verification of checksums
Следующее
От: Alexander Kuzmenkov
Дата:
Сообщение: Re: [HACKERS] PoC: full merge join on comparison clause