Re: [HACKERS] PoC: full merge join on comparison clause

Поиск
Список
Период
Сортировка
От Alexander Kuzmenkov
Тема Re: [HACKERS] PoC: full merge join on comparison clause
Дата
Msg-id 22906fd3-6f56-2473-9897-21a9532e6da3@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [HACKERS] PoC: full merge join on comparison clause  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы Re: [HACKERS] PoC: full merge join on comparison clause  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Список pgsql-hackers

Hi Ashutosh,

Thanks for the review.

Jeff, I'm copying you because this is relevant to our discussion about what to do with mergeopfamilies when adding new merge join types.

You have renamed RestrictInfo member mergeopfamilies as
equivopfamilies. I don't think that's a good name; it doesn't convey
that these are opfamilies containing merge operators. The changes in
check_mergejoinable() suggest that an operator may act as equality
operator in few operator families and comparison operator in others.
That looks odd. Actually an operator family contains operators other
than equality operators, so you may want to retain this member and add
a new member to specify whether the clause is an equality clause or
not.

For mergeopfamilies, I'm not sure what is the best thing to do. I'll try to explain my understanding of the situation, please correct me if I'm wrong.

Before the patch, mergeopfamilies was used for two things: creating equivalence classes and performing merge joins.

For equivalence classes: we look at the restriction clauses, and if they have mergeopfamilies set, it means that these clause are based on an equality operator, and the left and right variables must be equal. To record this fact, we create an equivalence class. The variables might be equal for one equality operator and not equal for another, so we record the particular operator families to which our equality operator belongs.

For merge join: we look at the join clauses, and if they have mergeopfamilies set, it means that these clauses are based on an equality operator, and we can try performing this particular join as merge join. These opfamilies are also used beforehand to create the equivalence classes for left and right variables. The equivalence classes are used to match the join clauses to pathkeys describing the ordering of join inputs.

So, if we want to start doing merge joins for operators other than equality, we still need to record their opfamilies, but only use them for the second case and not the first. I chose to put these opfamilies to different variables, and
name the one used for equivalence classes 'equivopfamilies' and the one used for merge joins 'mergeopfamilies'. The equality operators are used for both cases, so we put their opfamilies into both of these variables.

I agree this might look confusing. Indeed, we could keep a single variable for opfamilies, and add separate flags that show how they can be used, be that for equivalence classes, merge joins, range joins or some combination of them. This is similar to what Jeff did in his range merge join patch [1]. I will think more about this and try to produce an updated patch.


In mergejoinscansel() you have just removed Assert(op_strategy ==
BTEqualStrategyNumber); Probably this function is written considering
on equality operators. But now that we are using this for all other
operators, we will need more changes to this function. That may be the
reason why INNER join in your earlier example doesn't choose right
costing.

I changed mergejoinscansel() slightly to reflect the fact that the inner relation is scanned from the beginning if we have an inequality merge clause.


The comment change in final_cost_mergejoin() needs more work. n1, n2,
n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 +
n2 + n3 + ... = size of inner relation is correct. In that context I
am not able to understand your change
+    * If the merge clauses contain inequality, (n1 + n2 + ...) ~=
+    * (size of inner relation)^2.

I extended the comment in final_cost_mergejoin(). Not sure if that approximation makes any sense, but this is the best I could think of.

Style problems are fixed.

Attached please find the new version of the patch that addresses all the review comments except mergeopfamilies.

The current commitfest is ending, but I'd like to continue working on this patch, so I am moving it to the next one.


[1] https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg%40mail.gmail.com#CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com
-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Maksim Milyutin
Дата:
Сообщение: Re: [HACKERS] Partitions: \d vs \d+
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: [HACKERS] Domains and arrays and composites, oh my