Re: Performance improvement for joins where outer side is unique

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Performance improvement for joins where outer side is unique
Дата
Msg-id 56718B15.6060507@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers
Hi,

On 12/16/2015 01:27 AM, David Rowley wrote:
>
> I've attached a rebased patch against current master as there were
> some conflicts from the recent changes to LATERAL join.

Thanks. I've looked at the rebased patch and have a few minor comments.

0) I know the patch does not tweak costing - any plans in this   direction? Would it be possible to simply use the
costingused by   semijoin?
 

1) nodeHashjoin.c (and other join nodes)

I've noticed we have this in the ExecHashJoin() method:
 /*  * When the inner side is unique or we're performing a  * semijoin, we'll consider returning the first match, but
*after that we're done with this outer tuple.  */ if (node->js.unique_inner)     node->hj_JoinState =
HJ_NEED_NEW_OUTER;

That seems a bit awkward because the comment speaks about unique joins 
*OR* semijoins, but the check only references unique_inner. That of 
course works because we do this in ExecInitHashJoin():
switch (node->join.jointype){    case JOIN_SEMI:        hjstate->js.unique_inner = true;        /* fall through */

Either some semijoins may not be unique - in that case the naming is 
misleading at best, and we either need to find a better name or simply 
check two conditions like this:
 if (node->js.unique_inner || node->join.type == JOIN_SEMI)    node->hj_JoinState = HJ_NEED_NEW_OUTER;

Or all semijoins are unique joins, and in that case the comment may need 
rephrasing. But more importantly, it begs the question why we're 
detecting this in the executor and not in the planner? Because if we 
detect it in executor, we can't use this bit in costing, for example.

FWIW the misleading naming was actually mentioned in the thread by 
Horiguchi-san, but I don't see any response to that (might have missed 
it though).

2) analyzejoins.c

I see that this code in join_is_removable()
    innerrel = find_base_rel(root, innerrelid);
    if (innerrel->reloptkind != RELOPT_BASEREL)        return false;

was rewritten like this:
    innerrel = find_base_rel(root, innerrelid);
    Assert(innerrel->reloptkind == RELOPT_BASEREL);

That suggests that previously the function expected cases where 
reloptkind may not be RELOPT_BASEREL, but now it'll error out int such 
cases. I haven't noticed any changes in the surrounding code that'd 
guarantee this won't happen, but I also haven't been able to come up 
with an example triggering the assert (haven't been trying too hard). 
How do we know the assert() makes sense?


3) joinpath.c

- either "were" or "been" seems redundant

/* left joins were already been analyzed for uniqueness in 
mark_unique_joins() */


4) analyzejoins.c

comment format broken

/* * mark_unique_joinsAnalyze joins in order to determine if their inner side is unique basedon the join condition. */

5) analyzejoins.c

missing line before the comment

}
/* * rel_supports_distinctness *        Returns true if rel has some properties which can prove the relation *
tobe unique over some set of columns. *
 

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stas Kelvich
Дата:
Сообщение: Re: Cube extension kNN support
Следующее
От: Tom Lane
Дата:
Сообщение: Re: fix for readline terminal size problems when window is resized with open pager