Re: Performance improvement for joins where outer side is unique
От | Simon Riggs |
---|---|
Тема | Re: Performance improvement for joins where outer side is unique |
Дата | |
Msg-id | CANP8+j+vxfEf3dqhnzKAnDopnnUM-Zjb90gNT=haEQNQePj9Mw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Performance improvement for joins where outer side is unique (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: Performance improvement for joins where outer side is unique
(David Rowley <david.rowley@2ndquadrant.com>)
|
Список | pgsql-hackers |
On 17 December 2015 at 00:17, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
--
I'd be very surprised it that had any measurable impact.1) nodeHashjoin.c (and other join nodes)
I've noticed we have this in the ExecHashJoin() method:
/*
* When the inner side is unique or we're performing a
* semijoin, we'll consider returning the first match, but
* after that we're done with this outer tuple.
*/
if (node->js.unique_inner)
node->hj_JoinState = HJ_NEED_NEW_OUTER;
That seems a bit awkward because the comment speaks about unique
joins *OR* semijoins, but the check only references unique_inner.
That of course works because we do this in ExecInitHashJoin():
switch (node->join.jointype)
{
case JOIN_SEMI:
hjstate->js.unique_inner = true;
/* fall through */
Either some semijoins may not be unique - in that case the naming is
misleading at best, and we either need to find a better name or
simply check two conditions like this:
if (node->js.unique_inner || node->join.type == JOIN_SEMI)
node->hj_JoinState = HJ_NEED_NEW_OUTER;
Or all semijoins are unique joins, and in that case the comment may
need rephrasing. But more importantly, it begs the question why
we're detecting this in the executor and not in the planner? Because
if we detect it in executor, we can't use this bit in costing, for
example.
The reason not to detect that in the planner is simply that unique_join
is meant to mean that the relation on the inner side of the join will at
most contain a single Tuple which matches each outer relation tuple,
based on the join condition. I've added no detection for this in semi
joins, and I'd rather not go blindly setting the flag to true in the
planner as it simply may not be true for the semi join. At the moment
that might not matter as we're only using the unique_join flag as an
optimization in the join nodes, but I'd prefer not to do this as its
likely we'll want to do more with this flag later, and I'd rather we
keep the meaning well defined. You might argue that this is also true
for semi joins, but if down the road somewhere we want to perform some
checks on the inner relation before the join takes place, and in that
case the Tuples of the relation might not have the same properties we
claim they do.
But you're right that reusing the flag in the join nodes is not ideal,
and the comment is not that great either. I'd really rather go down the
path of either renaming the variable, or explaining this better in the
comment. It seems unnecessary to check both for each tuple being joined.
I'd imagine that might add a little overhead to joins which are not unique.How about changing the comment to:
/*
* In the event that the inner side has been detected to be
* unique, as an optimization we can skip searching for any
* subsequent matching inner tuples, as none will exist.
* For semijoins unique_inner will always be true, although
* in this case we don't match another inner tuple as this
* is the required semi join behavior.
*/
Alternatively or additionally we can rename the variable in the executor
state, although I've not thought of a name yet that I don't find overly
verbose: unique_inner_or_semi_join, match_first_tuple_only.
I'd go with match_first_tuple_only.
+1
unique_inner is a state that has been detected, match_first_tuple_only is the action we take as a result.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: