Re: Design notes for EquivalenceClasses

Поиск
Список
Период
Сортировка
От Gavin Sherry
Тема Re: Design notes for EquivalenceClasses
Дата
Msg-id Pine.LNX.4.58.0701181052040.25798@linuxworld.com.au
обсуждение исходный текст
Ответ на Design notes for EquivalenceClasses  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Design notes for EquivalenceClasses  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, 17 Jan 2007, Tom Lane wrote:

> strict.  However, we also allow equivalence clauses that appear below the
> nullable side of an outer join to form EquivalenceClasses; for these
> classes, the interpretation is that either all the values are equal, or
> all (except pseudo-constants) have gone to null.  (This requires a
> limitation that non-constant members be strict, else they might not go
> to null when the other members do.)  Consider for example
>
>     SELECT *
>       FROM a LEFT JOIN
>            (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
>            ON a.x = ss.y
>       WHERE a.x = 42;
>
> We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
> apply c.z = 10 while scanning c.  (The reason we disallow outerjoin-delayed
> clauses from forming EquivalenceClasses is exactly that we want to be able
> to push any derived clauses as far down as possible.)  But once above the
> outer join it's no longer necessarily the case that b.y = 10, and thus we
> cannot use such EquivalenceClasses to conclude that sorting is unnecessary
> (see discussion of PathKeys below).  In this example, notice also that
> a.x = ss.y (really a.x = b.y) is not an equivalence clause because its
> applicability to b is restricted by the outer join; thus we do not make
> the mistake of concluding b.y = 42, even though we do have an equivalence
> class for {a.x 42}.

This seems to be correct. A nice improvement.

> With a little further thought, it becomes apparent that nestloop joins
> can also produce sorted output.  For example, if we do a nestloop join
> between outer relation A and inner relation B, then any pathkeys relevant
> to A are still valid for the join result: we have not altered the order of
> the tuples from A.  Even more interesting, if there was an equivalence clause
> A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
> that B.Y is a pathkey for the join result; X was ordered before and still
> is, and the joined values of Y are equal to the joined values of X, so Y
> must now be ordered too.  This is true even though we used neither an
> explicit sort nor a mergejoin on Y.  (Note: hash joins cannot be counted
> on to preserve the order of their outer relation, because the executor
> might decide to "batch" the join, so we always set pathkeys to NIL for
> a hashjoin path.)  Exception: a RIGHT or FULL join doesn't preserve the
> ordering of its outer relation, because it might insert nulls at random
> points in the ordering.

I was thinking about this, but in relation to hash joins. A hash join
cannot be guaranteed to produce output sorted according to the pathkey of
the outer relation (as explained in the existing README). I wonder,
however, if it might be useful for hash join to pass a hint that the
output is known ordered (i.e., the join was not split into multiple
batches).

Thanks,

Gavin


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Implement width_bucket() for the
Следующее
От: Jeff Davis
Дата:
Сообщение: Re: Function execution costs 'n all that