Обсуждение: EquivalenceClasses vs volatile functions

Поиск
Список
Период
Сортировка

EquivalenceClasses vs volatile functions

От
Tom Lane
Дата:
Awhile back there was some discussion about how the system assumes that
syntactically equal sort expressions can be considered interchangeable,
which falls down if volatile functions are involved:
http://archives.postgresql.org/pgsql-general/2006-11/msg01523.php
I seem to recall the issue coming up again more recently, but can't
find the thread in the archives right now.  Basically the question
is what to do with examples like these:
select random() from foo order by random();select random() from foo order by 1;select random() as a, random() as b from
fooorder by a, b;select random() as a, random() as b from foo order by b, a;
 

In CVS HEAD, because of syntactic matching of ORDER BY expressions,
all four of these effectively order by the first output column.
This is arguably the wrong thing for the first example, in which you
might expect that random() is called separately to generate the sorting
key.  And it definitely seems the wrong thing for the last one.

What's more, while we have always treated the first two examples alike,
a quick check shows that 7.3 through 8.2 all treat the last example as
ordering primarily by the second output column.  Compare 8.2

regression=# select random() as a, random() as b from int4_tbl order by b, a;        a         |         b         
-------------------+-------------------0.699889197014272 | 0.1487481608055530.302036631386727 |
0.5015847636386750.927149619441479|  0.536219729576260.919696403201669 | 0.6953000994399190.249512315262109 |
0.709584747441113
(5 rows)

vs HEAD

regression=# select random() as a, random() as b from int4_tbl order by b, a;        a         |         b          
-------------------+--------------------0.224498846102506 |  0.789146402385086 0.47139244293794 |
0.08640268212184310.554711272474378|  0.1834706445224580.599514745175838 |  0.7111700745299460.895382364280522 |
0.0681726015172899
(5 rows)

So that seems like a regression that needs to be fixed.  The difficulty
is that, while we understand that EquivalenceClasses involving volatile
functions aren't identical even if the function calls are syntactically
equal, we don't have any way to figure out which targetlist entry is the
matching one for a particular PathKey.  We currently match on equal
expression trees (cf make_sort_from_pathkeys) and thus we get the above
behavior where the first matching column is what gets sorted on.

What I'm thinking of doing is adding a field to EquivalenceClass that
carries the ressortgroupref of the originating ORDER BY key's targetlist
entry, in the case where the EquivalenceClass came from building a
PathKey for ORDER BY.  Then, if the EquivalenceClass is ec_volatile,
we insist on matching that rather than matching the expression tree.
This wouldn't affect the behavior for ordinary non-volatile sort keys,
for which expression equivalence is valid.

If we simply do that then cases 1 and 2 will continue to be treated
the same, because the parser converts case 1 to case 2 based on
syntactic equality of the two random() calls.  I'm unsure whether
changing that is a good idea.  It's always worked that way, and so
we'd risk breaking some applications.  I'm also a bit uncomfortable
with having the parser make semantic choices based on
potentially-changeable attributes like function volatility.  If we
wanted cases 1 and 2 to behave differently, it'd be better to have
the parser always generate resjunk columns for ORDER BY expressions,
and get the planner to collapse out columns that are redundant.
But that seems too big a change to contemplate for 8.3.

Comments?
        regards, tom lane


Re: EquivalenceClasses vs volatile functions

От
Gregory Stark
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

>     select random() as a, random() as b from foo order by b, a;
>
> And it definitely seems the wrong thing for the last one.

Ouch.

> What I'm thinking of doing is adding a field to EquivalenceClass that
> carries the ressortgroupref of the originating ORDER BY key's targetlist
> entry, in the case where the EquivalenceClass came from building a
> PathKey for ORDER BY.  Then, if the EquivalenceClass is ec_volatile,
> we insist on matching that rather than matching the expression tree.
> This wouldn't affect the behavior for ordinary non-volatile sort keys,
> for which expression equivalence is valid.

What if the equivalence class is in more than one place in the ORDER BY? 

I suppose GROUP BY has the same problem?

And what would happen when you pull up subqueries?

> If we wanted cases 1 and 2 to behave differently, it'd be better to have the
> parser always generate resjunk columns for ORDER BY expressions, and get the
> planner to collapse out columns that are redundant. But that seems too big a
> change to contemplate for 8.3.

That clearly sounds right. 

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about
EnterpriseDB'sPostgreSQL training!
 


Re: EquivalenceClasses vs volatile functions

От
Tom Lane
Дата:
Gregory Stark <stark@enterprisedb.com> writes:
> What if the equivalence class is in more than one place in the ORDER BY? 

That's not a problem.  ORDER BY 1,1 is not different from ORDER BY 1,
even if the referenced expression is volatile, because the secondary
sort key is useless anyway.

> And what would happen when you pull up subqueries?

I think I know how to deal with that.  The worst case would be that
select * from (select random() as a from foo order by a) ssorder by ss.a

would incur two sorts instead of one because we fail to acknowledge the
subquery's sort key as being the same thing as the upper query's.  It
*should* be possible to avoid that, but if it gets too complicated I'll
leave it on the table for 8.4.
        regards, tom lane