Обсуждение: DISTINCT and ORDER BY bug?

Поиск

Список

Период

Сортировка

DISTINCT and ORDER BY bug?

От

Don Baccus

Дата:

07 февраля 2000 г., 01:07:32

The following used to work in 6.5, works in Oracle, and is
very useful:

donb=# create table foo(c varchar);
CREATE
donb=# insert into foo values('abc');
INSERT 72649 1

donb=# select distinct c from foo order by upper(c);
ERROR:  For SELECT DISTINCT, ORDER BY expressions must appear in target list
donb=# 

In other words, we want to order ignoring case - in this case, users
within the Ars Digita Community system.  We want don baccus to appear
next to Joe Blow rather than following Xena Xenophoba.

Is this now refused because it is non-standard?  It seems a pity...

Of course, one can do "select distinct c, upper(c) as ignore ..."

but that forces the return of more data, so is slower, etc...

BTW the very fact that my testing of our partial port of this web
toolkit under V7 pre-beta has gotten this far is a very good sign.

Among other things, it makes heavy (if simple) use of referential
integrity, which has already uncovered two bugs in the port that
I've fixed.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Tom Lane

Дата:

07 февраля 2000 г., 03:26:32

Don Baccus <dhogaza@pacifier.com> writes:
> The following used to work in 6.5, works in Oracle, and is
> very useful:

> donb=# select distinct c from foo order by upper(c);
> ERROR:  For SELECT DISTINCT, ORDER BY expressions must appear in target list

Well, it's not a bug --- it was an entirely deliberate change.  It
might be a misfeature though.  The case we were concerned about was
select distinct x from foo order by y;

which produces ill-defined results.  If I recall the thread correctly,
Oracle and a number of other DBMSs reject this.  I think your point is
that
select distinct x from foo order by f(x);

*is* well-defined, and useful.  I think you are right, but how
far should we go in detecting common subexpressions?  You might
want to contemplate the difference in these examples:
select distinct sin(x) from foo order by abs(sin(x));
select distinct random(x) from foo order by abs(random(x));

It would be interesting to poke at Oracle to find out just what they
consider a legitimate ORDER BY expression for a SELECT DISTINCT.
        regards, tom lane

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Taral

Дата:

07 февраля 2000 г., 04:13:34

On Mon, 7 Feb 2000, Tom Lane wrote:

> Well, it's not a bug --- it was an entirely deliberate change.  It
> might be a misfeature though.  The case we were concerned about was
> 
>     select distinct x from foo order by y;
> 
> which produces ill-defined results. 

Okay, I can understand this...

>     select distinct sin(x) from foo order by abs(sin(x));
> 
>     select distinct random(x) from foo order by abs(random(x));

The thing here is that random() is not deterministic on its inputs,
whereas sin() is. Perhaps we should only allow fully deterministic ORDER
BY? (Ugh, another flag for functions...)

Taral

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Don Baccus

Дата:

07 февраля 2000 г., 04:15:34

At 12:26 AM 2/7/00 -0500, Tom Lane wrote:

>Well, it's not a bug --- it was an entirely deliberate change.  It
>might be a misfeature though.

Ahhh...getting subtle, are we? :)

>  The case we were concerned about was
>
>    select distinct x from foo order by y;

Yes...I remember some discussion regarding this.

>which produces ill-defined results.  If I recall the thread correctly,
>Oracle and a number of other DBMSs reject this.  I think your point is
>that

>    select distinct x from foo order by f(x);

>*is* well-defined, and useful.  I think you are right, but how
>far should we go in detecting common subexpressions?

Not sure...having not been into that part of the code (and busy at
the moment testing my rewrites of small portions of RI trigger
code I rewrote at Jan's request, after our "dispute" [which was more
or less "I'm 50% certain you're right!" "No! I'm 50% you're right!"
until I found the paragraph in Date's book which proved we were both
just about 50% right]) I can't really say.  

I was hoping the standard might give some guidance?

>  You might
>want to contemplate the difference in these examples:
>
>    select distinct sin(x) from foo order by abs(sin(x));

I'm not sure I see a problem here.  My (brief) reading of the
standard tells me that "order by" follows everything else, 
in other words, you get

select ... arbitrary complexity, with group by and all sorts of
cruft ...

then you take that result and apply the "order by" clause.

You'd get all the negative values followed by the positive
values, but you'd also get -1.0 and 1.0 if the database had
those values.  Because they're distinct, and therefore live to
be ordered.

But I'm not sure about it...if you push me, I'll probably go dig
into the standard again (I was so successful with referential
"NO ACTION" last time, yeah, right, I sleep with Date's book under
my pillow at the moment!)

>    select distinct random(x) from foo order by abs(random(x));

Of course, real compiler systems (like I've spent my life working
on) have heuristic or, more modernly, other ways of deciding if a
function returns different values depending on when it is called.
In such systems, you only have to guarantee the correct answer, so
choosing wrong simply means the code runs slower.  

"upper(column_value)" does not within a specific select.  Column
value won't change.  I can think of rules to think of but the
simplest might be that internal functions that are invariant when
their parameters are unchanged might be considered safe.  Others,
not.

Also, the standard might simply say the result is implementation
dependent or (slightly worse) defined if the function returns
different values for a call with the same parameter list in a
single query.  I don't know...it's an interesting question.

The other approach is to simply state that the function has one
and only one value during statement (SQL-statement, in this case)
execution, and yank the sucker out of there, execute it, and stuff
it in a temp variable.  But that's probably too naive.  Still, the
standard might say it is implementation defined as to whether or
not the function will be called once or more than once.  The standard
only cares about embedded SQL but it might give guidance...

>It would be interesting to poke at Oracle to find out just what they
>consider a legitimate ORDER BY expression for a SELECT DISTINCT.

I have full-time access to an Oracle installation, so fire away
regarding examples and questions.

Not just on this narrow subject, but in general.  I'm probably not the
ONLY person here with Oracle access, but I do have it, and my poking
at it won't hurt anything but Oracle's pride...

- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Don Baccus

Дата:

07 февраля 2000 г., 04:20:33

At 12:12 AM 2/7/00 -0600, Taral wrote:

>The thing here is that random() is not deterministic on its inputs,
>whereas sin() is. Perhaps we should only allow fully deterministic ORDER
>BY? (Ugh, another flag for functions...)

Which, by it's nature is probably a misnomer, because I imagine that
PL/pgSQL functions would always have to be non deterministic whatever
their inputs?   Given that unrecognized syntax is just tossed the
query executor.  Thus calling any 'ole function without PL/pgSQL 
really knowing what's going on?

So you probably end up with a LIST of functions by name that are built-in
and deterministic.

Or ... you simply say that results are really weird if the function has
undeterministic behavior and document it.

Tom's on the right path asking what the standard might say and what
delphic, incomprehensible answer the Oracle might have for us.

(the more I learn about the SQL standard, the more I appreciate the irony
of Oracle's corporate name!)

- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Tom Lane

Дата:

07 февраля 2000 г., 04:37:33

Don Baccus <dhogaza@pacifier.com> writes:
> At 12:26 AM 2/7/00 -0500, Tom Lane wrote:
>> It would be interesting to poke at Oracle to find out just what they
>> consider a legitimate ORDER BY expression for a SELECT DISTINCT.

> I have full-time access to an Oracle installation, so fire away
> regarding examples and questions.

Well, try these on for size:
select distinct x from foo order by x+1;
select distinct x+1 from foo order by x+1;
select distinct x+1 from foo order by x;
select distinct x+1 from foo order by x+2;
select distinct x+y from foo order by x+y;
select distinct x,y from foo order by x+y;
select distinct x+y from foo order by x,y;
select distinct x+y from foo order by x-y;

A human can easily see that all but the last two are well-defined,
but I'll be a little surprised if Oracle knows it...
        regards, tom lane

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Chris

Дата:

07 февраля 2000 г., 06:32:36

>         select distinct x from foo order by y;
> 
> which produces ill-defined results.

Why is this ill-defined? If y is in x then it is also distinct and
there's no logic problem sorting on it.

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Don Baccus

Дата:

07 февраля 2000 г., 13:21:14

At 01:36 AM 2/7/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>> At 12:26 AM 2/7/00 -0500, Tom Lane wrote:
>>> It would be interesting to poke at Oracle to find out just what they
>>> consider a legitimate ORDER BY expression for a SELECT DISTINCT.
>
>> I have full-time access to an Oracle installation, so fire away
>> regarding examples and questions.
>
>Well, try these on for size:

Here's what the Oracle proclaims:

select distinct x from foo order by x+1;
no rows selected

select distinct x+1 from foo order by x+1;
no rows selected

select distinct x+1 from foo order by x;
SQL> select distinct x+1 from foo order by x                                     *
ERROR at line 1:
ORA-01791: not a SELECTed expression

select distinct x+1 from foo order by x+2;
SQL> select distinct x+1 from foo order by x+2                                     *
ERROR at line 1:
ORA-01791: not a SELECTed expression

select distinct x+y from foo order by x+y;
SQL> 
no rows selected

I also tried: select distinct x+y from foo order by y+x,
which fails.

select distinct x,y from foo order by x+y;
SQL> 
no rows selected

select distinct x+y from foo order by x,y;
SQL> select distinct x+y from foo order by x,y                                     *
ERROR at line 1:
ORA-01791: not a SELECTed expression

select distinct x+y from foo order by x-y;
SQL> select distinct x+y from foo order by x-y                                     *
ERROR at line 1:
ORA-01791: not a SELECTed expression

My first thought is that it is following a simple rule:

For arithmetic "order by" expressions, either:

1. The exact expression must also appear in the "select" list,  and it must be exact, not just an expression that
computes the same value as the "order by" expressionor

2. all of the variables used by the expression must be listed   in the "select" list as simple column names, not as
partof  an expression.

Must be true.

At least, the rule is simple if you can compare expression trees.

At this point I still am clueless regarding the standard, I think I'll
make Date my morning coffee date again.

- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Tom Lane

Дата:

07 февраля 2000 г., 13:57:21

Chris <chris@bitmead.com> writes:
>> select distinct x from foo order by y;
>> 
>> which produces ill-defined results.

> Why is this ill-defined? If y is in x then it is also distinct

Huh?  The query specifies distinct values of x, and only x.
Consider    x    y
    1    1    1    10    2    0    2    11

"select distinct x" ought to produce one row with x=1, and one row with
x=2, and nothing else.  If it implicitly did the distinct on y as well,
you'd get four rows with two x=1 and two x=2, which is not my idea of
"distinct x".  But if you don't have four rows out, then there's no
meaningful way to order by y.

6.5.3 in fact produces four rows from this query, which is generally
conceded to be broken behavior.
        regards, tom lane

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Tom Lane

Дата:

07 февраля 2000 г., 14:04:23

Don Baccus <dhogaza@pacifier.com> writes:
> My first thought is that it is following a simple rule:

> For arithmetic "order by" expressions, either:

> 1. The exact expression must also appear in the "select" list,
>    and it must be exact, not just an expression that computes
>    the same value as the "order by" expression
>  or

> 2. all of the variables used by the expression must be listed 
>    in the "select" list as simple column names, not as part of
>    an expression.

Could be.  How about cases like
select distinct x,y+1 from foo order by x+y+1;

> At least, the rule is simple if you can compare expression trees.

I think we have something pretty similar for GROUP BY, actually,
so it may not be hard to make this work.
        regards, tom lane

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Don Baccus

Дата:

07 февраля 2000 г., 14:27:12

At 11:03 AM 2/7/00 -0500, Tom Lane wrote:

>Could be.  How about cases like
>
>    select distinct x,y+1 from foo order by x+y+1;


SQL> select distinct x,y+1 from foo order by x+y+1                                         *
ERROR at line 1:
ORA-01791: not a SELECTed expression

>> At least, the rule is simple if you can compare expression trees.

>I think we have something pretty similar for GROUP BY, actually,
>so it may not be hard to make this work.

Actually, yes, you're probably right...



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Tom Lane

Дата:

07 февраля 2000 г., 15:12:13

Don Baccus <dhogaza@pacifier.com> writes:
SQL> select distinct x,y+1 from foo order by x+y+1
>                                           *
> ERROR at line 1:
> ORA-01791: not a SELECTed expression

Actually, that was a little unfair, since their parser no doubt parsed
"x+y+1" as (x+y)+1, leaving no common subexpression visible.  Do they
accept
select distinct x,y+1 from foo order by x+(y+1)

>>> At least, the rule is simple if you can compare expression trees.

>> I think we have something pretty similar for GROUP BY, actually,
>> so it may not be hard to make this work.

On further thought, I think the real implementation issue is that
doing SELECT DISTINCT ORDER BY requires either two sorting steps
(sort by DISTINCT fields, "uniq" filter, sort again by ORDER BY fields)
or else some very hairy logic to figure out that ORDER BY x+1
"implies" ORDER BY x.  In fact I'm not sure it does imply it
in the general case.  In your original example, the requested sort
was ORDER BY upper(x), but that doesn't guarantee that the tuples
will be ordered adequately for duplicate-x elimination.  For example,
that ORDER BY might yield
Ansel AdamsDon BaccusDON BACCUSDon BaccusJoe Blow...

which is a valid sort by upper(x), but a uniq filter on plain x
will fail to get rid of the second occurrence of "Don Baccus" as
it should.

Possibly we could make this work by implicitly expanding the ORDER BY
to "ORDER BY upper(x), x" which would ensure that the duplicate x's
are brought together.  I am not sure this will give the right results
always, but it seems promising.  We are assuming here that upper(x)
gives equal outputs for equal inputs, so it would fall down on random(x)
--- I suppose we could refuse to do this if we see a function that is
marked non-constant-foldable in pg_proc...
        regards, tom lane

Re: [HACKERS] DISTINCT and ORDER BY bug?

От

Don Baccus

Дата:

07 февраля 2000 г., 16:54:14

At 12:10 PM 2/7/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>SQL> select distinct x,y+1 from foo order by x+y+1
>>                                           *
>> ERROR at line 1:
>> ORA-01791: not a SELECTed expression
>
>Actually, that was a little unfair, since their parser no doubt parsed
>"x+y+1" as (x+y)+1, leaving no common subexpression visible.  Do they
>accept
>
>    select distinct x,y+1 from foo order by x+(y+1)

Yes, it does.  So, they must be doing some level of common expression
analysis, for real.

>>>> At least, the rule is simple if you can compare expression trees.
>
>>> I think we have something pretty similar for GROUP BY, actually,
>>> so it may not be hard to make this work.
>
>On further thought, I think the real implementation issue is that
>doing SELECT DISTINCT ORDER BY requires either two sorting steps
>(sort by DISTINCT fields, "uniq" filter, sort again by ORDER BY fields)

Yes.

>or else some very hairy logic to figure out that ORDER BY x+1
>"implies" ORDER BY x.  In fact I'm not sure it does imply it
>in the general case.  In your original example, the requested sort
>was ORDER BY upper(x), but that doesn't guarantee that the tuples
>will be ordered adequately for duplicate-x elimination. 

I realize that.  I would assume that a double-sort penalty might
be incurred, i.e. the select distinct ... is executed followed by
the order by.

>Possibly we could make this work by implicitly expanding the ORDER BY
>to "ORDER BY upper(x), x" which would ensure that the duplicate x's
>are brought together.

That would be another approach, too, if it works for all cases...

>  I am not sure this will give the right results
>always, but it seems promising.  We are assuming here that upper(x)
>gives equal outputs for equal inputs, so it would fall down on random(x)
>--- I suppose we could refuse to do this if we see a function that is
>marked non-constant-foldable in pg_proc...

Something like that, yes.

I just checked Date while off having coffee, and it is clear that the
SQL standard specifies that ORDER BY operates on COLUMNS, not expressions.
So the restriction that's now imposed is indeed standard compliant.  However,
some level of extension in this area would be very useful, and my guess is
that examples like the one that started this discussion are very common.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: DISTINCT and ORDER BY bug?