Обсуждение: DISTINCT and ORDER BY bug?
The following used to work in 6.5, works in Oracle, and is very useful: donb=# create table foo(c varchar); CREATE donb=# insert into foo values('abc'); INSERT 72649 1 donb=# select distinct c from foo order by upper(c); ERROR: For SELECT DISTINCT, ORDER BY expressions must appear in target list donb=# In other words, we want to order ignoring case - in this case, users within the Ars Digita Community system. We want don baccus to appear next to Joe Blow rather than following Xena Xenophoba. Is this now refused because it is non-standard? It seems a pity... Of course, one can do "select distinct c, upper(c) as ignore ..." but that forces the return of more data, so is slower, etc... BTW the very fact that my testing of our partial port of this web toolkit under V7 pre-beta has gotten this far is a very good sign. Among other things, it makes heavy (if simple) use of referential integrity, which has already uncovered two bugs in the port that I've fixed. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Don Baccus <dhogaza@pacifier.com> writes: > The following used to work in 6.5, works in Oracle, and is > very useful: > donb=# select distinct c from foo order by upper(c); > ERROR: For SELECT DISTINCT, ORDER BY expressions must appear in target list Well, it's not a bug --- it was an entirely deliberate change. It might be a misfeature though. The case we were concerned about was select distinct x from foo order by y; which produces ill-defined results. If I recall the thread correctly, Oracle and a number of other DBMSs reject this. I think your point is that select distinct x from foo order by f(x); *is* well-defined, and useful. I think you are right, but how far should we go in detecting common subexpressions? You might want to contemplate the difference in these examples: select distinct sin(x) from foo order by abs(sin(x)); select distinct random(x) from foo order by abs(random(x)); It would be interesting to poke at Oracle to find out just what they consider a legitimate ORDER BY expression for a SELECT DISTINCT. regards, tom lane
On Mon, 7 Feb 2000, Tom Lane wrote: > Well, it's not a bug --- it was an entirely deliberate change. It > might be a misfeature though. The case we were concerned about was > > select distinct x from foo order by y; > > which produces ill-defined results. Okay, I can understand this... > select distinct sin(x) from foo order by abs(sin(x)); > > select distinct random(x) from foo order by abs(random(x)); The thing here is that random() is not deterministic on its inputs, whereas sin() is. Perhaps we should only allow fully deterministic ORDER BY? (Ugh, another flag for functions...) Taral
At 12:26 AM 2/7/00 -0500, Tom Lane wrote: >Well, it's not a bug --- it was an entirely deliberate change. It >might be a misfeature though. Ahhh...getting subtle, are we? :) > The case we were concerned about was > > select distinct x from foo order by y; Yes...I remember some discussion regarding this. >which produces ill-defined results. If I recall the thread correctly, >Oracle and a number of other DBMSs reject this. I think your point is >that > select distinct x from foo order by f(x); >*is* well-defined, and useful. I think you are right, but how >far should we go in detecting common subexpressions? Not sure...having not been into that part of the code (and busy at the moment testing my rewrites of small portions of RI trigger code I rewrote at Jan's request, after our "dispute" [which was more or less "I'm 50% certain you're right!" "No! I'm 50% you're right!" until I found the paragraph in Date's book which proved we were both just about 50% right]) I can't really say. I was hoping the standard might give some guidance? > You might >want to contemplate the difference in these examples: > > select distinct sin(x) from foo order by abs(sin(x)); I'm not sure I see a problem here. My (brief) reading of the standard tells me that "order by" follows everything else, in other words, you get select ... arbitrary complexity, with group by and all sorts of cruft ... then you take that result and apply the "order by" clause. You'd get all the negative values followed by the positive values, but you'd also get -1.0 and 1.0 if the database had those values. Because they're distinct, and therefore live to be ordered. But I'm not sure about it...if you push me, I'll probably go dig into the standard again (I was so successful with referential "NO ACTION" last time, yeah, right, I sleep with Date's book under my pillow at the moment!) > select distinct random(x) from foo order by abs(random(x)); Of course, real compiler systems (like I've spent my life working on) have heuristic or, more modernly, other ways of deciding if a function returns different values depending on when it is called. In such systems, you only have to guarantee the correct answer, so choosing wrong simply means the code runs slower. "upper(column_value)" does not within a specific select. Column value won't change. I can think of rules to think of but the simplest might be that internal functions that are invariant when their parameters are unchanged might be considered safe. Others, not. Also, the standard might simply say the result is implementation dependent or (slightly worse) defined if the function returns different values for a call with the same parameter list in a single query. I don't know...it's an interesting question. The other approach is to simply state that the function has one and only one value during statement (SQL-statement, in this case) execution, and yank the sucker out of there, execute it, and stuff it in a temp variable. But that's probably too naive. Still, the standard might say it is implementation defined as to whether or not the function will be called once or more than once. The standard only cares about embedded SQL but it might give guidance... >It would be interesting to poke at Oracle to find out just what they >consider a legitimate ORDER BY expression for a SELECT DISTINCT. I have full-time access to an Oracle installation, so fire away regarding examples and questions. Not just on this narrow subject, but in general. I'm probably not the ONLY person here with Oracle access, but I do have it, and my poking at it won't hurt anything but Oracle's pride... - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
At 12:12 AM 2/7/00 -0600, Taral wrote: >The thing here is that random() is not deterministic on its inputs, >whereas sin() is. Perhaps we should only allow fully deterministic ORDER >BY? (Ugh, another flag for functions...) Which, by it's nature is probably a misnomer, because I imagine that PL/pgSQL functions would always have to be non deterministic whatever their inputs? Given that unrecognized syntax is just tossed the query executor. Thus calling any 'ole function without PL/pgSQL really knowing what's going on? So you probably end up with a LIST of functions by name that are built-in and deterministic. Or ... you simply say that results are really weird if the function has undeterministic behavior and document it. Tom's on the right path asking what the standard might say and what delphic, incomprehensible answer the Oracle might have for us. (the more I learn about the SQL standard, the more I appreciate the irony of Oracle's corporate name!) - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Don Baccus <dhogaza@pacifier.com> writes: > At 12:26 AM 2/7/00 -0500, Tom Lane wrote: >> It would be interesting to poke at Oracle to find out just what they >> consider a legitimate ORDER BY expression for a SELECT DISTINCT. > I have full-time access to an Oracle installation, so fire away > regarding examples and questions. Well, try these on for size: select distinct x from foo order by x+1; select distinct x+1 from foo order by x+1; select distinct x+1 from foo order by x; select distinct x+1 from foo order by x+2; select distinct x+y from foo order by x+y; select distinct x,y from foo order by x+y; select distinct x+y from foo order by x,y; select distinct x+y from foo order by x-y; A human can easily see that all but the last two are well-defined, but I'll be a little surprised if Oracle knows it... regards, tom lane
> select distinct x from foo order by y; > > which produces ill-defined results. Why is this ill-defined? If y is in x then it is also distinct and there's no logic problem sorting on it.
At 01:36 AM 2/7/00 -0500, Tom Lane wrote: >Don Baccus <dhogaza@pacifier.com> writes: >> At 12:26 AM 2/7/00 -0500, Tom Lane wrote: >>> It would be interesting to poke at Oracle to find out just what they >>> consider a legitimate ORDER BY expression for a SELECT DISTINCT. > >> I have full-time access to an Oracle installation, so fire away >> regarding examples and questions. > >Well, try these on for size: Here's what the Oracle proclaims: select distinct x from foo order by x+1; no rows selected select distinct x+1 from foo order by x+1; no rows selected select distinct x+1 from foo order by x; SQL> select distinct x+1 from foo order by x * ERROR at line 1: ORA-01791: not a SELECTed expression select distinct x+1 from foo order by x+2; SQL> select distinct x+1 from foo order by x+2 * ERROR at line 1: ORA-01791: not a SELECTed expression select distinct x+y from foo order by x+y; SQL> no rows selected I also tried: select distinct x+y from foo order by y+x, which fails. select distinct x,y from foo order by x+y; SQL> no rows selected select distinct x+y from foo order by x,y; SQL> select distinct x+y from foo order by x,y * ERROR at line 1: ORA-01791: not a SELECTed expression select distinct x+y from foo order by x-y; SQL> select distinct x+y from foo order by x-y * ERROR at line 1: ORA-01791: not a SELECTed expression My first thought is that it is following a simple rule: For arithmetic "order by" expressions, either: 1. The exact expression must also appear in the "select" list, and it must be exact, not just an expression that computes the same value as the "order by" expressionor 2. all of the variables used by the expression must be listed in the "select" list as simple column names, not as partof an expression. Must be true. At least, the rule is simple if you can compare expression trees. At this point I still am clueless regarding the standard, I think I'll make Date my morning coffee date again. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Chris <chris@bitmead.com> writes: >> select distinct x from foo order by y; >> >> which produces ill-defined results. > Why is this ill-defined? If y is in x then it is also distinct Huh? The query specifies distinct values of x, and only x. Consider x y 1 1 1 10 2 0 2 11 "select distinct x" ought to produce one row with x=1, and one row with x=2, and nothing else. If it implicitly did the distinct on y as well, you'd get four rows with two x=1 and two x=2, which is not my idea of "distinct x". But if you don't have four rows out, then there's no meaningful way to order by y. 6.5.3 in fact produces four rows from this query, which is generally conceded to be broken behavior. regards, tom lane
Don Baccus <dhogaza@pacifier.com> writes: > My first thought is that it is following a simple rule: > For arithmetic "order by" expressions, either: > 1. The exact expression must also appear in the "select" list, > and it must be exact, not just an expression that computes > the same value as the "order by" expression > or > 2. all of the variables used by the expression must be listed > in the "select" list as simple column names, not as part of > an expression. Could be. How about cases like select distinct x,y+1 from foo order by x+y+1; > At least, the rule is simple if you can compare expression trees. I think we have something pretty similar for GROUP BY, actually, so it may not be hard to make this work. regards, tom lane
At 11:03 AM 2/7/00 -0500, Tom Lane wrote: >Could be. How about cases like > > select distinct x,y+1 from foo order by x+y+1; SQL> select distinct x,y+1 from foo order by x+y+1 * ERROR at line 1: ORA-01791: not a SELECTed expression >> At least, the rule is simple if you can compare expression trees. >I think we have something pretty similar for GROUP BY, actually, >so it may not be hard to make this work. Actually, yes, you're probably right... - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Don Baccus <dhogaza@pacifier.com> writes: SQL> select distinct x,y+1 from foo order by x+y+1 > * > ERROR at line 1: > ORA-01791: not a SELECTed expression Actually, that was a little unfair, since their parser no doubt parsed "x+y+1" as (x+y)+1, leaving no common subexpression visible. Do they accept select distinct x,y+1 from foo order by x+(y+1) >>> At least, the rule is simple if you can compare expression trees. >> I think we have something pretty similar for GROUP BY, actually, >> so it may not be hard to make this work. On further thought, I think the real implementation issue is that doing SELECT DISTINCT ORDER BY requires either two sorting steps (sort by DISTINCT fields, "uniq" filter, sort again by ORDER BY fields) or else some very hairy logic to figure out that ORDER BY x+1 "implies" ORDER BY x. In fact I'm not sure it does imply it in the general case. In your original example, the requested sort was ORDER BY upper(x), but that doesn't guarantee that the tuples will be ordered adequately for duplicate-x elimination. For example, that ORDER BY might yield Ansel AdamsDon BaccusDON BACCUSDon BaccusJoe Blow... which is a valid sort by upper(x), but a uniq filter on plain x will fail to get rid of the second occurrence of "Don Baccus" as it should. Possibly we could make this work by implicitly expanding the ORDER BY to "ORDER BY upper(x), x" which would ensure that the duplicate x's are brought together. I am not sure this will give the right results always, but it seems promising. We are assuming here that upper(x) gives equal outputs for equal inputs, so it would fall down on random(x) --- I suppose we could refuse to do this if we see a function that is marked non-constant-foldable in pg_proc... regards, tom lane
At 12:10 PM 2/7/00 -0500, Tom Lane wrote: >Don Baccus <dhogaza@pacifier.com> writes: >SQL> select distinct x,y+1 from foo order by x+y+1 >> * >> ERROR at line 1: >> ORA-01791: not a SELECTed expression > >Actually, that was a little unfair, since their parser no doubt parsed >"x+y+1" as (x+y)+1, leaving no common subexpression visible. Do they >accept > > select distinct x,y+1 from foo order by x+(y+1) Yes, it does. So, they must be doing some level of common expression analysis, for real. >>>> At least, the rule is simple if you can compare expression trees. > >>> I think we have something pretty similar for GROUP BY, actually, >>> so it may not be hard to make this work. > >On further thought, I think the real implementation issue is that >doing SELECT DISTINCT ORDER BY requires either two sorting steps >(sort by DISTINCT fields, "uniq" filter, sort again by ORDER BY fields) Yes. >or else some very hairy logic to figure out that ORDER BY x+1 >"implies" ORDER BY x. In fact I'm not sure it does imply it >in the general case. In your original example, the requested sort >was ORDER BY upper(x), but that doesn't guarantee that the tuples >will be ordered adequately for duplicate-x elimination. I realize that. I would assume that a double-sort penalty might be incurred, i.e. the select distinct ... is executed followed by the order by. >Possibly we could make this work by implicitly expanding the ORDER BY >to "ORDER BY upper(x), x" which would ensure that the duplicate x's >are brought together. That would be another approach, too, if it works for all cases... > I am not sure this will give the right results >always, but it seems promising. We are assuming here that upper(x) >gives equal outputs for equal inputs, so it would fall down on random(x) >--- I suppose we could refuse to do this if we see a function that is >marked non-constant-foldable in pg_proc... Something like that, yes. I just checked Date while off having coffee, and it is clear that the SQL standard specifies that ORDER BY operates on COLUMNS, not expressions. So the restriction that's now imposed is indeed standard compliant. However, some level of extension in this area would be very useful, and my guess is that examples like the one that started this discussion are very common. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.