Обсуждение: bug #7499 additional comments

Поиск
Список
Период
Сортировка

bug #7499 additional comments

От
Denis Kolesnik
Дата:
My arguments are:

 is that even
select id, str_last_name from tbl_owners_individual where id in
(83,175,111,1) order by id;

 id  |    str_last_name
-----+----------------------
   1 | Kolesnik
  83 | GXXXXXXXXX
 111 | Kolesnik
 175 | GXXXXXXXXX
(4 =D1=81=D1=82=D1=80=D0=BE=D0=BA=D0=B8)

select id, str_last_name from tbl_owners_individual where id in
(83,175,111,1) order by str_last_name;

 id  |    str_last_name
-----+----------------------
  83 | GXXXXXXXXX
 175 | GXXXXXXXXX
   1 | Kolesnik
 111 | Kolesnik
(4 =D1=81=D1=82=D1=80=D0=BE=D0=BA=D0=B8)

Compare this 2 results and you see, that even if the records with the
same last names do not come directly one after other then "id 1"
always closer to the top, then "id 111"
and "id 83" always clother to the top then "id 175". It proves, that
the sorting by id remains always even if only among records for the
same lastname.

Suppose a person who has basic SQL knowledges would learn on praxis
how would result a query if a person adds the clause "limit 1" to it and

if a person sees results for this query:
select id, str_last_name from tbl_owners_individual order by
str_last_name offset 26 limit 1;

 id |    str_last_name
----+----------------------
 83 | GXXXXXXXXX
(1 =D1=81=D1=82=D1=80=D0=BE=D0=BA=D0=B0)

and compares result to the query

select id, str_last_name from tbl_owners_individual order by
str_last_name offset 26;

 id  |    str_last_name
-----+----------------------
  83 | GXXXXXXXXX
 175 | GXXXXXXXXX
...

then one makes conclusion, that a sorting by id always remain in both
cases, but if one replaces this queries so:

select id, str_last_name from tbl_owners_individual order by
str_last_name limit 1 offset 53;

 id  |    str_last_name
-----+----------------------
 111 | Kolesnik
(1 =D1=81=D1=82=D1=80=D0=BE=D0=BA=D0=B0)

select id, str_last_name from tbl_owners_individual order by
str_last_name offset 53;

 id  |    str_last_name
-----+----------------------
   1 | Kolesnik
 111 | Kolesnik
...

Then a person comes to misunderstanding.

You would sugguest, that one should read documentation.

in the (where with ... replaced a directory in which the PostgreSQL install=
ed)
...PostgreSQL\9.1\doc\postgresql\html\queries-limit.html

"...When using LIMIT, it is important to use an ORDER BY clause that
constrains the result rows into a unique order. .."

here asked to use "ORDER BY" which is done in every query above.

"...The query optimizer takes LIMIT into account when generating query
plans, so you are very likely to get different plans (yielding
different row orders) depending on what you give for LIMIT and OFFSET.
Thus, using different LIMIT/OFFSET values to select different subsets
of a query result will give inconsistent results unless you enforce a
predictable result ordering with ORDER BY. This is not a bug; ..."

the values of "ORDER BY" for LIMIT/OFFSET are not different as you
see. All requirements are filled.


this part "...The query optimizer takes LIMIT into account when
generating query plans, so you are very likely to get different plans
(yielding different row orders) depending on what you give for LIMIT
and OFFSET." would explain, that adding "LIMIT" will result in some
unxplained data sorting, but

this "...Thus, using different LIMIT/OFFSET values to select different
subsets of a query result will give inconsistent results unless you
enforce a predictable result ordering with ORDER BY."

then the query with the results as you see:
select id, str_last_name from tbl_owners_individual where
str_last_name=3D'Kolesnik' order by str_last_name limit 2 offset 2;

 id  |    str_last_name
-----+----------------------
 111 | Kolesnik
 144 | Kolesnik
(2 =D1=81=D1=82=D1=80=D0=BE=D0=BA=D0=B8)

inconsistent results unless you enforce a predictable result ordering
with ORDER BY.

order by is here predictable, exists, but:

select id, str_last_name from tbl_owners_individual order by str_last_name;
...
  49 | Kolesnik
 224 | Kolesnik
 144 | Kolesnik
   1 | Kolesnik
 111 | Kolesnik
...

as you see: offset 2 should return "144 | Kolesnik           "

and "...inconsistent results..." nowhere in this page of documentation
(as I read it, if I do wrong) stated that
"...inconsistent results..." not applies to the following 2 queries:

select id, str_last_name from tbl_owners_individual order by str_last_name;
select id, str_last_name from tbl_owners_individual order by
str_last_name limit 2 offset 2;

I and not only I by reading this page of documentation will conclude
not without a reason that the queries different on presense or absense
"...limit 2 offset 2..." should return results which are consistent.

Basing on this I conclude, that it is a bug.

With respect,
Denis Kolesnik.

On 8/22/12, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Denis Kolesnik <lirex.software@gmail.com> wrote:
>
>> I have now VERY strong argument to consider it is as a bug:
>
> No, you appear to have very strong feelings about it, but you are
> not making an argument that holds water.
>
>> if there a understandable for SQL language sequence which sorts
>> in other fashion when adding "LIMIT".
>
> Each query is evaluated in terms of satisfying what is requested in
> that query, and the fastest plan which returns those results is
> chosen.  If you want results to be generated in a specific order, it
> is incumbent on you to specify that in the query -- there is no
> "natural order" to rows which is used as a tie-breaker.  There are
> even optimizations to have one query which is going to scan a table
> start at the point that another table scan, already in progress is
> at, to prevent extra reads -- so exactly the same query run at about
> the same time, with no concurrent database changes can easily return
> rows in different orders.  That's a feature, not a bug.  If you want
> them in a particular order, say so, and appropriate index usage or
> sorts will be added to the query execution to provide them the way
> you ask, even though that is slower than it would be if you didn't
> care about the order.
>
>> and even sorting by id:
>> select id, str_last_name from tbl_owners_individual where id in
>> (83,175,111,1) order by str_last_name;
>>
>>  id  |    str_last_name
>> -----+----------------------
>>   83 | GXXXXXXXXX
>>  175 | GXXXXXXXXX
>>    1 | Kolesnik
>>  111 | Kolesnik
>> (4 ******)
>
> No, it didn't go out of its way to sort that way, it just happened
> to fall out that way that time; don't count on it always being that
> way, even if it happens many times in a row.
>
> test=3D# create table tbl_owners_individual
> test-#   (id int not null primary key, str_last_name text not null);
> CREATE TABLE
> test=3D# insert into tbl_owners_individual values
> test-#   (1, 'Kolesnik'),
> test-#   (83, 'GXXXXXXXXX'),
> test-#   (111, 'Kolesnik'),
> test-#   (175, 'GXXXXXXXXX');
> INSERT 0 4
> test=3D# select id, str_last_name from tbl_owners_individual where id
> in
> test-# (83,175,111,1) order by str_last_name;
>  id  | str_last_name
> -----+---------------
>   83 | GXXXXXXXXX
>  175 | GXXXXXXXXX
>    1 | Kolesnik
>  111 | Kolesnik
> (4 rows)
>
> -Kevin
>

Re: bug #7499 additional comments

От
Tom Lane
Дата:
Denis Kolesnik <lirex.software@gmail.com> writes:
> You would sugguest, that one should read documentation.

Indeed.

> in the (where with ... replaced a directory in which the PostgreSQL installed)
> ...PostgreSQL\9.1\doc\postgresql\html\queries-limit.html

> "...When using LIMIT, it is important to use an ORDER BY clause that
> constrains the result rows into a unique order. .."

> here asked to use "ORDER BY" which is done in every query above.

What you are stubbornly refusing to grasp is that your ORDER BY is not
sufficient to ensure a unique order.  The above-quoted sentence does not
promise that any old ORDER BY will do.  If you simply ORDER BY a
non-unique column, the query's output ordering is only constrained up to
the point that rows with the same ordering value will be adjacent.
If you want more than that, you need to add additional columns to the
ORDER BY to resolve cases where that column is equal.

As long as you haven't done that, you are wrong, and you are wasting
both your time and ours arguing about it.  You've wasted quite enough
of my time already; don't expect to see any further responses on this
subject.

            regards, tom lane

Re: bug #7499 additional comments

От
Craig Ringer
Дата:
On 08/23/2012 04:12 AM, Denis Kolesnik wrote:

> Suppose a person who has basic SQL knowledges would learn on praxis
> how would result a query if a person adds the clause "limit 1" to it

Then they just got bitten by not learning enough and not testing their
code well enough; they were probably programming by recipe and
copy-and-paste, not by learning the platform they're working with.


http://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-ORDERBY

"The ORDER BY clause causes the result rows to be sorted according to
the specified expression(s). If two rows are equal according to the
leftmost expression, they are compared according to the next expression
and so on. If they are equal according to all specified expressions,
they are returned in an implementation-dependent order."


It'd be really nice if every programming language and tool could be
completely safe and easy, with no undefined, implementation-defined or
inconsistent behaviour. Unfortunately, in the real world that doesn't
happen because perfectly specified platforms are (a) really hard to
actually write and (b) usually hard to optimise and thus slow.


Suppose a person with basic C knowledge wrote this (utterly wrong and
dangerous, do not use for anything) program:

#include <stdio.h>
#include <string.h>
#include <malloc.h>
int main() {
         char * blah = (char*)malloc(10);
         strcpy(blah,"1234567890");
         printf("%s\n", blah);
}

This program has *at* *least* one bug that'll cause it to run most of
the time, but fail unpredictably, especially when used as part of a
larger program rather than standalone. Failure will depend on platform,
C library, kernel, compiler settings, and the contents of uninitialized
memory.

Is the platform responsible for the user shooting themselves in the foot
because they didn't learn about null termination of strings, buffer
over-runs, the dangers of using strcpy(), etc? To me it's a bug in the
user's code, not the platform.

Sure, the platform could be easier to use. It could add lots of bounds
checks, prohibit raw memory access, use garbage collection instead of
explicit pointer-based memory management, etc. Then you'd have a new
platform called Java, which is very useful - but not really something
you can use to write tiny programs that take microseconds to run, or
high-performance operating system kernels.

Even Java has plenty of traps and confusing characteristics. Anything to
do with threads. finalize() methods. try {} catch {} finally {}
constructs. Double-checked locking. Plenty more. That's in a language
that was designed to be an easier and safer alternative to C.

Everything is a compromise, including the SQL language and
implementations of it. If Pg made underspecified sorts an error then
lots of other people would scream "bug!" because pretty much every other
database system lets you do this so it'd be a portability problem - and
because it's a really useful behaviour for some purposes. If Pg's query
planner always ensured that sorts were stable and always did the same
sorts, people wouldn't use Pg because it'd be too slow.

More importantly, PostgreSQL has no way of *knowing* for sure that the
sort is underspecified. It can't know that the column you've specified
isn't unique, or at least unique within the subset of data you're
working with. It trusts you to know what you want.

The trick is to read the documentation, learn, and test your code well.

That's true of every language, even those that try to protect the
programmer from their mistakes as much as possible.

--
Craig Ringer