Re: [HACKERS] What about LIMIT in SELECT ?

From: jwieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] What about LIMIT in SELECT ?
Date:
Msg-id: m0zTOnx-000EBRC@orion.SAPserv.Hamburg.dsh.de
In reply to: Re: [HACKERS] What about LIMIT in SELECT ?  (Eric Lee Green <eric@linux-hw.com>)
Responses: Re: [HACKERS] What about LIMIT in SELECT ?  (Oleg Bartunov <oleg@sai.msu.su>)
List: pgsql-hackers
Eric Lee Green wrote:
>
> On Tue, 13 Oct 1998, Jeff Hoffmann wrote:
> > >I agree completely, LIMIT would be VERY useful in web based apps, which
> > >is all I run.  It does not matter to me if it is not part of a formal
> > >standard.  The idea is so common that it is a de facto standard.
> >
> > i'm not familiar with mysql and using "LIMIT" but wouldn't this same effect
> > be achieved by declaring a cursor and fetching however many records in the
> > cursor?  it's a very noticeable improvement when you only want the first 20
> > out of 500 in a 200k record database, at least.
>
> The problem with declaring a cursor vs. the "LIMIT" clause is that the
> "LIMIT" clause, if used properly by the database engine (along with the
> database engine using indexes in "ORDER BY" clauses) allows the database
> engine to short-circuit the tail end of the query. That is, if you have 25
> names and the last one ends with BEAVIS, the database engine doesn't have
> to go through the BUTTHEADS and KENNYs and etc.
>
> Theoretically a cursor is superior to the "LIMIT" clause because you're
> eventually going to want the B's and K's and etc. anyhow -- but only in a
> stateful environment. In the stateless web environment, a cursor is
> useless because the connection can close at any time even when you're
> using "persistent" connections (and of course when the connection closes
> the cursor closes).

    Am I missing something? It's true that in the stateless web
    environment a cursor has to be declared and closed for every
    single CGI call. But even with a LIMIT clause, your CGI must
    know which key to start at.

    So your query must look like

        SELECT ... WHERE key > 'last processed key' ORDER BY key;

    And your key must be unique (or at least contain no duplicate
    entries), or you might miss some rows between the pages
    (imagine 100 Browns in the table while the last processed key
    was a Brown and LIMIT was used).
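
    One way around a non-unique key (sketched here with the row's
    oid as an assumed tie breaker - any unique column would do) is
    to order by the key plus something unique and qualify on both:

        -- oid as tie breaker is an assumption; any unique column works
        SELECT ... WHERE key > 'last key'
                      OR (key = 'last key' AND oid > 'last oid')
                   ORDER BY key, oid;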

    In postgres you could actually do the following (but read on
    below - it's not optimized correctly):

        BEGIN;
        DECLARE c CURSOR FOR SELECT ... WHERE key > 'last' ORDER BY key;
        FETCH 20 IN c;
        (process the 20 rows in CGI)
        CLOSE c;
        COMMIT;

    Having LIMIT looks more elegant and has less overhead in
    CGI-backend communication. But the cursor version is SQL
    standard and portable.
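
    For comparison, the MySQL-style form under discussion would
    read like this (syntax assumed here - PostgreSQL does not
    have a LIMIT clause yet):

        -- assumed MySQL-style syntax; not in PostgreSQL today
        SELECT ... WHERE key > 'last processed key'
                   ORDER BY key LIMIT 20;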

>
> I wanted very badly to use PostgreSQL for a web project I'm working on,
> but it just wouldn't do the job :-(.

    I've done some tests, and what I found might be a bug in
    PostgreSQL's query optimizer. I have a table with 25k rows
    where key is a text field with a unique index. I ran EXPLAIN
    on some queries:

        SELECT * FROM tab;

        results in a seqscan - expected.

        SELECT * FROM tab ORDER BY key;

        results in a sort->seqscan - I would have
        expected an indexscan!

        SELECT * FROM tab WHERE key > 'G';

        results in an indexscan - expected.

        SELECT * FROM tab WHERE key > 'G' ORDER BY key;

        results in a sort->indexscan - hmmm.

    These results stay the same even if I blow up the table by
    duplicating all rows (now with a non-unique index) to 100k
    rows and have them presorted in the table.

    Needless to say, everything is vacuum'd for statistics.
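
    For reference, the test setup looked roughly like this (table
    and index names are made up here, and the data load is
    elided):

        -- names are illustrative; the actual test table is not shown
        CREATE TABLE tab (key text);
        -- ... load ~25k rows into tab ...
        CREATE UNIQUE INDEX tab_key_idx ON tab (key);
        VACUUM ANALYZE tab;
        EXPLAIN SELECT * FROM tab WHERE key > 'G' ORDER BY key;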

    The last one is the query we would need in the web
    environment, used over a cursor as in the example above. But
    due to the sort, the backend selects until the end of the
    table, sorts the rows, and then returns only the first 20
    (out of the sort's result).

    This is very painful if the qualification (key > ...) points
    to the beginning of the key list.

    Looking at planner.c I can see that if there is a sortClause
    in the parsetree, the planner creates a sort node and does
    not check at all whether there is an index that could be used
    to do it. In the examples above, the sort is absolutely
    needless because the index scan will already return the
    tuples in the right order :-).

    Somewhere deep in my brain I found a statement that sorting
    already-sorted data isn't only unnecessary (unless the order
    changes), it is also slow compared to sorting randomly
    ordered data.

    Can we fix this before the 6.4 release, will it be a post-6.4
    item, or am I doing something wrong here? I think it isn't a
    fix (it's a planner enhancement), so it should really be a
    post-6.4 item.

    For now, the only possibility is to omit the ORDER BY in the
    query and hope the planner will always generate an index scan
    (because of the qualification 'key > ...'). Doing so, I
    selected 20 rows multiple times (with the last-key
    qualification, as a CGI would) in separate transactions.
    Using cursor and fetch speeds up the access by a factor of
    1000! But it is unsafe and thus NOT RECOMMENDED! It's only a
    test of whether cursors can do the LIMIT job - and they
    could, if the planner did a better job.
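
    For the record, each page fetch in that (unsafe) test looked
    like this - note the missing ORDER BY, which is exactly what
    makes it unsafe:

        BEGIN;
        -- relies on the planner choosing an index scan for the
        -- qualification; nothing guarantees the row order!
        DECLARE c CURSOR FOR SELECT * FROM tab WHERE key > 'last';
        FETCH 20 IN c;
        CLOSE c;
        COMMIT;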


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
