Re: [HACKERS] Solution for LIMIT cost estimation

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Solution for LIMIT cost estimation
Дата
Msg-id 18098.950241132@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Solution for LIMIT cost estimation  (Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>)
Ответы Re: [HACKERS] Solution for LIMIT cost estimation  (Philip Warner <pjw@rhyme.com.au>)
Re: [HACKERS] Solution for LIMIT cost estimation  (Don Baccus <dhogaza@pacifier.com>)
Re: [HACKERS] Solution for LIMIT cost estimation  ("Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>)
Re: [HACKERS] Solution for LIMIT cost estimation  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-hackers
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> ... it sounds like this
> proposal could mean that successive SELECTS with LIMIT could
> execute completely different plans and therefore return inconsistent
> results.
> In short, I think the fact that limit doesn't alter the plan may
> be more of a feature than a bug.

A good point (one I'm embarrassed not to have thought about!) but
I don't think it's a reason not to push forward in this direction.
We have had *way* too many complaints about Postgres not being able
to optimize LIMITed queries, so I'm not willing to ignore the issue;
something's got to be done about it.

As Don B. points out nearby, there's no guarantee anyway that
repeatedly issuing the same query with different LIMIT parameters
will get you consistent results.  The only upright and morally
correct approach is to use DECLARE CURSOR followed by FETCH commands
(all within a transaction of course) in order to get results that
are really all part of a single query.  Now DECLARE CURSOR is also
presently optimized on the basis of fetching the entire result set,
so that is still a problem --- I neglected to mention before that
I was intending to tweak the optimizer to optimize that case like a
moderate-sized LIMIT.

But having said that, I hear what you're saying and I think it's
worth thinking about.  Here are four possible alternative responses:

1. Optimize the query as I sketched previously, but using the "count"
part of the LIMIT spec while deliberately ignoring the "offset".
Then you get consistent results for fetching different chunks of the
query result as long as you use the same count each time.  (And as long
as no one else is changing the DB meanwhile, but we'll take that as
read for each of these choices.)

2. Ignore both the count and offset, and optimize any query containing
a LIMIT clause on the basis of some fixed assumption about what fraction
of the tuples will be fetched (I'm thinking something like 1% to 10%).
This allows different fetch sizes to be used without destroying
consistency, but it falls down on the goal of correctly optimizing
"LIMIT 1" hacks.

3. Use the count+offset for all it's worth, and document that you
can't expect to get consistent results from asking for different
LIMITed chunks of the "same" query unless you use ORDER BY to
ensure consistent ordering of the tuples.

4. Fascist variant of #3: make LIMIT without ORDER BY be an error.

SQL92 does not define LIMIT at all, so it's not much help in
deciding what to do.  Is there anything in SQL3?  What do other
DBMSes do about this issue?  Comments, other variants, better ideas
anyone?

> The other thing is, I would like at some stage to change limit so
> that it is attached to a SELECT rather than an entire query so
> you could...
> SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
> and I'm not sure how this would interact with that.

Since ORDER BY is only allowed at the top level by SQL92, there
would be no way for the user to ensure predictable results from
such a query.  I think that'd be a dangerous path to go down.
However, if you had an answer that ensured consistent results from
queries with sub-LIMITs, I don't see that there'd be any problem
with allowing the optimizer to optimize 'em.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hiroshi Inoue"
Дата:
Сообщение: RE: [HACKERS] Solution for LIMIT cost estimation
Следующее
От: Chris Bitmead
Дата:
Сообщение: Re: [HACKERS] libpq