Обсуждение: Limit & offset effect on query plans

Поиск
Список
Период
Сортировка

Limit & offset effect on query plans

От
Amitabh Kant
Дата:
Hi

Our scripts automatically add "LIMIT ALL" & "OFFSET 0" to every select query if no values are passed on for these parameters. I  remember reading through the mailing list that it's better not to pass them if they are not needed as they add a cost to the query plan. Is this the case, or am i looking at a very minor optimization.


Amitabh

P.S. I haven't checked my query plans to see if there are any actual effect of these keywords as I am still working my way through reading the output of "Explain" ouput.






Re: Limit & offset effect on query plans

От
Pavan Deolasee
Дата:
On Thu, Dec 13, 2012 at 9:38 AM, Amitabh Kant <amitabhkant@gmail.com> wrote:
> Hi
>
> Our scripts automatically add "LIMIT ALL" & "OFFSET 0" to every select query
> if no values are passed on for these parameters. I  remember reading through
> the mailing list that it's better not to pass them if they are not needed as
> they add a cost to the query plan. Is this the case, or am i looking at a
> very minor optimization.
>

I would tend to think that is the latter. While undoubtedly
limit/offset clause will add another node during query planning and
execution, AFAICS the OFFSET 0 and LIMIT ALL cases are optimized to a
good extent. So the overhead of having them will not be significant.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: Limit & offset effect on query plans

От
"Kevin Grittner"
Дата:
Pavan Deolasee wrote:
> Amitabh Kant <amitabhkant@gmail.com> wrote:

>> Our scripts automatically add "LIMIT ALL" & "OFFSET 0" to every
>> select query if no values are passed on for these parameters. I
>> remember reading through the mailing list that it's better not
>> to pass them if they are not needed as they add a cost to the
>> query plan. Is this the case, or am i looking at a very minor
>> optimization.
>>
>
> I would tend to think that is the latter. While undoubtedly
> limit/offset clause will add another node during query planning
> and execution, AFAICS the OFFSET 0 and LIMIT ALL cases are
> optimized to a good extent. So the overhead of having them will
> not be significant.

I ran some quick tests on my i7 under Linux. Plan time was
increased by about 40 microseconds (based on EXPLAIN runtime) and
added a limit node to the plan. Execution time on a SELECT * FROM
tenk1 in the regression database went up by 1.35 ms on fully cached
runs.

-Kevin


Re: Limit & offset effect on query plans

От
Tom Lane
Дата:
"Kevin Grittner" <kgrittn@mail.com> writes:
> Pavan Deolasee wrote:
>> I would tend to think that is the latter. While undoubtedly
>> limit/offset clause will add another node during query planning
>> and execution, AFAICS the OFFSET 0 and LIMIT ALL cases are
>> optimized to a good extent. So the overhead of having them will
>> not be significant.

> I ran some quick tests on my i7 under Linux. Plan time was
> increased by about 40 microseconds (based on EXPLAIN runtime) and
> added a limit node to the plan. Execution time on a SELECT * FROM
> tenk1 in the regression database went up by 1.35 ms on fully cached
> runs.

1.35ms out of what?

FWIW, I've been considering teaching the planner to not bother with
an actual Limit plan node if the limit clause is an obvious no-op.
I wasn't thinking about applications that blindly insert such clauses,
but rather about not penalizing subqueries when someone uses one of
these as an optimization fence.  (The clauses would still work as an
opt fence, you'd just not see any Limit node in the final plan.)

            regards, tom lane


Re: Limit & offset effect on query plans

От
"Kevin Grittner"
Дата:
Tom Lane wrote:
> "Kevin Grittner" <kgrittn@mail.com> writes:

>> I ran some quick tests on my i7 under Linux. Plan time was
>> increased by about 40 microseconds (based on EXPLAIN runtime)
>> and added a limit node to the plan. Execution time on a SELECT *
>> FROM tenk1 in the regression database went up by 1.35 ms on
>> fully cached runs.
>
> 1.35ms out of what?

Without the limit node the runtimes (after "priming" the cache)
were:

1.805, 2.533
1.805, 2.495
1.800, 2.446
1.818, 2.470
1.804, 2.502

The first time for each run is "Total runtime" reported by EXPLAIN,
the second is what psql reported from having \timing on.

With the limit node:

3.237, 3.914
3.243, 3.918
3.263, 4.010
3.265, 3.943
3.272, 3.953

I eyeballed that in the console window and said 1.35 based on rough
in-my-head calculations, although with it laid out in a nicer
format, I think I was a little low.

-Kevin


Re: Limit & offset effect on query plans

От
Tom Lane
Дата:
"Kevin Grittner" <kgrittn@mail.com> writes:
> Tom Lane wrote:
>> 1.35ms out of what?

> Without the limit node the runtimes (after "priming" the cache)
> were:

> 1.805, 2.533
> 1.805, 2.495
> 1.800, 2.446
> 1.818, 2.470
> 1.804, 2.502

> The first time for each run is "Total runtime" reported by EXPLAIN,
> the second is what psql reported from having \timing on.

> With the limit node:

> 3.237, 3.914
> 3.243, 3.918
> 3.263, 4.010
> 3.265, 3.943
> 3.272, 3.953

> I eyeballed that in the console window and said 1.35 based on rough
> in-my-head calculations, although with it laid out in a nicer
> format, I think I was a little low.

Huh, so on a percentage basis the Limit-node overhead is actually pretty
significant, at least for a trivial seqscan plan like this case.
(This is probably about the worst-case scenario, really, since it's
tough to beat a simple seqscan for cost-per-emitted-row.  Also I gather
you're not actually transmitting any data to the client ...)

            regards, tom lane


Re: Limit & offset effect on query plans

От
"Kevin Grittner"
Дата:
Tom Lane wrote:

> Huh, so on a percentage basis the Limit-node overhead is actually
> pretty significant, at least for a trivial seqscan plan like this
> case. (This is probably about the worst-case scenario, really,
> since it's tough to beat a simple seqscan for cost-per-emitted-
> row. Also I gather you're not actually transmitting any data to
> the client ...)

Right, I was trying to isolate the cost, and in a more complex
query, or with results streaming back, that could easily be lost in
the noise. Assuming that the setup time for the node is trivial
compared to filtering 10,000 rows, the time per row which passes
through the limit node seems to be (very roughly) 140 nanoseconds
on an i7. I don't know whether that will vary based on the number
or types of columns.

I just tried with returning the results rather than running EXPLAIN
ANALYZE, and any difference was lost in the noise with only five
samples each way. I wonder how much of the difference with EXPLAIN
ANALYZE might have been from the additional time checking. Maybe on
a normal run the difference would be less significant.

-Kevin