Discussion: slow plan for min/max


slow plan for min/max

From
Pailloncy Jean-Gérard
Date:
I have:
> psql (PostgreSQL) 7.3.2
I made a modification to 'access/index/indexam.c', commenting out:
> #ifdef NOT_USED
>         if (scan->keys_are_unique && scan->got_tuple)
>         {
>                 if (ScanDirectionIsForward(direction))
>                 {
>                         if (scan->unique_tuple_pos <= 0)
>                                 scan->unique_tuple_pos++;
>                 }
>                 else if (ScanDirectionIsBackward(direction))
>                 {
>                         if (scan->unique_tuple_pos >= 0)
>                                 scan->unique_tuple_pos--;
>                 }
>                 if (scan->unique_tuple_pos == 0)
>                         return heapTuple;
>                 else
>                         return NULL;
>         }
> #endif
I do not remember the reference for the bug, but the fix was planned for 7.4.

I do:
> psql=# \di
> [skip]
>  public | url_next_index_time      | index | postgresql | url
>  [skip]
> (11 rows)
I have an index on the next_index_time field of table url.

> psql=# explain select min(next_index_time) from url \g
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Aggregate  (cost=85157.70..85157.70 rows=1 width=4)
>    ->  Seq Scan on url  (cost=0.00..80975.56 rows=1672856 width=4)
> (2 rows)
A silly Seq Scan of the whole table.

> psql=# explain SELECT next_index_time FROM url ORDER BY
> next_index_time LIMIT 1 \g
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.20 rows=1 width=4)
>    ->  Index Scan using url_next_index_time on url  (cost=0.00..340431.47 rows=1672856 width=4)
> (2 rows)
I ask for the same thing.
That's better!
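A minimal sketch of the general rewrite pattern, using the same table and column; note that in a descending sort NULLs come first by default, so a max() rewrite should exclude them:

```sql
-- min(next_index_time): an ascending sort places NULLs last by default,
-- so the first row is the true minimum
SELECT next_index_time
FROM url
ORDER BY next_index_time
LIMIT 1;

-- max(next_index_time): a descending sort places NULLs first by default,
-- so exclude them to match max(), which ignores NULLs
SELECT next_index_time
FROM url
WHERE next_index_time IS NOT NULL
ORDER BY next_index_time DESC
LIMIT 1;
```

(Both return no row, rather than NULL, on an empty table.)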

Why does the planner do that?

Jean-Gérard Pailloncy
Paris, France


Re: slow plan for min/max

From
"scott.marlowe"
Date:
On Sun, 7 Sep 2003, Pailloncy Jean-Gérard wrote:

Asking a question about why max(id) is so much slower than select id order
by id desc limit 1, Pailloncy said:

> I ask for the same thing.
> That's better !

This is a frequently asked question about something that isn't likely to
change any time soon.

Basically, Postgresql uses an MVCC locking system that makes massively
parallel operation possible, but costs in certain areas, and one of those
areas is aggregate performance over large sets.  MVCC makes it very hard
to optimize all but the simplest of aggregates, and even those
optimizations which are possible would wind up being quite ugly at the
parser level.

You might want to search the archives in the last couple years for this
subject, as it's come up quite often.


Re: slow plan for min/max

From
Neil Conway
Date:
On Mon, 2003-09-08 at 11:56, scott.marlowe wrote:
> Basically, Postgresql uses an MVCC locking system that makes massively
> parallel operation possible, but costs in certain areas, and one of those
> areas is aggregate performance over large sets.  MVCC makes it very hard
> to optimize all but the simplest of aggregates, and even those
> optimzations which are possible would wind up being quite ugly at the
> parser level.

As was pointed out in a thread a couple days ago, MIN/MAX() optimization
has absolutely nothing to do with MVCC. It does, however, make
optimizing COUNT() more difficult.

-Neil



Re: slow plan for min/max

From
Christopher Browne
Date:
scott.marlowe@ihs.com ("scott.marlowe") writes:
> On Sun, 7 Sep 2003, Pailloncy Jean-Gérard wrote:
>
> Asking a question about why max(id) is so much slower than select id order
> by id desc limit 1, Pailloncy said:
>
>> I ask for the same thing.  That's better !
>
> This is a Frequently asked question about something that isn't
> likely to change any time soon.
>
> Basically, Postgresql uses an MVCC locking system that makes
> massively parallel operation possible, but costs in certain areas,
> and one of those areas is aggregate performance over large sets.
> MVCC makes it very hard to optimize all but the simplest of
> aggregates, and even those optimzations which are possible would
> wind up being quite ugly at the parser level.

MVCC makes it difficult to optimize aggregates resembling COUNT(*) or
SUM(*), at least vis-a-vis having this available for a whole table
(e.g. - you have to be doing 'SELECT COUNT(*), SUM(SOMEFIELD) FROM
THIS_TABLE' with NO "WHERE" clause).

But there is nothing about MVCC that makes it particularly difficult
to handle the transformation:

 select max(field) from some_table where another_field <
    still_another_field;

   (which isn't particularly efficient) into

 select field from some_table where another_field <
    still_another_field order by field desc limit 1;

The problems observed are thus:

 1.  If the query asks for other data, it might be necessary to scan
     the table to get the other data, making the optimization
     irrelevant;

 2.  If there's a good index to key on, the transformed version might
     be a bunch quicker, but it is nontrivial to determine that, a
     priori;

 3.  It would be a fairly hairy optimization to throw into the query
     optimizer, so people are reluctant to try to do so.

Note that MVCC has _nothing_ to do with any of those three problems.

The MVCC-related point is that there is reluctance to create some
special case that will be troublesome to maintain instead of having
some comprehensive handling of _all_ aggregates.  It seems a better
idea to "fix them all" rather than to kludge things up by fixing one
after another.
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/lisp.html
Signs of a Klingon Programmer -  10. "A TRUE  Klingon Warrior does not
comment his code!"

Re: slow plan for min/max

From
"scott.marlowe"
Date:
On Mon, 8 Sep 2003, Neil Conway wrote:

> On Mon, 2003-09-08 at 11:56, scott.marlowe wrote:
> > Basically, Postgresql uses an MVCC locking system that makes massively
> > parallel operation possible, but costs in certain areas, and one of those
> > areas is aggregate performance over large sets.  MVCC makes it very hard
> > to optimize all but the simplest of aggregates, and even those
> > optimzations which are possible would wind up being quite ugly at the
> > parser level.
>
> As was pointed out in a thread a couple days ago, MIN/MAX() optimization
> has absolutely nothing to do with MVCC. It does, however, make
> optimizing COUNT() more difficult.

Not exactly.  While max(id) is easily optimized by query replacement,
more complex aggregates will still have performance issues that would not
be present in a row-locking database, e.g. max((field1/field2)*field3) is
still going to cost more to process, isn't it?



Re: slow plan for min/max

From
Tom Lane
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Mon, 8 Sep 2003, Neil Conway wrote:
>> As was pointed out in a thread a couple days ago, MIN/MAX() optimization
>> has absolutely nothing to do with MVCC. It does, however, make
>> optimizing COUNT() more difficult.

> Not exactly.  While max(id) is easily optimized by query replacement,
> more complex aggregates will still have perfomance issues that would not
> be present in a row locking database.  i.e. max((field1/field2)*field3) is
> still going to cost more to process, isn't it?

Er, what makes you think that would be cheap in any database?

Postgres would actually have an advantage given its support for
expressional indexes (nee functional indexes).  If we had an optimizer
transform to convert MAX() into an index scan, I would expect it to be
able to match up max((field1/field2)*field3) with an index on
((field1/field2)*field3).
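Sketched concretely, with hypothetical table and column names and the 7.4-era expressional-index syntax (the extra parentheses around the indexed expression are required):

```sql
-- an expressional (formerly "functional") index on the computed value
CREATE INDEX some_table_expr_idx
    ON some_table (((field1 / field2) * field3));

-- with the transform applied by hand, the index can satisfy the max();
-- as with any max() rewrite, NULLs and empty tables need the usual care
SELECT (field1 / field2) * field3
FROM some_table
ORDER BY (field1 / field2) * field3 DESC
LIMIT 1;
```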

            regards, tom lane

Re: slow plan for min/max

From
Christopher Browne
Date:
After takin a swig o' Arrakan spice grog, scott.marlowe@ihs.com ("scott.marlowe") belched out...:
> On Mon, 8 Sep 2003, Neil Conway wrote:
>> On Mon, 2003-09-08 at 11:56, scott.marlowe wrote:
>> > Basically, Postgresql uses an MVCC locking system that makes massively
>> > parallel operation possible, but costs in certain areas, and one of those
>> > areas is aggregate performance over large sets.  MVCC makes it very hard
>> > to optimize all but the simplest of aggregates, and even those
>> > optimzations which are possible would wind up being quite ugly at the
>> > parser level.
>>
>> As was pointed out in a thread a couple days ago, MIN/MAX() optimization
>> has absolutely nothing to do with MVCC. It does, however, make
>> optimizing COUNT() more difficult.
>
> Not exactly.  While max(id) is easily optimized by query replacement,
> more complex aggregates will still have perfomance issues that would not
> be present in a row locking database.  i.e. max((field1/field2)*field3) is
> still going to cost more to process, isn't it?

That sort of MAX() would be difficult to optimize in almost any case,
and would mandate doing a scan across the relevant portion of the
table...

... Unless you had a functional index on (field1/field2)*field3, in
which case it might well be that this would cost Still Less.

I still can't fathom what this has to do with MVCC; you have yet to
actually connect it with that...
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/lsf.html
"Cars  move  huge weights at   high    speeds by controlling   violent
explosions many times a second.  ...car analogies are always fatal..."
-- <westprog@my-dejanews.com>

Re: slow plan for min/max

From
Greg Stark
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:

> Basically, Postgresql uses an MVCC locking system that makes massively

As discussed, uh, a few days ago, this particular problem is not caused by
MVCC but by postgres having a general purpose aggregate system and not having
special code for handling min/max. Aggregates normally require access to every
record they're operating on, not just the first or last in some particular
order. You'll note the LIMIT 1/DISTINCT ON work-around works fine with MVCC...

--
greg

Re: slow plan for min/max

From
Josh Berkus
Date:
Scott,

> Not exactly.  While max(id) is easily optimized by query replacement,
> more complex aggregates will still have perfomance issues that would not
> be present in a row locking database.  i.e. max((field1/field2)*field3) is
> still going to cost more to process, isn't it?

Sorry, no.

The issue has nothing to do with MVCC.   It has everything to do with the fact
that PostgreSQL allows you to create your own aggregates using functions in
any of 11 languages.   This forces the planner to treat aggregates as a
"black box" which does not allow index utilization, because the planner
simply doesn't know what the aggregate is doing internally.

To put it another way, the planner sees SUM() or CONCAT() -- which require
table scans as they must include all values -- as identical to MAX() and
MIN().

Escaping this would require programming a special exception for MAX() and
MIN() into the planner and parser.   This has been discussed numerous times
on HACKERS; the problem is, making special exceptions for MAX() and MIN()
would then make it very difficult to implement MAX() or MIN() for new data
types, as well as requiring a lot of debugging in numerous places.   So far,
nobody has been frustrated enough to spend 3 months tackling the problem.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: slow plan for min/max

From
"Matt Clark"
Date:
>This is a Frequently asked question about something that isn't likely to
>change any time soon.

You're right, it is in the FAQ, but pretty well buried.  It is entirely
non-obvious to most people that min() and max() don't/can't use indices.
Something so counterintuitive should be explicitly and prominently
advertised, especially since the "order by X limit 1" workaround is so
simple.

Actually, referring down to later parts of this thread, why can't this
optimisation be performed internally for built-in types?  I understand the
issue with aggregates over user-defined types, but surely optimising max()
for int4, text, etc is safe and easy?

Of course I may be so far out of my depth as to be drowning, in which case
please put me out of my misery.

M


Re: slow plan for min/max

From
"Matt Clark"
Date:
> Actually, referring down to later parts of this thread, why can't this
> optimisation be performed internally for built-in types?  I understand the
> issue with aggregates over user-defined types, but surely optimising max()
> for int4, text, etc is safe and easy?

Sorry, missed the bit about user-defined functions.  So I should have said
built-in functions operating over built-in types.  Which does sound more
complicated, but anyone redefining max() is surely not in a position to seek
sympathy if they lose performance?


Re: slow plan for min/max

From
Tom Lane
Date:
"Matt Clark" <matt@ymogen.net> writes:
> Actually, referring down to later parts of this thread, why can't this
> optimisation be performed internally for built-in types?  I understand the
> issue with aggregates over user-defined types, but surely optimising max()
> for int4, text, etc is safe and easy?

I can't see that the datatype involved has anything to do with it.
None of the issues that come up in making the planner do this are
datatype-specific.  You could possibly avoid adding some columns
to pg_aggregate if you instead hard-wired the equivalent knowledge
(for builtin types only) into some code somewhere, but a patch that
approached it that way would be rejected as unmaintainable.

            regards, tom lane

Re: slow plan for min/max

From
"Matt Clark"
Date:
> "Matt Clark" <matt@ymogen.net> writes:
> > Actually, referring down to later parts of this thread, why can't this
> > optimisation be performed internally for built-in types?  I
> understand the
> > issue with aggregates over user-defined types, but surely
> optimising max()
> > for int4, text, etc is safe and easy?
>
> I can't see that the datatype involved has anything to do with it.
> None of the issues that come up in making the planner do this are
> datatype-specific.  You could possibly avoid adding some columns
> to pg_aggregate if you instead hard-wired the equivalent knowledge
> (for builtin types only) into some code somewhere, but a patch that
> approached it that way would be rejected as unmaintainable.

I don't pretend to have any useful knowledge of the internals of this, so
much of what I write may seem like noise to you guys.  The naive question is
'I have an index on X, so finding max(X) should be trivial, so why can't the
planner exploit that triviality?'.  AFAICS the short sophisticated answer is
that it just isn't trivial in the general case.

Upon rereading the docs on aggregates I see that it really isn't trivial at
all.  Not even knowing things like 'this index uses the same function as
this aggregate' gets you very far, because of the very general nature of the
implementation of aggs.

So it should be flagged very prominently in the docs that max() and min()
are almost always not what 90% of people want to use 90% of the time,
because indexes do the same job much better for anything other than tiny
tables.

Know what we (OK, I) need?  An explicitly non-aggregate max() and min(),
implemented differently, so they can be optimised.  let's call them
idx_max() and idx_min(), which completely bypass the standard aggregate
code.  Because let's face it, in most cases where you regularly want a max
or a min you have an index defined, and you want the DB to use it.

And I would volunteer to do it, I would, but you really don't want my C in
your project ;-)  I do volunteer to do some doc tweaking though - who do I
talk to?

M


Re: slow plan for min/max

From
Tom Lane
Date:
"Matt Clark" <matt@ymogen.net> writes:
> Know what we (OK, I) need?  An explicitly non-aggregate max() and min(),
> implemented differently, so they can be optimised.

Not per se.  The way I've been visualizing this is that we add to
pg_aggregate a column named, say, aggsortop, with the definition:

    zero: no index optimization possible
    not zero: OID of a comparison operator ('<' or '>')

A nonzero entry means that the aggregate's value is the same as the
first item of the aggregate's input when sorted by the given operator.
(So MIN uses the '<' operator for its datatype and MAX uses '>'.)
Of course we have to add a clause to CREATE AGGREGATE to allow this to
be set for max/min aggregates of user-defined types.  But that's just a
small matter of programming.  This gives us all the type-specific info
we need; the aggsortop can be matched against the opclasses of indexes
to figure out whether a particular index is relevant to a particular max
or min call.

The hard part comes in teaching the planner to use this information
intelligently.  Exactly which queries is it even *possible* to use the
transformation for?  Which queries is it really a win for?  (Obviously
it's not if there's no matching index, but even if there is, the
presence of WHERE or GROUP BY clauses has got to affect the answer.)
How do you structure the resulting query plan, if it's at all complex
(think multiple aggregate calls...)?  I'm not clear on the answers to
any of those questions, so I'm not volunteering to try to code it up ...

            regards, tom lane
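For reference, a CREATE AGGREGATE along these lines would look roughly as follows. This is a sketch following Tom's aggsortop description; the SORTOP clause, the type, and the function names here are illustrative, not 7.4 syntax:

```sql
-- a user-defined max() over a hypothetical type mytype, declaring the
-- comparison operator whose descending order yields the aggregate's result
CREATE AGGREGATE my_max (mytype) (
    SFUNC  = mytype_larger,  -- transition function: keeps the larger input
    STYPE  = mytype,
    SORTOP = >               -- aggsortop: ties my_max() to '>' so the planner
                             -- can match it against an index's opclass
);
```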

Re: slow plan for min/max

From
Pailloncy Jean-Gérard
Date:
I did not expect so many answers about this question.
Thanks.

I found the "order by" trick to speed up min/max by myself.

Jean-Gérard Pailloncy


Re: slow plan for min/max

From
Greg Stark
Date:
The only connection to MVCC is that the "obvious" solution doesn't work,
namely storing a cache of the aggregate in the table information.

So what would it take to implement this for "all" aggregates? Where I think
"all" really just means min(), max(), first(), last().

I think it would mean having a way to declare when defining an aggregate that
only specific records are necessary. For first() and last() it would only have
to indicate in some way that only the first or last record of the grouping was
necessary in the pre-existing order.

For min() and max() it would have to indicate not only that only the first or
last record is necessary but also the sort order to impose.

Then if the optimizer determines that all the aggregates used either impose no
sort order or impose compatible sort orders, then it should insert an extra
sort step before the grouping, and flag the executor to indicate it should do
DISTINCT ON type behaviour to skip unneeded records.
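The DISTINCT ON behaviour mentioned here can already be spelled by hand for grouped min/max (hypothetical table t with columns grp and val):

```sql
-- per-group minimum of val: sort by (grp, val) and keep only the
-- first row of each group, skipping the unneeded records
SELECT DISTINCT ON (grp) grp, val
FROM t
ORDER BY grp, val;
```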

Now the problem I see is if there's no index on the sort order imposed, and
the previous step wasn't a merge join or something else that would return the
records in order then it's not necessarily any faster to sort the records and
return only some. It might be for small numbers of records, but it might be
faster to just read them all in and check each one for min/max the linear way.

--
greg

Re: slow plan for min/max

From
Josh Berkus
Date:
Greg,

> The only connection to MVCC is that the "obvious" solution doesn't work,
> namely storing a cache of the aggregate in the table information.

Well, that solution also doesn't work if you use a WHERE condition or JOIN,
now does it?

> So what would it take to implement this for "all" aggregates? Where I think
> "all" really just means min(), max(), first(), last().

Um, what the heck are first() and last()?   These are not supported aggregates
... table rows are *not* ordered.

> For min() and max() it would have to indicate not only that only the first
> or last record is necessary but also the sort order to impose.

I think Tom already suggested this based on adding a field to CREATE
AGGREGATE.  But I think implementation isn't as simple as you think it is.

> Now the problem I see is if there's no index on the sort order imposed, and
> the previous step wasn't a merge join or something else that would return
> the records in order then it's not necessarily any faster to sort the
> records and return only some. It might be for small numbers of records, but
> it might be faster to just read them all in and check each one for min/max
> the linear way.

Yes, Tom mentioned this also.  Working out the rules whereby the planner could
decide the viability of index use is a non-trivial task.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: slow plan for min/max

From
Bruno Wolff III
Date:
On Tue, Sep 09, 2003 at 12:54:04 -0400,
  Greg Stark <gsstark@mit.edu> wrote:
>
> So what would it take to implement this for "all" aggregates? Where I think
> "all" really just means min(), max(), first(), last().

There can be other aggregates where indexes are helpful. The case of
interest is aggregates such that, if a new item contains the current value
of the aggregate, then the new value of the aggregate will be that of the
new item. This allows you to skip looking at all of the other items
contained in the current item. Dual problems can also benefit in a similar
manner. In a case where the set is totally ordered by the containment
relation (as is the case for max and min), the problem is even simpler and
you can use the greatest or least element as appropriate.

Re: slow plan for min/max

From
Thomas Swan
Date:
Tom Lane wrote:

>"scott.marlowe" <scott.marlowe@ihs.com> writes:
>
>>On Mon, 8 Sep 2003, Neil Conway wrote:
>>
>>>As was pointed out in a thread a couple days ago, MIN/MAX() optimization
>>>has absolutely nothing to do with MVCC. It does, however, make
>>>optimizing COUNT() more difficult.
>
>>Not exactly.  While max(id) is easily optimized by query replacement,
>>more complex aggregates will still have perfomance issues that would not
>>be present in a row locking database.  i.e. max((field1/field2)*field3) is
>>still going to cost more to process, isn't it?
>
>Er, what makes you think that would be cheap in any database?
>
>Postgres would actually have an advantage given its support for
>expressional indexes (nee functional indexes).  If we had an optimizer
>transform to convert MAX() into an index scan, I would expect it to be
>able to match up max((field1/field2)*field3) with an index on
>((field1/field2)*field3).

Would it be possible to rewrite min and max at the parser level into a
select/subselect with the same conditions, ordered by the aggregated
expression descending/ascending with LIMIT 1, thereby avoiding the
penalties of altering the default aggregate behavior?  Would it yield
anything beneficial?



Re: slow plan for min/max

From
Bruno Wolff III
Date:
On Tue, Sep 09, 2003 at 14:06:56 -0500,
  Thomas Swan <tswan@idigx.com> wrote:
>
> Would it be possible to rewrite min and max at the parser level into a
> select/subselect  (clause) condition ( repeat condition ) order by
> (clause ) descending/ascending limit 1 and thereby avoiding the
> penalties of altering the default aggregate behavior?  Would it yield
> anything beneficial?

That isn't always going to be the best way to do the calculation. If there
are other aggregates or if the groups are small, doing things the normal
way (and hash aggregates in 7.4 will help) can be faster.

Re: slow plan for min/max

From
"Matt Clark"
Date:
> > Know what we (OK, I) need?  An explicitly non-aggregate max() and min(),
> > implemented differently, so they can be optimised.
>
> Not per se.  The way I've been visualizing this is that we add to
> pg_aggregate a column named, say, aggsortop, with the definition:
...snip of cunning, potentially generalisable plan...
> How do you structure the resulting query plan, if it's at all complex
> (think multiple aggregate calls...)?  I'm not clear on the answers to
> any of those questions, so I'm not volunteering to try to code it up ...

So you're not going to code it, I'm not going to code it, and I doubt
anyone else will any time soon.

The issue will remain, then, that max() and min() are implemented in a way
that is counterintuitively slow for 99% of uses.  It's not bad or wrong,
just a consequence of many higher-level factors.  This should therefore be
very prominently flagged in the docs until there is either a general or a
specific solution.

FYI I have rewritten 4 queries today to work around this (with nice
performance benefits) as a result of this thread.  Yeah, I should have
spotted the _silly_ seq scans beforehand, but if you're not looking, you
don't tend to see.  Best improvement is 325msec to 0.60msec!

I'm happy to do the doc work.

M