Обсуждение: BUG #5652: Optimizer does wrong thing with partitioned tables

Поиск
Список
Период
Сортировка

BUG #5652: Optimizer does wrong thing with partitioned tables

От
"Mladen Gogala"
Дата:
The following bug has been logged online:

Bug reference:      5652
Logged by:          Mladen Gogala
Email address:      mladen.gogala@vmsinfo.com
PostgreSQL version: 8.4.4
Operating system:   Red Hat Linux 5.5, 64b
Description:        Optimizer does wrong thing with partitioned tables
Details:

Optimizer chooses to scan each partitioned table sequentially, instead of
using the available index:

news=# explain select max(created_at) from moreover_documents;
                                                      QUERY PLAN


----------------------------------------------------------------------------
----
---------------------------------------
 Aggregate  (cost=5115432.65..5115432.66 rows=1 width=8)
   ->  Append  (cost=0.00..5017318.72 rows=39245572 width=8)
         ->  Seq Scan on moreover_documents  (cost=0.00..10.20 rows=20
width=8)
         ->  Seq Scan on moreover_documents_y2010m06 moreover_documents
(cost=0
.00..236550.85 rows=1859585 width=8)
         ->  Seq Scan on moreover_documents_y2010m07 moreover_documents
(cost=0
.00..2073604.38 rows=16276938 width=8)
         ->  Seq Scan on moreover_documents_y2010m08 moreover_documents
(cost=0
.00..2022494.13 rows=15670513 width=8)
         ->  Seq Scan on moreover_documents_y2010m09 moreover_documents
(cost=0
.00..684618.36 rows=5438436 width=8)
         ->  Seq Scan on moreover_documents_y2010m10 moreover_documents
(cost=0
.00..10.20 rows=20 width=8)
         ->  Seq Scan on moreover_documents_y2010m11 moreover_documents
(cost=0
.00..10.20 rows=20 width=8)
         ->  Seq Scan on moreover_documents_y2010m12 moreover_documents
(cost=0
.00..10.20 rows=20 width=8)
         ->  Seq Scan on moreover_documents_y2011m01 moreover_documents
(cost=0
.00..10.20 rows=20 width=8)


When on single partition, the optimizer does the right thing:
news=# explain select max(created_at) from moreover_documents_y2010m09;
                                                                 QUERY PLAN


----------------------------------------------------------------------------
----
------------------------------------------------------------
 Result  (cost=0.15..0.16 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.15 rows=1 width=8)
           ->  Index Scan Backward using mdocs_created_y2010m09 on
moreover_docu
ments_y2010m09  (cost=0.00..800757.60 rows=5438436 width=8)
                 Filter: (created_at IS NOT NULL)
(5 rows)


Index is available on each and every partition. There is only one way to
calculate the max for the entire partitioned table: to calculate max for
each partition and then select max of those. Optimizer should take into
account the available access paths for each partition, yet it doesn't do so.

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Euler Taveira de Oliveira
Дата:
Mladen Gogala escreveu:
> Optimizer chooses to scan each partitioned table sequentially, instead of
> using the available index:
>
This is not a bug. How would the optimizer know that the maximum value is in
that specific partition? There is neither a global index for a partitioned
table nor an optimizer artifact to know aggregate information before scanning
all of the partitions. Maybe when we have a better support for table
partitioning such optimizer artifact would be implemented but don't hold your
breath.


--
  Euler Taveira de Oliveira
  http://www.timbira.com/

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Mladen Gogala
Дата:
Euler Taveira de Oliveira wrote:
> Mladen Gogala escreveu:
>
>> Optimizer chooses to scan each partitioned table sequentially, instead of
>> using the available index:
>>
>>
> This is not a bug. How would the optimizer know that the maximum value is in
> that specific partition? There is neither a global index for a partitioned
> table nor an optimizer artifact to know aggregate information before scanning
> all of the partitions. Maybe when we have a better support for table
> partitioning such optimizer artifact would be implemented but don't hold your
> breath.
>
>
>
Euler, optimizer is selecting a wrong path, which is a bug by
definition. I am not asking the optimizer to know what is the maximum, I
am asking it to select the right path, something that it isn't doing. As
is visible from the original bug report, the optimizer is choosing
sequential scan instead of the index scan. When only a single partition
is involved, the index scan is selected, while for the entire table,
sequential scan is selected for every partition.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Chris Travers
Дата:
Just adding my voice to the "fix it" camp.  Is there any reason the
table scans in this sort of thing cannot be independently planned?

Best Wishes,
Chris Travers

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Jeff Davis
Дата:
On Fri, 2010-09-10 at 10:37 -0400, Mladen Gogala wrote:
> Euler, optimizer is selecting a wrong path, which is a bug by
> definition.

I agree that the optimizer should be improved here, but it's not really
a "bug". I think what you are requesting is considered more of a feature
to make the optimizer smarter.

The optimizer will never be so good that it always picks the best path.
Consider a 100-table join: would it be a "bug" if it didn't find the
perfect join order?

Regards,
    Jeff Davis

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Jeff Davis
Дата:
On Fri, 2010-09-10 at 08:10 -0700, Chris Travers wrote:
> Just adding my voice to the "fix it" camp.  Is there any reason the
> table scans in this sort of thing cannot be independently planned?

I don't think it's about independent planning. For instance, AVG clearly
can't be planned this way, there are particular properties of MAX that
allow the optimization:

  1. MAX(x) can be rewritten as: ORDER BY x DESC LIMIT 1
  2. The MAX of set S is the MAX of the MAXes of each partition of S

The optimizer knows about the former, but not the latter.

Regards,
    Jeff Davis

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Mladen Gogala
Дата:
Jeff Davis wrote:
> On Fri, 2010-09-10 at 08:10 -0700, Chris Travers wrote:
>
>> Just adding my voice to the "fix it" camp.  Is there any reason the
>> table scans in this sort of thing cannot be independently planned?
>>
>
> I don't think it's about independent planning. For instance, AVG clearly
> can't be planned this way, there are particular properties of MAX that
> allow the optimization:
>
>   1. MAX(x) can be rewritten as: ORDER BY x DESC LIMIT 1
>   2. The MAX of set S is the MAX of the MAXes of each partition of S
>
> The optimizer knows about the former, but not the latter.
>
> Regards,
>     Jeff Davis
>
>
Jeff, that's the problem. Functions like "MAX" are rather ordinary and
frequently used. Using sequential scan to read all partitions is the
wrong thing to do. I agree that AVG() cannot be computed using index but
MAX() and MIN() can. I will send you personally 2 versions of a script
that I am still writing, just to see to what extent do I go to get the
necessary performance. Optimizer definitely needs fixes when it comes to
partitions.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Jeff Davis
Дата:
On Fri, 2010-09-10 at 16:53 -0400, Mladen Gogala wrote:
> Jeff, that's the problem. Functions like "MAX" are rather ordinary and
> frequently used.

I agree that it could be improved. The best way to move such improvement
forward is to advance the discussion on -hackers. Start with this thread
here:

    http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php

and see if you have anything to add, or see if you can find related
discussions. The main point is that -bugs is usually for short
discussions about fixing defects that were just discovered, and -hackers
is better for problems without an obvious solution, where discussion is
required.

Regards,
    Jeff Davis

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Chris Travers
Дата:
On Fri, Sep 10, 2010 at 1:53 PM, Mladen Gogala <mgogala@vmsinfo.com> wrote:

> Jeff, that's the problem. Functions like "MAX" are rather ordinary and
> frequently used. Using sequential scan to read all partitions is the wrong
> thing to do. I agree that AVG() cannot be computed using index but MAX() and
> MIN() can. I will send you personally 2 versions of a script that I am still
> writing, just to see to what extent do I go to get the necessary
> performance. Optimizer definitely needs fixes when it comes to partitions.
>

IIRC, the planner already has been tweaked to allow index scans on MAX
for single tables.  This of course did not happen within a stable
branch.

The question over whether this is a "bug" or a "feature" depends to a
large extent on how one defines a bug.  I would be inclined to call
this a "bug" for discussion purposes since it causes the planner to
make plan choices that are well known to be problematic in these
cases, but it I would not be in favor of correcting this in a stable
branch.  My reading of the change log is that it is rare that changes
to long-standing behavior in general, and particularly for the
optimizer, occur within a stable branch.  Given that this is
long-standing behavior, I think it is worth accepting that it is not a
"bug" we might want fixed within a stable release.

I agree with the suggestion that a discussion start on -hackers.  I
still think it is a problem that should be fixed.  Just not in a
stable branch, esp. because this has a reasonable workaround (changing
to an order by... limit 1).

I guess what I am trying to suggest here is that "bug" and "feature"
are not distinct categories which have no overlap.  Where software,
like an RDBMS, is mission-critical, I think it is a good practice to
do what the Pg developers do and avoid making unnecessary changes
within a stable release.  This means that some "bugs" should be
treated as "features" where the behavior is longstanding, a workaround
is possible, and the fix likely to involve changes to important
components.

Best Wishes,
Chris Travers

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Mladen Gogala
Дата:
Jeff Davis wrote:
> On Fri, 2010-09-10 at 16:53 -0400, Mladen Gogala wrote:
>
>> Jeff, that's the problem. Functions like "MAX" are rather ordinary and
>> frequently used.
>>
>
> I agree that it could be improved. The best way to move such improvement
> forward is to advance the discussion on -hackers. Start with this thread
> here:
>
>     http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php
>
> and see if you have anything to add, or see if you can find related
> discussions. The main point is that -bugs is usually for short
> discussions about fixing defects that were just discovered, and -hackers
> is better for problems without an obvious solution, where discussion is
> required.
>
> Regards,
>     Jeff Davis
>
>
Thanks, I will do that.  However, I cannot call myself a hacker, I am
but a mere beginner with Postgres.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
David Fetter
Дата:
On Fri, Sep 10, 2010 at 11:34:16PM -0400, Mladen Gogala wrote:
> Jeff Davis wrote:
> >On Fri, 2010-09-10 at 16:53 -0400, Mladen Gogala wrote:
> >>Jeff, that's the problem. Functions like "MAX" are rather ordinary
> >>and frequently used.
> >
> >I agree that it could be improved. The best way to move such
> >improvement forward is to advance the discussion on -hackers. Start
> >with this thread here:
> >
> >    http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php
> >
> >and see if you have anything to add, or see if you can find related
> >discussions. The main point is that -bugs is usually for short
> >discussions about fixing defects that were just discovered, and
> >-hackers is better for problems without an obvious solution, where
> >discussion is required.
> >
> Thanks, I will do that.  However, I cannot call myself a hacker, I
> am but a mere beginner with Postgres.

You're a hacker when you start hacking :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: BUG #5652: Optimizer does wrong thing with partitioned tables

От
Robert Haas
Дата:
On Fri, Sep 10, 2010 at 9:31 AM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> Mladen Gogala escreveu:
>> Optimizer chooses to scan each partitioned table sequentially, instead of
>> using the available index:
>>
> This is not a bug. How would the optimizer know that the maximum value is in
> that specific partition? There is neither a global index for a partitioned
> table nor an optimizer artifact to know aggregate information before scanning
> all of the partitions. Maybe when we have a better support for table
> partitioning such optimizer artifact would be implemented but don't hold your
> breath.

I wonder if Merge Append could be made to help with this case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company