Обсуждение: Partitions and the optimizer.

Поиск
Список
Период
Сортировка

Partitions and the optimizer.

От
Mladen Gogala
Дата:
Optimizer doesn't behave well when the tables are partitioned. Here is
the problem:

news=# explain
select max("document#") from moreover_documents_y2010m08;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.15..0.16 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.15 rows=1 width=8)
           ->  Index Scan Backward using pk_moreover_documents_y2010m08
on moreover_documents_y2010m08  (cost=0.00..2169871.61 rows=14615132
width=8)
                 Filter: ("document#" IS NOT NULL)
(5 rows)

Time: 31.191 ms
news=# explain
select max("document#") from moreover_documents;
                                                      QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4227019.40..4227019.41 rows=1 width=8)
   ->  Append  (cost=0.00..4145103.32 rows=32766432 width=8)
         ->  Seq Scan on moreover_documents  (cost=0.00..10.20 rows=20
width=8)
         ->  Seq Scan on moreover_documents_y2010m06 moreover_documents
(cost=0.00..236523.53 rows=1856853 width=8)
         ->  Seq Scan on moreover_documents_y2010m07 moreover_documents
(cost=0.00..2073778.27 rows=16294327 width=8)
         ->  Seq Scan on moreover_documents_y2010m08 moreover_documents
(cost=0.00..1834740.32 rows=14615132 width=8)
         ->  Seq Scan on moreover_documents_y2010m09 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
         ->  Seq Scan on moreover_documents_y2010m10 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
         ->  Seq Scan on moreover_documents_y2010m11 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
         ->  Seq Scan on moreover_documents_y2010m12 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
         ->  Seq Scan on moreover_documents_y2011m01 moreover_documents
(cost=0.00..10.20 rows=20 width=8)
(11 rows)

Time: 31.961 ms
news=#

I have several partitions for the table, 3 of them are loaded. When I
search for the maximum value of the primary key column on a single
partition, the access path is as it should be, using the PK index. Each
of the partitions has the same PK. When I try selecting the maximum on
the entire table, the indexes on the partitions are not used, Postgres
does the full scan instead. The difference in timing is drastic:
news=# select max("document#") from moreover_documents_y2010m08;
    max
------------
 1175107508
(1 row)

Time: 56.778 ms
news=# select max("document#") from moreover_documents;
    max
------------
 1175107508
(1 row)

Time: 200490.228 ms
news=#

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: Partitions and the optimizer.

От
Jayadevan M
Дата:
Hello,
> Optimizer doesn't behave well when the tables are partitioned. Here is
> the problem:
>
> news=# explain
> select max("document#") from moreover_documents_y2010m08;
>
> QUERY
> PLAN
>

------------------------------------------------------------------------------------------------------------------------------------------------------
>  Result  (cost=0.15..0.16 rows=1 width=0)
>    InitPlan 1 (returns $0)
>      ->  Limit  (cost=0.00..0.15 rows=1 width=8)
>            ->  Index Scan Backward using pk_moreover_documents_y2010m08
> on moreover_documents_y2010m08  (cost=0.00..2169871.61 rows=14615132
> width=8)
>                  Filter: ("document#" IS NOT NULL)
> (5 rows)
>
> Time: 31.191 ms
> news=# explain
> select max("document#") from moreover_documents;
>                                                       QUERY
> PLAN
>
-----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=4227019.40..4227019.41 rows=1 width=8)
>    ->  Append  (cost=0.00..4145103.32 rows=32766432 width=8)
>          ->  Seq Scan on moreover_documents  (cost=0.00..10.20 rows=20
> width=8)
>          ->  Seq Scan on moreover_documents_y2010m06 moreover_documents
> (cost=0.00..236523.53 rows=1856853 width=8)
>          ->  Seq Scan on moreover_documents_y2010m07 moreover_documents
> (cost=0.00..2073778.27 rows=16294327 width=8)
>          ->  Seq Scan on moreover_documents_y2010m08 moreover_documents
> (cost=0.00..1834740.32 rows=14615132 width=8)
>          ->  Seq Scan on moreover_documents_y2010m09 moreover_documents
> (cost=0.00..10.20 rows=20 width=8)
>          ->  Seq Scan on moreover_documents_y2010m10 moreover_documents
> (cost=0.00..10.20 rows=20 width=8)
>          ->  Seq Scan on moreover_documents_y2010m11 moreover_documents
> (cost=0.00..10.20 rows=20 width=8)
>          ->  Seq Scan on moreover_documents_y2010m12 moreover_documents
> (cost=0.00..10.20 rows=20 width=8)
>          ->  Seq Scan on moreover_documents_y2011m01 moreover_documents
> (cost=0.00..10.20 rows=20 width=8)
> (11 rows)
>
> Time: 31.961 ms
> news=#
>
> I have several partitions for the table, 3 of them are loaded. When I
> search for the maximum value of the primary key column on a single
> partition, the access path is as it should be, using the PK index. Each
> of the partitions has the same PK. When I try selecting the maximum on
> the entire table, the indexes on the partitions are not used, Postgres
> does the full scan instead. The difference in timing is drastic:
> news=# select max("document#") from moreover_documents_y2010m08;
>     max
> ------------
>  1175107508
> (1 row)
>
> Time: 56.778 ms
> news=# select max("document#") from moreover_documents;
>     max
> ------------
>  1175107508
> (1 row)
>
> Time: 200490.228 ms
> news=#
>
> --
I am a PostgreSQL novice myself. So please try my suggestions only if you
have enough time to experiment.
If you try
select max("document#") from (
select "document#" from moreover_documents_y2010m08
union
select "document#" from moreover_documents_xxx
)
would that improve performance?

Regards,
Jayadevan





DISCLAIMER:

"The information in this e-mail and any attachment is intended only for
the person to whom it is addressed and may contain confidential and/or
privileged material. If you have received this e-mail in error, kindly
contact the sender and destroy all copies of the original communication.
IBS makes no warranty, express or implied, nor guarantees the accuracy,
adequacy or completeness of the information contained in this email or any
attachment and is not liable for any errors, defects, omissions, viruses
or for resultant loss or damage, if any, direct or indirect."






Re: Partitions and the optimizer.

От
Mladen Gogala
Дата:
Jayadevan M wrote:
>
> I am a PostgreSQL novice myself. So please try my suggestions only if you
> have enough time to experiment.
>

Good ideas can come from anybody. Being a novice is legal in almost all
states. Not sure about AZ, though.

> If you try
> select max("document#") from (
> select "document#" from moreover_documents_y2010m08
> union
> select "document#" from moreover_documents_xxx
> )
> would that improve performance?
>
> Regards,
> Jayadevan
>
Yes, this does improve performance:
news=> explain
select max(doc) from (
select max("document#") as doc from moreover_documents_y2010m08
union all
select max("document#") as doc from moreover_documents_y2010m07) as dc;
 QUER
Y
PLAN


--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--
 Aggregate  (cost=0.34..0.35 rows=1 width=8)
   ->  Append  (cost=0.15..0.34 rows=2 width=8)
         ->  Result  (cost=0.15..0.16 rows=1 width=0)
               InitPlan 1 (returns $0)
                 ->  Limit  (cost=0.00..0.15 rows=1 width=8)
                       ->  Index Scan Backward using
pk_moreover_documents_y2010
m08 on moreover_documents_y2010m08  (cost=0.00..2169871.61 rows=14615132
width=8
)
                             Filter: ("document#" IS NOT NULL)
         ->  Result  (cost=0.15..0.16 rows=1 width=0)
               InitPlan 2 (returns $1)
                 ->  Limit  (cost=0.00..0.15 rows=1 width=8)
                       ->  Index Scan Backward using
pk_moreover_documents_y2010
m07 on moreover_documents_y2010m07  (cost=0.00..2419975.91 rows=16294327
width=8
)
                             Filter: ("document#" IS NOT NULL)
(12 rows)

The point of my post is that the optimizer should be doing that, not me.
I recently saw a presentation about the Postgres optimizer which begun
with: "don't assume that you're smarter than the optimizer". Well, I
apparently an smarter than the optimizer and I am not even particularly
smart.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com