Thread: why is postgres estimating so badly?


why is postgres estimating so badly?

From:
"Luis Alberto Amigo Navarro"
Date:
I have a query whose estimates and actual results don't look similar; here is the EXPLAIN ANALYZE output:
 
 NOTICE:  QUERY PLAN:
 
Sort  (cost=12443.90..12443.90 rows=1 width=93) (actual time=505331.94..505332.67 rows=175 loops=1)
  ->  Aggregate  (cost=12443.88..12443.89 rows=1 width=93) (actual time=472520.29..505326.48 rows=175 loops=1)
        ->  Group  (cost=12443.88..12443.89 rows=1 width=93) (actual time=472307.31..485173.92 rows=325302 loops=1)
              ->  Sort  (cost=12443.88..12443.88 rows=1 width=93) (actual time=472307.24..473769.79 rows=325302 loops=1)
                    ->  Nested Loop  (cost=12439.25..12443.87 rows=1 width=93) (actual time=103787.68..441614.43 rows=325302 loops=1)
                          ->  Hash Join  (cost=12439.25..12440.64 rows=1 width=85) (actual time=103733.76..120916.86 rows=325302 loops=1)
                                ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=15) (actual time=7.81..8.72 rows=25 loops=1)
                                ->  Hash  (cost=12439.25..12439.25 rows=1 width=70) (actual time=103722.25..103722.25 rows=0 loops=1)
                                      ->  Nested Loop  (cost=0.00..12439.25 rows=1 width=70) (actual time=95.43..100162.91 rows=325302 loops=1)
                                            ->  Nested Loop  (cost=0.00..12436.23 rows=1 width=62) (actual time=84.91..47502.93 rows=325302 loops=1)
                                                  ->  Nested Loop  (cost=0.00..12412.93 rows=4 width=24) (actual time=66.86..8806.01 rows=43424 loops=1)
                                                        ->  Seq Scan on part  (cost=0.00..12399.00 rows=1 width=4) (actual time=24.88..4076.81 rows=10856 loops=1)
                                                        ->  Index Scan using partsupp_pkey on partsupp  (cost=0.00..13.89 rows=4 width=20) (actual time=0.20..0.34 rows=4 loops=10856)
                                                  ->  Index Scan using l_partsupp_index on lineitem  (cost=0.00..6.02 rows=1 width=38) (actual time=0.20..0.61 rows=7 loops=43424)
                                            ->  Index Scan using supplier_pkey on supplier  (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.10 rows=1 loops=325302)
                          ->  Index Scan using orders_pkey on orders  (cost=0.00..3.22 rows=1 width=8) (actual time=0.85..0.87 rows=1 loops=325302)
Total runtime: 505563.85 msec
estimated 12000 msec
 
here is the query:
SELECT
 nation,
 o_year,
 CAST((sum(amount)) AS NUMERIC(10,2)) AS sum_profit
FROM(
 SELECT
  nation.name AS nation,
  EXTRACT(year FROM orders.orderdate) AS o_year,
  lineitem.extendedprice*(1-lineitem.discount)-partsupp.supplycost*lineitem.quantity AS amount
 FROM
  part,
  supplier,
  lineitem,
  partsupp,
  orders,
  nation
 WHERE
  supplier.suppkey=lineitem.suppkey
  AND partsupp.suppkey=lineitem.suppkey
  AND partsupp.partkey=lineitem.partkey
  AND part.partkey=lineitem.partkey
  AND orders.orderkey=lineitem.orderkey
  AND supplier.nationkey=nation.nationkey
  AND part.name LIKE '%green%'
 ) AS profit
GROUP BY
 nation,
 o_year
ORDER BY
 nation,
 o_year DESC;
lineitem is about 6M rows
partsupp 800K rows
part 200K rows
 
any advice?
Thanks and regards
 
 

Re: why is postgres estimating so badly?

From:
"Nathan C. Burnett"
Date:
The first thing to point out is that the estimated cost is measured in
terms of page reads while the actual time is measured in milliseconds.  So
even if the cost estimate is accurate it is unlikely that those numbers
will be the same.
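To make that concrete, here is a back-of-envelope sketch (the ms-per-cost-unit conversion factor is purely an assumption for illustration; cost units are abstract and installation-specific):

```python
# Planner cost is measured in abstract units (roughly sequential page
# reads), not milliseconds, so it cannot be compared to runtime directly.
est_cost_units = 12443.90          # top-node estimated cost from the plan
ms_per_unit = 1.0                  # ASSUMED: ~1 ms per sequential page read

est_seconds = est_cost_units * ms_per_unit / 1000.0
actual_seconds = 505563.85 / 1000.0   # "Total runtime" from the plan

print(round(est_seconds, 1))       # 12.4 -- under this assumed conversion
print(round(actual_seconds, 1))    # 505.6 -- actually measured
```

Even a generous conversion leaves a 40x gap, which points at a misestimate in the plan itself rather than at the units.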

-N

--
Nathan C. Burnett
Research Assistant, Wisconsin Network Disks
Department of Computer Sciences
University of Wisconsin - Madison
ncb@cs.wisc.edu

On Wed, 17 Jul 2002, Luis Alberto Amigo Navarro wrote:

> I have a query whose estimates and actual results don't look similar; here is the EXPLAIN ANALYZE output:



Re: why is postgres estimating so badly?

From:
Tom Lane
Date:
"Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> writes:
>  ->  Seq Scan on part  (cost=0.00..12399.00 rows=1 width=4) (actual time=24.88..4076.81 rows=10856 loops=1)

Seems like the major misestimation is above: the LIKE clause on part is
estimated to select just one row, but it selects 10856 of 'em.  Had the
planner realized the number of returned rows would be in the thousands,
it'd likely have used quite a different plan structure.

>   AND part.name LIKE '%green%'

It's difficult for the planner to produce a decent estimate for the
selectivity of an unanchored LIKE clause, since there are no statistics
it can use for the purpose.  We recently changed FIXED_CHAR_SEL in
src/backend/utils/adt/selfuncs.c from 0.04 to 0.20, which would make
this particular case come out better.  (I believe the estimate would
work out to about 320, if part is 200K rows; that should be enough to
produce at least some change of plan.)  You could try patching your
local installation likewise.
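For a rough sense of the numbers involved (row counts taken from the plans in this thread; the arithmetic is just ratios, not the planner's actual selectivity formula):

```python
# part has ~200K rows; the '%green%' filter actually matches 10856 of
# them, per the EXPLAIN ANALYZE output above.
part_rows   = 200_000
actual_rows = 10_856

actual_sel = actual_rows / part_rows    # true selectivity of the LIKE

# Planner estimates for the same seq scan: rows=1 under the old
# FIXED_CHAR_SEL of 0.04, ~320 under 0.20 per Tom's calculation.
old_est, new_est = 1, 320

print(round(actual_sel, 4))             # 0.0543
print(actual_rows // old_est)           # ~10856x underestimate before
print(round(actual_rows / new_est))     # still ~34x too low after
```

So the constant change moves the estimate in the right direction but still leaves the scan underestimated by more than an order of magnitude.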
        regards, tom lane


Re: why is postgres estimating so badly?

From:
"Luis Alberto Amigo Navarro"
Date:
> >   AND part.name LIKE '%green%'
>
> It's difficult for the planner to produce a decent estimate for the
> selectivity of an unanchored LIKE clause, since there are no statistics
> it can use for the purpose.  We recently changed FIXED_CHAR_SEL in
> src/backend/utils/adt/selfuncs.c from 0.04 to 0.20, which would make
> this particular case come out better.  (I believe the estimate would
> work out to about 320, if part is 200K rows; that should be enough to
> produce at least some change of plan.)  You could try patching your
> local installation likewise.

Here are the results, worse than before:
NOTICE:  QUERY PLAN:

Sort  (cost=25209.88..25209.88 rows=1 width=93) (actual time=1836143.78..1836144.48 rows=175 loops=1)
  ->  Aggregate  (cost=25209.85..25209.87 rows=1 width=93) (actual time=1803559.97..1836136.47 rows=175 loops=1)
        ->  Group  (cost=25209.85..25209.86 rows=2 width=93) (actual time=1803348.04..1816093.89 rows=325302 loops=1)
              ->  Sort  (cost=25209.85..25209.85 rows=2 width=93) (actual time=1803347.97..1804795.41 rows=325302 loops=1)
                    ->  Hash Join  (cost=25208.43..25209.84 rows=2 width=93) (actual time=1744714.61..1772790.19 rows=325302 loops=1)
                          ->  Seq Scan on nation  (cost=0.00..1.25 rows=25 width=15) (actual time=13.92..14.84 rows=25 loops=1)
                          ->  Hash  (cost=25208.42..25208.42 rows=2 width=78) (actual time=1744603.74..1744603.74 rows=0 loops=1)
                                ->  Nested Loop  (cost=0.00..25208.42 rows=2 width=78) (actual time=139.21..1740110.04 rows=325302 loops=1)
                                      ->  Nested Loop  (cost=0.00..25201.19 rows=2 width=70) (actual time=122.37..1687895.49 rows=325302 loops=1)
                                            ->  Nested Loop  (cost=0.00..25187.93 rows=4 width=62) (actual time=121.75..856097.27 rows=325302 loops=1)
                                                  ->  Nested Loop  (cost=0.00..17468.91 rows=1280 width=24) (actual time=78.43..19698.77 rows=43424 loops=1)
                                                        ->  Seq Scan on part  (cost=0.00..12399.00 rows=320 width=4) (actual time=29.57..4179.70 rows=10856 loops=1)
                                                        ->  Index Scan using partsupp_pkey on partsupp  (cost=0.00..15.79 rows=4 width=20) (actual time=1.17..1.33 rows=4 loops=10856)
                                                  ->  Index Scan using l_partsupp_index on lineitem  (cost=0.00..6.02 rows=1 width=38) (actual time=2.83..18.97 rows=7 loops=43424)
                                            ->  Index Scan using orders_pkey on orders  (cost=0.00..3.23 rows=1 width=8) (actual time=2.47..2.50 rows=1 loops=325302)
                                      ->  Index Scan using supplier_pkey on supplier  (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.09 rows=1 loops=325302)
Total runtime: 1836375.16 msec
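Comparing the two runs (figures read straight off the two plans; simple ratios only):

```python
# The scan estimate moved (rows=1 -> rows=320), but the final join is
# still estimated at 2 rows against 325302 actual, and the new plan
# shape runs slower than the original.
join_err = 325_302 / 2                  # actual / estimated, top join
slowdown = 1_836_375.16 / 505_563.85    # new runtime / old runtime, msec

print(join_err)             # 162651.0
print(round(slowdown, 2))   # 3.63
```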


It looks even worse. Any other advice, or maybe a query change? Here is the
query again:
SELECT
 nation,
 o_year,
 CAST((sum(amount)) AS NUMERIC(10,2)) AS sum_profit
FROM (
 SELECT
  nation.name AS nation,
  EXTRACT(year FROM orders.orderdate) AS o_year,
  lineitem.extendedprice*(1-lineitem.discount)-partsupp.supplycost*lineitem.quantity AS amount
 FROM
  part,
  supplier,
  lineitem,
  partsupp,
  orders,
  nation
 WHERE
  supplier.suppkey=lineitem.suppkey
  AND partsupp.suppkey=lineitem.suppkey
  AND partsupp.partkey=lineitem.partkey
  AND part.partkey=lineitem.partkey
  AND orders.orderkey=lineitem.orderkey
  AND supplier.nationkey=nation.nationkey
  AND part.name LIKE '%green%'
 ) AS profit
GROUP BY
 nation,
 o_year
ORDER BY
 nation,
 o_year DESC;


Thanks and regards