Обсуждение: Optimizer Not using the Right plan

Поиск
Список
Период
Сортировка

Optimizer Not using the Right plan

От
Pallav Kalva
Дата:
Hi,

     Postgres 8.2.4 is not using the right plan for different values.

    From the below queries listing.addressvaluation table has 19million
records , the other table listing.valuationchangeperiod is just lookup
table with 3 records.

    If you can see the explain plans for the statements the first one
uses a bad plan for  737987 addressid search, does a index scan backward
on the primary key "addressvaluationid" takes more time to execute and
the same query for a different addressid (5851202) uses the correct
optimal plan with index scan on "addressid" column which is way quicker.

   Autovacuums usually vacuums these tables regularly, in fact I checked
the pg_stat_user_tables the last vacuum/analyze on this table was last
night.
   I did another manual vacuum analyze on the listing.addrevaluation
table it uses the right plan for all the values now.

   Can anyone explain me this wierd behavior ?
   why does it have different plans for different values and after doing
manual vacuum analyze it works properly ?

    Are autovacuums not effective enough ?

   Here are my autovacuum settings

autovacuum_naptime = 120min
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_vacuum_scale_factor = 0.001
autovacuum_analyze_scale_factor = 0.001
autovacuum_freeze_max_age = 200000000
autovacuum_vacuum_cost_delay = -1
autovacuum_vacuum_cost_limit = -1

   Here are the table structures


 listing.addressvaluation
                                                   Table
"listing.addressvaluation"
           Column           |            Type
|                                 Modifiers

----------------------------+-----------------------------+---------------------------------------------------------------------------
 addressvaluationid         | integer                     | not null
default nextval(('listing.addressvaluationseq'::text)::regclass)
 fkaddressid                | integer                     | not null
 fkaddressvaluationsourceid | integer                     | not null
 sourcereference            | text                        |
 createdate                 | timestamp without time zone | not null
default ('now'::text)::timestamp(6) without time zone
 valuationdate              | timestamp without time zone | not null
 valuationamount            | numeric(14,2)               |
 valuationhigh              | numeric(14,2)               |
 valuationlow               | numeric(14,2)               |
 valuationconfidence        | integer                     |
 valuationchange            | numeric(14,2)               |
 fkvaluationchangeperiodid  | integer                     |
 historycharturl            | text                        |
 regionhistorycharturl      | text                        |
Indexes:
    "pk_addressvaluation_addressvaluationid" PRIMARY KEY, btree
(addressvaluationid), tablespace "indexdata"
    "idx_addressvaluation_createdate" btree (createdate), tablespace
"indexdata"
    "idx_addressvaluation_fkaddressid" btree (fkaddressid), tablespace
"indexdata"
    "idx_addressvaluation_fkaddressid2" btree (fkaddressid), tablespace
"indexdata"
Foreign-key constraints:
    "fk_addressvaluation_address" FOREIGN KEY (fkaddressid) REFERENCES
listing.address(addressid)
    "fk_addressvaluation_addressvaluationsource" FOREIGN KEY
(fkaddressvaluationsourceid) REFERENCES
listing.addressvaluationsource(addressvaluationsourceid)
    "fk_addressvaluation_valuationchangeperiod" FOREIGN KEY
(fkvaluationchangeperiodid) REFERENCES
listing.valuationchangeperiod(valuationchangeperiodid)

listing.valuationchangeperiod
                                       Table "listing.valuationchangeperiod"
         Column          |  Type   |
Modifiers
-------------------------+---------+--------------------------------------------------------------------------------
 valuationchangeperiodid | integer | not null default
nextval(('listing.valuationchangeperiodseq'::text)::regclass)
 name                    | text    | not null
Indexes:
    "pk_valuationchangeperiod_valuationchangeperiodid" PRIMARY KEY,
btree (valuationchangeperiodid), tablespace "indexdata"
    "uq_valuationchangeperiod_name" UNIQUE, btree (name), tablespace
"indexdata"



For Addressid 737987 after autovacuum before manual vacuum analyze
-------------------------------------------------------------------------------------------
explain
select this_.addressvaluationid as addressv1_150_1_,
this_.sourcereference as sourcere2_150_1_,
          this_.createdate as createdate150_1_, this_.valuationdate as
valuatio4_150_1_,
          this_.valuationamount as valuatio5_150_1_, this_.valuationhigh
as valuatio6_150_1_,
          this_.valuationlow as valuatio7_150_1_,
this_.valuationconfidence as valuatio8_150_1_,
          this_.valuationchange as valuatio9_150_1_,
this_.historycharturl as history10_150_1_,
          this_.regionhistorycharturl as regionh11_150_1_,
this_.fkaddressid as fkaddre12_150_1_,
          this_.fkaddressvaluationsourceid as fkaddre13_150_1_,
this_.fkvaluationchangeperiodid as fkvalua14_150_1_,
         valuationc2_.valuationchangeperiodid as valuatio1_197_0_,
valuationc2_.name as name197_0_
from listing.addressvaluation this_ left outer join
listing.valuationchangeperiod valuationc2_
        on
this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid
where this_.fkaddressid=737987
order by this_.addressvaluationid
desc limit 1;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..678.21 rows=1 width=494)
   ->  Nested Loop Left Join  (cost=0.00..883026.09 rows=1302 width=494)
         ->  Index Scan Backward using
pk_addressvaluation_addressvaluationid on addressvaluation this_
(cost=0.00..882649.43 rows=1302 width=482)
               Filter: (fkaddressid = 737987)
         ->  Index Scan using
pk_valuationchangeperiod_valuationchangeperiodid on
valuationchangeperiod valuationc2_  (cost=0.00..0.28 rows=1 width=12)
               Index Cond: (this_.fkvaluationchangeperiodid =
valuationc2_.valuationchangeperiodid)
(6 rows)


For Addressid 5851202 after autovacuum before manual vacuum analyze
--------------------------------------------------------------------------------------------

select this_.addressvaluationid as addressv1_150_1_,
this_.sourcereference as sourcere2_150_1_,
          this_.createdate as createdate150_1_, this_.valuationdate as
valuatio4_150_1_,
          this_.valuationamount as valuatio5_150_1_, this_.valuationhigh
as valuatio6_150_1_,
          this_.valuationlow as valuatio7_150_1_,
this_.valuationconfidence as valuatio8_150_1_,
          this_.valuationchange as valuatio9_150_1_,
this_.historycharturl as history10_150_1_,
          this_.regionhistorycharturl as regionh11_150_1_,
this_.fkaddressid as fkaddre12_150_1_,
          this_.fkaddressvaluationsourceid as fkaddre13_150_1_,
this_.fkvaluationchangeperiodid as fkvalua14_150_1_,
          valuationc2_.valuationchangeperiodid as valuatio1_197_0_,
valuationc2_.name as name197_0_
from listing.addressvaluation this_ left outer join
listing.valuationchangeperiod valuationc2_
on this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid
where this_.fkaddressid=5851202
order by this_.addressvaluationid
desc limit 1;
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=30.68..30.68 rows=1 width=494)
   ->  Sort  (cost=30.68..30.71 rows=11 width=494)
         Sort Key: this_.addressvaluationid
         ->  Hash Left Join  (cost=1.07..30.49 rows=11 width=494)
               Hash Cond: (this_.fkvaluationchangeperiodid =
valuationc2_.valuationchangeperiodid)
               ->  Index Scan using idx_addressvaluation_fkaddressid2 on
addressvaluation this_  (cost=0.00..29.27 rows=11 width=482)
                     Index Cond: (fkaddressid = 5851202)
               ->  Hash  (cost=1.03..1.03 rows=3 width=12)
                     ->  Seq Scan on valuationchangeperiod valuationc2_
(cost=0.00..1.03 rows=3 width=12)
(9 rows)


After manual vacuum analyze  for addressid 737987
------------------------------------------------------------------

explain
select this_.addressvaluationid as addressv1_150_1_,
this_.sourcereference as sourcere2_150_1_,
          this_.createdate as createdate150_1_, this_.valuationdate as
valuatio4_150_1_,
          this_.valuationamount as valuatio5_150_1_, this_.valuationhigh
as valuatio6_150_1_,
          this_.valuationlow as valuatio7_150_1_,
this_.valuationconfidence as valuatio8_150_1_,
          this_.valuationchange as valuatio9_150_1_,
this_.historycharturl as history10_150_1_,
          this_.regionhistorycharturl as regionh11_150_1_,
this_.fkaddressid as fkaddre12_150_1_,
          this_.fkaddressvaluationsourceid as fkaddre13_150_1_,
this_.fkvaluationchangeperiodid as fkvalua14_150_1_,
          valuationc2_.valuationchangeperiodid as valuatio1_197_0_,
valuationc2_.name as name197_0_
from listing.addressvaluation this_ inner join
listing.valuationchangeperiod valuationc2_
on this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid
where this_.fkaddressid=737987
order by this_.addressvaluationid
desc limit 1;
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=31.24..31.24 rows=1 width=494)
   ->  Sort  (cost=31.24..31.27 rows=11 width=494)
         Sort Key: this_.addressvaluationid
         ->  Hash Join  (cost=1.07..31.05 rows=11 width=494)
               Hash Cond: (this_.fkvaluationchangeperiodid =
valuationc2_.valuationchangeperiodid)
               ->  Index Scan using idx_addressvaluation_fkaddressid on
addressvaluation this_  (cost=0.00..29.83 rows=11 width=482)
                     Index Cond: (fkaddressid = 737987)
               ->  Hash  (cost=1.03..1.03 rows=3 width=12)
                     ->  Seq Scan on valuationchangeperiod valuationc2_
(cost=0.00..1.03 rows=3 width=12)
(9 rows)



Thanks!
Pallav.



Re: Optimizer Not using the Right plan

От
Tom Lane
Дата:
Pallav Kalva <pkalva@livedatagroup.com> writes:
>    why does it have different plans for different values

Because the values occur different numbers of times (or so it thinks
anyway).  If the rowcount estimates are far from reality, perhaps
increasing the statistics target would help.  However, since you
only showed EXPLAIN and not EXPLAIN ANALYZE output, no one can
really tell whether the optimizer did anything wrong here.

            regards, tom lane

Re: Optimizer Not using the Right plan

От
Pallav Kalva
Дата:
Tom Lane wrote:
> Pallav Kalva <pkalva@livedatagroup.com> writes:
>
>>    why does it have different plans for different values
>>
>
> Because the values occur different numbers of times (or so it thinks
> anyway).  If the rowcount estimates are far from reality, perhaps
> increasing the statistics target would help.  However, since you
> only showed EXPLAIN and not EXPLAIN ANALYZE output, no one can
> really tell whether the optimizer did anything wrong here.
>
>             regards, tom lane
>

Hi Tom,

   Thanks! for your reply, here is an another example of the same query
with different addressid now. This time I got the explain analyze on the
query,
   this query also uses the Index Scan Backwards, it says it took 28
seconds but I can say that after looking at the postgres logs it took
more than 2 min
   when the query first ran.  I ran this one again to get the explain
analyze.

   The statistics set to "default_statistics_target = 100"

   I am sure if it uses index on addressid it would be quicker but for
some reason it using index backward scan on addressvaluationid and that
is taking too long.

   Not only this one there are some other queries which use index scan
backwards scan and it takes too long. Index scan backwards most of the
time is not doing good for me is there any way to avoid it ?



explain analyze
select this_.addressvaluationid as addressv1_150_1_,
this_.sourcereference as sourcere2_150_1_,
          this_.createdate as createdate150_1_, this_.valuationdate as
valuatio4_150_1_, this_.valuationamount as valuatio5_150_1_,
          this_.valuationhigh as valuatio6_150_1_, this_.valuationlow as
valuatio7_150_1_,
          this_.valuationconfidence as valuatio8_150_1_,
this_.valuationchange as valuatio9_150_1_,
          this_.historycharturl as history10_150_1_,
this_.regionhistorycharturl as regionh11_150_1_,
          this_.fkaddressid as fkaddre12_150_1_,
this_.fkaddressvaluationsourceid as fkaddre13_150_1_,
          this_.fkvaluationchangeperiodid as fkvalua14_150_1_,
valuationc2_.valuationchangeperiodid as valuatio1_197_0_,
          valuationc2_.name as name197_0_
from listing.addressvaluation this_ left outer join
listing.valuationchangeperiod valuationc2_
       on
this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid
where this_.fkaddressid= 6664161
order by this_.addressvaluationid desc limit 1;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..677.69 rows=1 width=494) (actual
time=28454.708..28454.712 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..883705.44 rows=1304 width=494)
(actual time=28454.700..28454.700 rows=1 loops=1)
         ->  Index Scan Backward using
pk_addressvaluation_addressvaluationid on addressvaluation this_
(cost=0.00..883328.22 rows=1304 width=482) (actual
time=28441.236..28441.236 rows=1 loops=1)
               Filter: (fkaddressid = 6664161)
         ->  Index Scan using
pk_valuationchangeperiod_valuationchangeperiodid on
valuationchangeperiod valuationc2_  (cost=0.00..0.28 rows=1 width=12)
(actual time=13.447..13.447 rows=1 loops=1)
               Index Cond: (this_.fkvaluationchangeperiodid =
valuationc2_.valuationchangeperiodid)
 Total runtime: 28454.789 ms
(7 rows)



Re: Optimizer Not using the Right plan

От
"galy lee"
Дата:
 
I think you may increase the row number that you want to limit, like LIMIT 50.
 
LIMIT can change the cost of a plan dramatically. Looking in your SQL:
 
   where this_.fkaddressid= 6664161
   order by this_.addressvaluationid desc limit 1;
 
Planner may use either index1(this_.fkaddressid) or index2(this_.addressvaluationid) to scan the table. Although it is obvious that using index2 is very expensive, but because you are trying to limit one row from 1304 row, so the cost of using index2 changes to  

   883328/1304=677.69

The cost of using index1 should be lager than 1304, so planner chooses index2.

Planner tends to choose a plan which has small startup cost when you are trying to LIMIT a small portion of data over a large data set. It seems that the following issue also comes from the same root.

  http://archives.postgresql.org/pgsql-performance/2007-11/msg00395.php

Best Regards
Galy Lee
 
Tom Lane wrote:
Pallav Kalva <pkalva ( at ) livedatagroup ( dot ) com> writes:
   why does it have different plans for different values
Because the values occur different numbers of times (or so it thinks
anyway).  If the rowcount estimates are far from reality, perhaps
increasing the statistics target would help.  However, since you
only showed EXPLAIN and not EXPLAIN ANALYZE output, no one can
really tell whether the optimizer did anything wrong here.
		regards, tom lane
Hi Tom,

Thanks! for your reply, here is an another example of the same query with different addressid now. This time I got the explain analyze on the query, this query also uses the Index Scan Backwards, it says it took 28 seconds but I can say that after looking at the postgres logs it took more than 2 min when the query first ran. I ran this one again to get the explain analyze.
 The statistics set to "default_statistics_target = 100"

I am sure if it uses index on addressid it would be quicker but for some reason it using index backward scan on addressvaluationid and that is taking too long.
 Not only this one there are some other queries which use index scan backwards scan and it takes too long. Index scan backwards most of the time is not doing good for me is there any way to avoid it ? 
explain analyze
select this_.addressvaluationid as addressv1_150_1_, this_.sourcereference as sourcere2_150_1_, this_.createdate as createdate150_1_, this_.valuationdate as valuatio4_150_1_, this_.valuationamount as valuatio5_150_1_, this_.valuationhigh as valuatio6_150_1_, this_.valuationlow as valuatio7_150_1_, this_.valuationconfidence as valuatio8_150_1_, this_.valuationchange as valuatio9_150_1_, this_.historycharturl as history10_150_1_, this_.regionhistorycharturl as regionh11_150_1_, this_.fkaddressid as fkaddre12_150_1_, this_.fkaddressvaluationsourceid as fkaddre13_150_1_, this_.fkvaluationchangeperiodid as fkvalua14_150_1_, valuationc2_.valuationchangeperiodid as valuatio1_197_0_,
         valuationc2_.name as name197_0_
from listing.addressvaluation this_ left outer join listing.valuationchangeperiod valuationc2_ on this_.fkvaluationchangeperiodid=valuationc2_.valuationchangeperiodid
where this_.fkaddressid= 6664161
order by this_.addressvaluationid desc limit 1;
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..677.69 rows=1 width=494) (actual time=28454.708..28454.712 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..883705.44 rows=1304 width=494) (actual time=28454.700..28454.700 rows=1 loops=1) -> Index Scan Backward using pk_addressvaluation_addressvaluationid on addressvaluation this_ (cost=0.00..883328.22 rows=1304 width=482) (actual time=28441.236..28441.236 rows=1 loops=1)
              Filter: (fkaddressid = 6664161)
-> Index Scan using pk_valuationchangeperiod_valuationchangeperiodid on valuationchangeperiod valuationc2_ (cost=0.00..0.28 rows=1 width=12) (actual time=13.447..13.447 rows=1 loops=1) Index Cond: (this_.fkvaluationchangeperiodid = valuationc2_.valuationchangeperiodid)
Total runtime: 28454.789 ms
(7 rows)