Discussion: Cost estimates consistently too high - does it matter?

Cost estimates consistently too high - does it matter?

From: "Matt Clark"
Date:
Hi,
 
I've noticed that the cost estimates for a lot of my queries are consistently far too high.  Sometimes it's because the row estimates are wrong, like this:
 
explain analyze select logtime from loginlog where uid='Ymogen::YM_User::3e2c0869c2fdd26d8a74d218d5a6ff585d490560' and result = 'Success' order by logtime desc limit 3;
NOTICE:  QUERY PLAN:
Limit  (cost=0.00..221.85 rows=3 width=8) (actual time=0.21..2.39 rows=3 loops=1)
  ->  Index Scan Backward using loginlog_logtime_idx on loginlog  (cost=0.00..12846.69 rows=174 width=8) (actual time=0.20..2.37 rows=4 loops=1)
Total runtime: 2.48 msec
 
The row estimate here is off by a factor of ~50 (174 estimated vs. 4 actual rows), but the cost estimate is off by a factor of ~5000 (12846.69 against an actual time of ~2.4 msec).
 
Sometimes the row estimates are good, but the costs are still too high:
 
explain analyze select u.email from ym_user u join mobilepm m on (m.ownerid = u._id) where m.status = 'Validated' and m.network = 'TMOBILEUK';
NOTICE:  QUERY PLAN:
Nested Loop  (cost=0.00..2569.13 rows=441 width=145) (actual time=1.93..248.57 rows=553 loops=1)
  ->  Seq Scan on mobilepm m  (cost=0.00..795.11 rows=441 width=58) (actual time=1.69..132.83 rows=553 loops=1)
  ->  Index Scan using ym_user_id_idx on ym_user u  (cost=0.00..4.01 rows=1 width=87) (actual time=0.19..0.20 rows=1 loops=553)
Total runtime: 249.47 msec
 
loginlog has 180000 rows, mobilepm has 12000, ym_user has 50000, and they've all been analyzed prior to running the query.
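 
For reference, the stats the planner is working from can be sanity-checked straight out of pg_class once ANALYZE has run -- just catalog queries, using the table names above:
 
ANALYZE loginlog;
SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('loginlog', 'mobilepm', 'ym_user');  -- reltuples/relpages are what ANALYZE stored
 
The reltuples there should line up with the row counts I quoted.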
 
The server is a Quad PIII 700 Xeon/1MB cache, 3GB RAM, hardware RAID 10 on two SCSI channels with 128MB write-back cache.
 
I've lowered random_page_cost to 2 to reflect the decent disk I/O, but I suppose the fact that the DB & indexes are essentially all cached in RAM might also be affecting the results, even though effective_cache_size is set to a realistic 262144 pages (2GB).  Those planner params in full:
 
#effective_cache_size = 1000  # default in 8k pages
#random_page_cost = 4
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025
effective_cache_size = 262144 # 2GB of FS cache
random_page_cost = 2
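 
Incidentally, all of these are settable per-session, so a candidate value can be trialled against a problem query without touching postgresql.conf -- something along these lines:
 
SET random_page_cost = 1.5;   -- experimental value, not a recommendation
EXPLAIN ANALYZE SELECT ... ;  -- re-run the problem query and compare the plan
RESET random_page_cost;       -- back to the postgresql.conf setting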
 
For now the planner seems to be making the right choices, but my concern is that at some point it might start making bad decisions, especially on more complex queries.  Should I bother tweaking the planner costs further, and if so which ones?  Or am I fretting over nothing?
 
Cheers
 
Matt

Matt Clark
Ymogen Ltd
matt@ymogen.net
corp.ymogen.net

 

Re: Cost estimates consistently too high - does it matter?

From: Tom Lane
Date:
"Matt Clark" <matt@ymogen.net> writes:
> I've noticed that the cost estimates for a lot of my queries are
> consistently far too high.

You seem to be under a misapprehension.  The cost estimates are not
in units of milliseconds, they are on an arbitrary scale with 1.0
defined as one disk fetch.

LIMIT throws another monkey wrench into the mix: the estimates for the
plan nodes underneath the limit are done as if the plan were to be
executed to completion, which of course it won't be.
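
To put rough numbers on it, a seqscan is costed as approximately

    pages * 1.0
    + tuples * cpu_tuple_cost
    + tuples * (operators evaluated per tuple) * cpu_operator_cost

where the page count is whatever ANALYZE left in pg_class.relpages, and
a nestloop's total is roughly the outer cost plus outer rows times the
per-probe inner cost.  In your second plan, 795.11 + 441 * 4.01 comes
within a percent of the 2569.13 shown.  Milliseconds never enter into it.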

            regards, tom lane

Re: Cost estimates consistently too high - does it matter?

From: "Matt Clark"
Date:
Well, I usually am under a misapprehension!  Thanks for the explanation about LIMIT too.

In that case then, I shall stop worrying and learn to love the planner.

M

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 08 August 2003 16:15
> To: Matt Clark
> Cc: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Cost estimates consistently too high - does it
> matter?
>
>
> "Matt Clark" <matt@ymogen.net> writes:
> > I've noticed that the cost estimates for a lot of my queries are
> > consistently far too high.
>
> You seem to be under a misapprehension.  The cost estimates are not
> in units of milliseconds, they are on an arbitrary scale with 1.0
> defined as one disk fetch.
>
> LIMIT throws another monkey wrench into the mix: the estimates for the
> plan nodes underneath the limit are done as if the plan were to be
> executed to completion, which of course it won't be.
>
>             regards, tom lane
>
>