Discussion: Suboptimal execution plan for simple query

Suboptimal execution plan for simple query

From:
"Markus Wollny"
Date:
Hi!

We've got a table with the following definition:

CREATE TABLE image_relation
(
  id integer,
  article_id integer NOT NULL,
  entity_id integer NOT NULL,
  image_id integer NOT NULL,
  subline text,
  "position" integer,
  article_headline text,
  entity_name text,
  entity_type_id integer,
  entity_source text,
  default_pic character varying(3) NOT NULL,
  last_updated timestamp without time zone NOT NULL DEFAULT now(),
  CONSTRAINT pkey_image_relation PRIMARY KEY (article_id, entity_id, image_id)
)
WITH (OIDS=FALSE);

There are simple btree indexes on article_id, default_pic, entity_id, id, image_id, last_updated and position. The
table has about 723,000 rows, stats say table size is 135MB, toast tables are 184MB and index size was at a whopping
727MB - so I thought I might do some additional maintenance. After reindexing, I got index size down to 131MB. This
however did not affect the planner choices in any way, as they and the resulting execution times stayed the same before
and after table maintenance (reindex and subsequent vacuum analyze). Our PostgreSQL version is 8.2.4 (I am going to
move on to the latest and greatest 8.3 in about two weeks).
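
For reference, the maintenance described above boils down to something like this (a sketch rather than the exact
commands I used; the size checks just use the standard pg_size_pretty/pg_relation_size functions):

-- approximate size check: heap only vs. everything including TOAST and indexes
SELECT pg_size_pretty(pg_relation_size('image_relation'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('image_relation')) AS total_size;

-- rebuild all indexes on the table, then refresh planner statistics
REINDEX TABLE image_relation;
VACUUM ANALYZE image_relation;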

Now I've got this simple query

SELECT  image_id
FROM image_relation
WHERE entity_id = 69560::integer
ORDER BY last_updated DESC
LIMIT    1;

which currently runs for something around 600ms. Here's the explain analyze output:

"Limit  (cost=0.00..144.78 rows=1 width=12) (actual time=599.745..599.747 rows=1 loops=1)"
"  ->  Index Scan Backward using idx_image_relation_last_updated on image_relation  (cost=0.00..39525.70 rows=273
width=12)(actual time=599.741..599.741 rows=1 loops=1)" 
"        Filter: (entity_id = 69560)"
"Total runtime: 599.825 ms"

SELECT  image_id
FROM image_relation
WHERE entity_id = 69560::integer;

only returns three rows. So I wonder why the planner chooses to use the index on last_updated instead of the index on
entity_id; I found out that I can get it to reconsider and make a wiser choice by adding a seemingly superfluous
condition to the WHERE clause (notice the AND ... bit):

SELECT  image_id
FROM image_relation
WHERE entity_id = 69560
AND entity_id = entity_id
ORDER BY last_updated DESC
LIMIT    1

"Limit  (cost=881.82..881.82 rows=1 width=12) (actual time=0.097..0.099 rows=1 loops=1)"
"  ->  Sort  (cost=881.82..881.82 rows=1 width=12) (actual time=0.094..0.094 rows=1 loops=1)"
"        Sort Key: last_updated"
"        ->  Index Scan using idx_image_relation_entity_id on image_relation  (cost=0.00..881.81 rows=1 width=12)
(actualtime=0.063..0.075 rows=3 loops=1)" 
"              Index Cond: (entity_id = 69560)"
"              Filter: (entity_id = entity_id)"
"Total runtime: 0.128 ms"

That's much more like it. The table is being vacuumed on a regular basis by both a nightly cron job and the autovacuum
daemon.

Any ideas on what's going wrong here?

Kind regards

   Markus





Re: Suboptimal execution plan for simple query

From:
"Markus Wollny"
Date:
Hi!

In preparation for my upcoming upgrade to PostgreSQL 8.3.5, I have taken the opportunity to try this scenario on a test
machine with the latest PostgreSQL version. Unfortunately the result remains the same, though this database has just
been reloaded from a dump and vacuum analyzed. select version() outputs "PostgreSQL 8.3.5 on i686-pc-linux-gnu,
compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)".

Kind regards

   Markus

Just for reference:
> Now I've got this simple query
>
> SELECT  image_id
> FROM image_relation
> WHERE entity_id = 69560::integer
> ORDER BY last_updated DESC
> LIMIT    1;
>
> which currently runs for something around 600ms. Here's the explain
> analyze output:
>
> "Limit  (cost=0.00..144.78 rows=1 width=12) (actual
> time=599.745..599.747 rows=1 loops=1)" "  ->  Index Scan Backward
> using idx_image_relation_last_updated on image_relation
> (cost=0.00..39525.70 rows=273 width=12) (actual time=599.741..599.741
> rows=1 loops=1)" "        Filter: (entity_id = 69560)" "Total
> runtime: 599.825 ms"

> SELECT  image_id
> FROM image_relation
> WHERE entity_id = 69560
> AND entity_id = entity_id
> ORDER BY last_updated DESC
> LIMIT    1
>
> "Limit  (cost=881.82..881.82 rows=1 width=12) (actual
> time=0.097..0.099 rows=1 loops=1)" "  ->  Sort  (cost=881.82..881.82
> rows=1 width=12) (actual time=0.094..0.094 rows=1 loops=1)" "
> Sort Key: last_updated" "        ->  Index Scan using
> idx_image_relation_entity_id on image_relation  (cost=0.00..881.81
> rows=1 width=12) (actual time=0.063..0.075 rows=3 loops=1)" "
> Index Cond: (entity_id = 69560)" "              Filter: (entity_id =
> entity_id)" "Total runtime: 0.128 ms"






Re: Suboptimal execution plan for simple query

From:
Sam Mason
Date:
On Wed, Nov 12, 2008 at 04:15:23PM +0100, Markus Wollny wrote:
> I've got this simple query
>
> SELECT  image_id
> FROM image_relation
> WHERE entity_id = 69560::integer
> ORDER BY last_updated DESC
> LIMIT    1;
>
> which currently runs for something around 600ms. Here's the explain analyze output:
>
> "Limit  (cost=0.00..144.78 rows=1 width=12) (actual time=599.745..599.747 rows=1 loops=1)"
> "  ->  Index Scan Backward using idx_image_relation_last_updated on image_relation  (cost=0.00..39525.70 rows=273
width=12)(actual time=599.741..599.741 rows=1 loops=1)" 
> "        Filter: (entity_id = 69560)"
> "Total runtime: 599.825 ms"

The database would appear to be thinking that it's better off running
through time backwards to find the entry than searching for the entry
directly.  This is normally because each entity_id has several rows and
running through time would end up doing less work (especially as it
wouldn't need to sort the results afterwards).

You may have some luck with increasing the statistics target on the
entity_id and last_updated columns and re-ANALYZING the table.  Then
again, the fact that it thinks it's only going to get a single row
back when it searches for the entity_id suggests that it's all a bit
confused!
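
For what it's worth, that would look something like the following (just a sketch; 100 is only an example value to
experiment with, not a recommendation):

-- raise the per-column statistics targets, then refresh the stats
ALTER TABLE image_relation ALTER COLUMN entity_id    SET STATISTICS 100;
ALTER TABLE image_relation ALTER COLUMN last_updated SET STATISTICS 100;
ANALYZE image_relation;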


  Sam

Re: Suboptimal execution plan for simple query

From:
"Markus Wollny"
Date:
Hi!

Sam Mason wrote:
> You may have some luck with increasing the statistics target on the
> entity_id and last_updated columns and re-ANALYZING the table.  Then
> again, the fact that it thinks it's only going to get a single row
> back when it searches for the entity_id suggests that it's all a bit
> confused!

Thank you for that suggestion. Increasing the statistics target on entity_id from the default 10 to 30 and re-analyzing
did the trick:

"Limit  (cost=340.75..340.75 rows=1 width=12) (actual time=0.084..0.085 rows=1 loops=1)"
"  ->  Sort  (cost=340.75..341.00 rows=103 width=12) (actual time=0.081..0.081 rows=1 loops=1)"
"        Sort Key: last_updated"
"        ->  Index Scan using idx_image_relation_entity_id on image_relation  (cost=0.00..337.30 rows=103 width=12)
(actualtime=0.059..0.065 rows=3 loops=1)" 
"              Index Cond: (entity_id = 69560)"
"Total runtime: 0.121 ms"

Kind regards

   Markus





Re: Suboptimal execution plan for simple query

From:
Sam Mason
Date:
On Thu, Nov 13, 2008 at 01:56:11PM +0100, Markus Wollny wrote:
> Sam Mason wrote:
> > You may have some luck with increasing the statistics target on the
> > entity_id and last_updated columns and re-ANALYZING the table.  Then
> > again, the fact that it thinks it's only going to get a single row
> > back when it searches for the entity_id suggests that it's all a bit
> > confused!
>
> Thank you for that suggestion. Increasing the statistics target on
> entity_id from the default 10 to 30 and re-analyzing did the trick:

Even higher may be good for other entities; it thinks it's getting 103
rows back for this entity, whereas in fact it only gets 3 back.  Or is,
on average, 103 a reasonable guess?

> "Limit  (cost=340.75..340.75 rows=1 width=12) (actual time=0.084..0.085 rows=1 loops=1)"
> "  ->  Sort  (cost=340.75..341.00 rows=103 width=12) (actual time=0.081..0.081 rows=1 loops=1)"
> "        Sort Key: last_updated"
> "        ->  Index Scan using idx_image_relation_entity_id on image_relation  (cost=0.00..337.30 rows=103 width=12)
(actualtime=0.059..0.065 rows=3 loops=1)" 
> "              Index Cond: (entity_id = 69560)"
> "Total runtime: 0.121 ms"

A target over 100 will change the way it does the stats and may produce
a better fit; try the query with a few different entities (i.e. ones
where you know you've got many rows in the table, and ones where you've
only got one or two) and see what numbers it comes back with.  The
smaller the target is, the faster queries are planned, while larger
targets should allow the planner to cope with more uneven datasets.
If the distribution is reasonably uniform you should be able to get
away with low targets; less even distributions normally require larger
targets.
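
For example, something like this (the entity_id value is just the one from your earlier mail) shows both the planner's
row estimate versus what it actually gets, and the stored statistics it is basing that estimate on:

-- compare estimated vs. actual row counts for one entity
EXPLAIN ANALYZE
SELECT image_id FROM image_relation WHERE entity_id = 69560;

-- inspect the statistics the planner is working from
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'image_relation' AND attname = 'entity_id';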


  Sam