Discussion: Suboptimal execution plan for simple query
Hi!

We've got a table with the following definition:

    CREATE TABLE image_relation (
        id integer,
        article_id integer NOT NULL,
        entity_id integer NOT NULL,
        image_id integer NOT NULL,
        subline text,
        "position" integer,
        article_headline text,
        entity_name text,
        entity_type_id integer,
        entity_source text,
        default_pic character varying(3) NOT NULL,
        last_updated timestamp without time zone NOT NULL DEFAULT now(),
        CONSTRAINT pkey_image_relation PRIMARY KEY (article_id, entity_id, image_id)
    ) WITH (OIDS=FALSE);

There are simple btree indexes on article_id, default_pic, entity_id, id, image_id, last_updated and position. The table has about 723,000 rows; the stats say the table size is 135MB, the toast tables are 184MB and the index size was a whopping 727MB, so I thought I might do some additional maintenance. After reindexing, I got the index size down to 131MB. This, however, did not affect the planner's choices in any way: the plans and the resulting execution times stayed the same before and after table maintenance (REINDEX and a subsequent VACUUM ANALYZE). Our PostgreSQL version is 8.2.4 (I am going to move on to the latest and greatest 8.3 in about two weeks).

Now I've got this simple query

    SELECT image_id
    FROM image_relation
    WHERE entity_id = 69560::integer
    ORDER BY last_updated DESC
    LIMIT 1;

which currently runs for something around 600ms. Here's the EXPLAIN ANALYZE output:

    Limit  (cost=0.00..144.78 rows=1 width=12) (actual time=599.745..599.747 rows=1 loops=1)
      ->  Index Scan Backward using idx_image_relation_last_updated on image_relation  (cost=0.00..39525.70 rows=273 width=12) (actual time=599.741..599.741 rows=1 loops=1)
            Filter: (entity_id = 69560)
    Total runtime: 599.825 ms

SELECT image_id FROM image_relation WHERE entity_id = 69560::integer; only returns three rows.
So I wonder why the planner chooses to use the index on last_updated instead of the index on entity_id. I found out that I can get it to reconsider and make a wiser choice by adding a seemingly superfluous condition to the WHERE clause (notice the AND ... bit):

    SELECT image_id
    FROM image_relation
    WHERE entity_id = 69560
    AND entity_id = entity_id
    ORDER BY last_updated DESC
    LIMIT 1;

    Limit  (cost=881.82..881.82 rows=1 width=12) (actual time=0.097..0.099 rows=1 loops=1)
      ->  Sort  (cost=881.82..881.82 rows=1 width=12) (actual time=0.094..0.094 rows=1 loops=1)
            Sort Key: last_updated
            ->  Index Scan using idx_image_relation_entity_id on image_relation  (cost=0.00..881.81 rows=1 width=12) (actual time=0.063..0.075 rows=3 loops=1)
                  Index Cond: (entity_id = 69560)
                  Filter: (entity_id = entity_id)
    Total runtime: 0.128 ms

That's much more like it. The table is being vacuumed on a regular basis by both a nightly cron job and the autovacuum daemon. Any ideas on what's going wrong here?

Kind regards

Markus

Computec Media AG
Registered office and commercial register: Fürth (HRB 8818)
Board members: Johannes S. Gözalan (Chairman) and Rainer Rosenbusch
Chairman of the supervisory board: Jürg Marquard
VAT identification number: DE 812 575 276
Hi!

In preparation for my upcoming upgrade to PostgreSQL 8.3.5, I have taken the opportunity to try this scenario on a test machine with the latest PostgreSQL version. Unfortunately the result remains the same, even though this database has just been reloaded from a dump and VACUUM ANALYZEd. SELECT version() outputs "PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)".

Kind regards

Markus

Just for reference:

> Now I've got this simple query
>
> SELECT image_id
> FROM image_relation
> WHERE entity_id = 69560::integer
> ORDER BY last_updated DESC
> LIMIT 1;
>
> which currently runs for something around 600ms. Here's the explain
> analyze output:
>
> Limit  (cost=0.00..144.78 rows=1 width=12) (actual time=599.745..599.747 rows=1 loops=1)
>   ->  Index Scan Backward using idx_image_relation_last_updated on image_relation  (cost=0.00..39525.70 rows=273 width=12) (actual time=599.741..599.741 rows=1 loops=1)
>         Filter: (entity_id = 69560)
> Total runtime: 599.825 ms
>
> SELECT image_id
> FROM image_relation
> WHERE entity_id = 69560
> AND entity_id = entity_id
> ORDER BY last_updated DESC
> LIMIT 1
>
> Limit  (cost=881.82..881.82 rows=1 width=12) (actual time=0.097..0.099 rows=1 loops=1)
>   ->  Sort  (cost=881.82..881.82 rows=1 width=12) (actual time=0.094..0.094 rows=1 loops=1)
>         Sort Key: last_updated
>         ->  Index Scan using idx_image_relation_entity_id on image_relation  (cost=0.00..881.81 rows=1 width=12) (actual time=0.063..0.075 rows=3 loops=1)
>               Index Cond: (entity_id = 69560)
>               Filter: (entity_id = entity_id)
> Total runtime: 0.128 ms
On Wed, Nov 12, 2008 at 04:15:23PM +0100, Markus Wollny wrote:
> I've got this simple query
>
> SELECT image_id
> FROM image_relation
> WHERE entity_id = 69560::integer
> ORDER BY last_updated DESC
> LIMIT 1;
>
> which currently runs for something around 600ms. Here's the explain analyze output:
>
> Limit  (cost=0.00..144.78 rows=1 width=12) (actual time=599.745..599.747 rows=1 loops=1)
>   ->  Index Scan Backward using idx_image_relation_last_updated on image_relation  (cost=0.00..39525.70 rows=273 width=12) (actual time=599.741..599.741 rows=1 loops=1)
>         Filter: (entity_id = 69560)
> Total runtime: 599.825 ms

The database would appear to be thinking that it's better off running through time backwards to find the entry than searching for the entry directly. This is normally because each entity_id has several rows, and running through time would end up doing less work (especially as it wouldn't need to sort the results afterwards).

You may have some luck with increasing the statistics target on the entity_id and last_updated columns and re-ANALYZE-ing the table. Then again, the fact that it thinks it's only going to get a single row back when it searches for the entity_id suggests that it's all a bit confused!

Sam
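[For readers following along, the suggestion above translates into SQL along these lines. This is a sketch only: the target value of 100 is an illustrative choice (the default per-column target in 8.2/8.3 is 10, and later in the thread a target of 30 turns out to be enough), and ALTER TABLE ... SET STATISTICS plus ANALYZE are the standard PostgreSQL commands for this.]

```sql
-- Raise the per-column statistics target so ANALYZE gathers a larger
-- histogram and MCV list for these columns (100 is just an example value).
ALTER TABLE image_relation ALTER COLUMN entity_id SET STATISTICS 100;
ALTER TABLE image_relation ALTER COLUMN last_updated SET STATISTICS 100;

-- Refresh the statistics so the planner actually sees the new numbers.
ANALYZE image_relation;
```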
Hi!

Sam Mason wrote:
> You may have some luck with increasing the statistics target on the
> entity_id and last_updated columns and re-ANALYZE-ing the table. Then
> again, the fact that it thinks it's only going to get a single row
> back when it searches for the entity_id suggests that it's all a bit
> confused!

Thank you for that suggestion. Increasing the statistics target on entity_id from the default 10 to 30 and re-analyzing did the trick:

    Limit  (cost=340.75..340.75 rows=1 width=12) (actual time=0.084..0.085 rows=1 loops=1)
      ->  Sort  (cost=340.75..341.00 rows=103 width=12) (actual time=0.081..0.081 rows=1 loops=1)
            Sort Key: last_updated
            ->  Index Scan using idx_image_relation_entity_id on image_relation  (cost=0.00..337.30 rows=103 width=12) (actual time=0.059..0.065 rows=3 loops=1)
                  Index Cond: (entity_id = 69560)
    Total runtime: 0.121 ms

Kind regards

Markus
On Thu, Nov 13, 2008 at 01:56:11PM +0100, Markus Wollny wrote:
> Sam Mason wrote:
> > You may have some luck with increasing the statistics target on the
> > entity_id and last_updated columns and re-ANALYZE-ing the table. Then
> > again, the fact that it thinks it's only going to get a single row
> > back when it searches for the entity_id suggests that it's all a bit
> > confused!
>
> Thank you for that suggestion. Increasing the statistics target on
> entity_id from the default 10 to 30 and re-analyzing did the trick:

An even higher target may be good for other entities; it thinks it's getting 103 rows back for this entity, whereas in fact it only gets 3 back. Or is 103, on average, a reasonable guess?

> Limit  (cost=340.75..340.75 rows=1 width=12) (actual time=0.084..0.085 rows=1 loops=1)
>   ->  Sort  (cost=340.75..341.00 rows=103 width=12) (actual time=0.081..0.081 rows=1 loops=1)
>         Sort Key: last_updated
>         ->  Index Scan using idx_image_relation_entity_id on image_relation  (cost=0.00..337.30 rows=103 width=12) (actual time=0.059..0.065 rows=3 loops=1)
>               Index Cond: (entity_id = 69560)
> Total runtime: 0.121 ms

A target over 100 will change the way the stats are computed and may produce a better fit; try the query with a few different entities (i.e. ones where you know you've got many rows in the table, and ones where you've only got one or two) and see what numbers it comes back with. The smaller the target is, the faster queries are planned; larger targets should allow the planner to cope with more uneven datasets. If the distribution is reasonably uniform you should be able to get away with low targets, while less even distributions normally require larger targets.

Sam
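[One way to carry out the comparison suggested above is to look at the gathered column statistics directly and set the estimate next to the real row count. This is a sketch using the standard pg_stats view and the table/column names from this thread; entity id 69560 is simply the example used earlier.]

```sql
-- n_distinct and the most-common-values list are what drive the planner's
-- row-count estimate for "entity_id = <constant>".
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'image_relation'
  AND attname = 'entity_id';

-- Compare the estimated rows (in the plan) against the actual rows
-- for entities with many rows and entities with only one or two.
EXPLAIN ANALYZE
SELECT image_id FROM image_relation WHERE entity_id = 69560;
```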