Обсуждение: planner index choice
Hi there, I have a simple query where I don't understand the planner's choice to use a particular index. The main table looks like this: # \d sq_ast_attr_val Table "public.sq_ast_attr_val" Column | Type | Modifiers -------------+-----------------------+------------------------------ assetid | character varying(15) | not null attrid | integer | not null contextid | integer | not null default 0 custom_val | text | use_default | character(1) | not null default '1'::bpchar Indexes: "ast_attr_val_pk" PRIMARY KEY, btree (assetid, attrid, contextid) "sq_ast_attr_val_assetid" btree (assetid) "sq_ast_attr_val_attrid" btree (attrid) "sq_ast_attr_val_concat" btree (((assetid::text || '~'::text) || attrid)) "sq_ast_attr_val_contextid" btree (contextid) The query: SELECT assetid, custom_val FROM sq_ast_attr_val WHERE attrid IN (SELECT attrid FROM sq_ast_attr WHERE name = 'is_contextable' AND (type_code = 'metadata_field_select' OR owning_type_code = 'metadata_field')) AND contextid = 0 INTERSECT SELECT assetid, custom_val FROM sq_ast_attr_val WHERE assetid = '62321' AND contextid = 0; The explain analyze plan: http://explain.depesz.com/s/nWs I'm not sure why it's picking the sq_ast_attr_val_contextid index to do the contextid = 0 check, the other parts (attrid/assetid) are much more selective. If I drop that particular index: http://explain.depesz.com/s/zp All (I hope) relevant postgres info: Centos 5.5 x86_64 running pg8.4.4. Server has 8gig memory. # select name, setting, source from pg_settings where name in ('shared_buffers', 'effective_cache_size', 'work_mem'); name | setting ----------------------+-------- shared_buffers | 262144 effective_cache_size | 655360 work_mem | 32768 All planner options are enabled: # select name, setting, source from pg_settings where name like 'enable_%'; name | setting | source -------------------+---------+--------- enable_bitmapscan | on | default enable_hashagg | on | default enable_hashjoin | on | default enable_indexscan | on | default enable_mergejoin | on | default enable_nestloop | on | default enable_seqscan | on | default enable_sort | on | default enable_tidscan | on | default Any insights welcome - thanks! -- Postgresql & php tutorials http://www.designmagick.com/
Chris <dmagick@gmail.com> writes: > The query: > SELECT > assetid, custom_val > FROM > sq_ast_attr_val > WHERE > attrid IN (SELECT attrid FROM sq_ast_attr WHERE name = > 'is_contextable' AND (type_code = 'metadata_field_select' OR > owning_type_code = 'metadata_field')) > AND contextid = 0 > INTERSECT > SELECT > assetid, custom_val > FROM > sq_ast_attr_val > WHERE > assetid = '62321' > AND contextid = 0; > The explain analyze plan: > http://explain.depesz.com/s/nWs Hrm ... are you *certain* that's an 8.4 server? Because the bit with Index Cond: (sq_ast_attr_val.attrid = "outer".attrid) is a locution that EXPLAIN hasn't used since 8.1, according to a quick check. More recent versions don't say "outer". The actual problem seems to be that choose_bitmap_and() is choosing to add an indexscan on sq_ast_attr_val_contextid, even though this index is a lot less selective than the sq_ast_attr_val_attrid scan it had already picked. I've seen that behavior before, and there were a series of patches back in 2006-2007 that seem to have pretty much fixed it. So that's another reason for suspecting you've got an old server version there... regards, tom lane
Hi, > Hrm ... are you *certain* that's an 8.4 server? Yep. # psql -U postgres -d db psql (8.4.4) db=# select version(); version ------------------------------------------------------------------------------------------------------------------ PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit (1 row) > The actual problem seems to be that choose_bitmap_and() is choosing to > add an indexscan on sq_ast_attr_val_contextid, even though this index > is a lot less selective than the sq_ast_attr_val_attrid scan it had > already picked. I've seen that behavior before, and there were a series > of patches back in 2006-2007 that seem to have pretty much fixed it. > So that's another reason for suspecting you've got an old server version > there... I just recreated the index and re-ran the explain analyze and it doesn't give the "outer" bit any more - not sure how I got that before. db=# begin; BEGIN db=# create index attr_val_contextid on sq_ast_attr_val(contextid); CREATE INDEX db=# analyze sq_ast_attr_val; ANALYZE db=# explain analyze SELECT db-# assetid, custom_val db-# FROM db-# sq_ast_attr_val db-# WHERE db-# attrid IN (SELECT attrid FROM sq_ast_attr WHERE name = db(# 'is_contextable' AND (type_code = 'metadata_field_select' OR db(# owning_type_code = 'metadata_field')) db-# AND contextid = 0 db-# INTERSECT db-# SELECT db-# assetid, custom_val db-# FROM db-# sq_ast_attr_val db-# WHERE db-# assetid = '62321' db-# AND contextid = 0; http://explain.depesz.com/s/br9 Without that index (again with an analyze after doing a rollback): http://explain.depesz.com/s/gxH -- Postgresql & php tutorials http://www.designmagick.com/
> http://explain.depesz.com/s/br9 > http://explain.depesz.com/s/gxH Well, I don't have time to do a thorough analysis right now, but in all the plans you've posted there are quite high values in the "Rows x" column (e.g. the 5727.5 value). That means a significant difference in estimated and actual row number, which may lead to poor choice of indexes etc. The planner may simply think the index is better due to imprecise statistics etc. Try to increase te statistics target for the columns, e.g. ALTER TABLE table ALTER COLUMN column SET STATISTICS integer where "integer" is between 0 and 1000 (the default value is 10 so use 100 or maybe 1000), run analyze and try to run the query again. Tomas
tv@fuzzy.cz writes: >> http://explain.depesz.com/s/br9 >> http://explain.depesz.com/s/gxH > Well, I don't have time to do a thorough analysis right now, but in all > the plans you've posted there are quite high values in the "Rows x" column > (e.g. the 5727.5 value). > That means a significant difference in estimated and actual row number, > which may lead to poor choice of indexes etc. The planner may simply think > the index is better due to imprecise statistics etc. Yeah. The sq_ast_attr_val_attrid scan is a lot more selective than the planner is guessing (3378 rows estimated vs an average of 15 actual), and I think that is making the difference. If you look at the estimated row counts and costs, it's expecting that adding the second index will cut the number of heap fetches about 7x, hence saving somewhere around 4800 cost units in the heapscan step, more than it thinks the indexscan will cost. But taking 15 row fetches down to 2 isn't nearly enough to pay for the extra indexscan. > Try to increase te statistics target for the columns, e.g. > ALTER TABLE table ALTER COLUMN column SET STATISTICS integer It's worth a try but I'm not sure how much it'll help. A different line of attack is to play with the planner cost parameters. In particular, reducing random_page_cost would reduce the estimated cost of the heap fetches and thus discourage it from using the extra index. If you're working with mostly-cached tables then this would probably improve behavior overall, too. regards, tom lane