Обсуждение: planner index choice
Hi there,
I have a simple query where I don't understand the planner's choice to
use a particular index.
The main table looks like this:
# \d sq_ast_attr_val
Table "public.sq_ast_attr_val"
Column | Type | Modifiers
-------------+-----------------------+------------------------------
assetid | character varying(15) | not null
attrid | integer | not null
contextid | integer | not null default 0
custom_val | text |
use_default | character(1) | not null default '1'::bpchar
Indexes:
"ast_attr_val_pk" PRIMARY KEY, btree (assetid, attrid, contextid)
"sq_ast_attr_val_assetid" btree (assetid)
"sq_ast_attr_val_attrid" btree (attrid)
"sq_ast_attr_val_concat" btree (((assetid::text || '~'::text) ||
attrid))
"sq_ast_attr_val_contextid" btree (contextid)
The query:
SELECT
assetid, custom_val
FROM
sq_ast_attr_val
WHERE
attrid IN (SELECT attrid FROM sq_ast_attr WHERE name =
'is_contextable' AND (type_code = 'metadata_field_select' OR
owning_type_code = 'metadata_field'))
AND contextid = 0
INTERSECT
SELECT
assetid, custom_val
FROM
sq_ast_attr_val
WHERE
assetid = '62321'
AND contextid = 0;
The explain analyze plan:
http://explain.depesz.com/s/nWs
I'm not sure why it's picking the sq_ast_attr_val_contextid index to do
the contextid = 0 check, the other parts (attrid/assetid) are much more
selective.
If I drop that particular index:
http://explain.depesz.com/s/zp
All (I hope) relevant postgres info:
Centos 5.5 x86_64 running pg8.4.4.
Server has 8gig memory.
# select name, setting, source from pg_settings where name in
('shared_buffers', 'effective_cache_size', 'work_mem');
name | setting
----------------------+--------
shared_buffers | 262144
effective_cache_size | 655360
work_mem | 32768
All planner options are enabled:
# select name, setting, source from pg_settings where name like 'enable_%';
name | setting | source
-------------------+---------+---------
enable_bitmapscan | on | default
enable_hashagg | on | default
enable_hashjoin | on | default
enable_indexscan | on | default
enable_mergejoin | on | default
enable_nestloop | on | default
enable_seqscan | on | default
enable_sort | on | default
enable_tidscan | on | default
Any insights welcome - thanks!
--
Postgresql & php tutorials
http://www.designmagick.com/
Chris <dmagick@gmail.com> writes: > The query: > SELECT > assetid, custom_val > FROM > sq_ast_attr_val > WHERE > attrid IN (SELECT attrid FROM sq_ast_attr WHERE name = > 'is_contextable' AND (type_code = 'metadata_field_select' OR > owning_type_code = 'metadata_field')) > AND contextid = 0 > INTERSECT > SELECT > assetid, custom_val > FROM > sq_ast_attr_val > WHERE > assetid = '62321' > AND contextid = 0; > The explain analyze plan: > http://explain.depesz.com/s/nWs Hrm ... are you *certain* that's an 8.4 server? Because the bit with Index Cond: (sq_ast_attr_val.attrid = "outer".attrid) is a locution that EXPLAIN hasn't used since 8.1, according to a quick check. More recent versions don't say "outer". The actual problem seems to be that choose_bitmap_and() is choosing to add an indexscan on sq_ast_attr_val_contextid, even though this index is a lot less selective than the sq_ast_attr_val_attrid scan it had already picked. I've seen that behavior before, and there were a series of patches back in 2006-2007 that seem to have pretty much fixed it. So that's another reason for suspecting you've got an old server version there... regards, tom lane
Hi,
> Hrm ... are you *certain* that's an 8.4 server?
Yep.
# psql -U postgres -d db
psql (8.4.4)
db=# select version();
version
------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
(1 row)
> The actual problem seems to be that choose_bitmap_and() is choosing to
> add an indexscan on sq_ast_attr_val_contextid, even though this index
> is a lot less selective than the sq_ast_attr_val_attrid scan it had
> already picked. I've seen that behavior before, and there were a series
> of patches back in 2006-2007 that seem to have pretty much fixed it.
> So that's another reason for suspecting you've got an old server version
> there...
I just recreated the index and re-ran the explain analyze and it doesn't
give the "outer" bit any more - not sure how I got that before.
db=# begin;
BEGIN
db=# create index attr_val_contextid on sq_ast_attr_val(contextid);
CREATE INDEX
db=# analyze sq_ast_attr_val;
ANALYZE
db=# explain analyze SELECT
db-# assetid, custom_val
db-# FROM
db-# sq_ast_attr_val
db-# WHERE
db-# attrid IN (SELECT attrid FROM sq_ast_attr WHERE name =
db(# 'is_contextable' AND (type_code = 'metadata_field_select' OR
db(# owning_type_code = 'metadata_field'))
db-# AND contextid = 0
db-# INTERSECT
db-# SELECT
db-# assetid, custom_val
db-# FROM
db-# sq_ast_attr_val
db-# WHERE
db-# assetid = '62321'
db-# AND contextid = 0;
http://explain.depesz.com/s/br9
Without that index (again with an analyze after doing a rollback):
http://explain.depesz.com/s/gxH
--
Postgresql & php tutorials
http://www.designmagick.com/
> http://explain.depesz.com/s/br9 > http://explain.depesz.com/s/gxH Well, I don't have time to do a thorough analysis right now, but in all the plans you've posted there are quite high values in the "Rows x" column (e.g. the 5727.5 value). That means a significant difference in estimated and actual row number, which may lead to poor choice of indexes etc. The planner may simply think the index is better due to imprecise statistics etc. Try to increase te statistics target for the columns, e.g. ALTER TABLE table ALTER COLUMN column SET STATISTICS integer where "integer" is between 0 and 1000 (the default value is 10 so use 100 or maybe 1000), run analyze and try to run the query again. Tomas
tv@fuzzy.cz writes: >> http://explain.depesz.com/s/br9 >> http://explain.depesz.com/s/gxH > Well, I don't have time to do a thorough analysis right now, but in all > the plans you've posted there are quite high values in the "Rows x" column > (e.g. the 5727.5 value). > That means a significant difference in estimated and actual row number, > which may lead to poor choice of indexes etc. The planner may simply think > the index is better due to imprecise statistics etc. Yeah. The sq_ast_attr_val_attrid scan is a lot more selective than the planner is guessing (3378 rows estimated vs an average of 15 actual), and I think that is making the difference. If you look at the estimated row counts and costs, it's expecting that adding the second index will cut the number of heap fetches about 7x, hence saving somewhere around 4800 cost units in the heapscan step, more than it thinks the indexscan will cost. But taking 15 row fetches down to 2 isn't nearly enough to pay for the extra indexscan. > Try to increase te statistics target for the columns, e.g. > ALTER TABLE table ALTER COLUMN column SET STATISTICS integer It's worth a try but I'm not sure how much it'll help. A different line of attack is to play with the planner cost parameters. In particular, reducing random_page_cost would reduce the estimated cost of the heap fetches and thus discourage it from using the extra index. If you're working with mostly-cached tables then this would probably improve behavior overall, too. regards, tom lane