Use of multipart index with "IN"

Поиск
Список
Период
Сортировка
От Rob Messer
Тема Use of multipart index with "IN"
Дата
Msg-id 20031023181831.40095.qmail@web41215.mail.yahoo.com
обсуждение исходный текст
Ответ на Re: Moving postgresql.conf tunables into 2003...  (Sean Chittenden <sean@chittenden.org>)
Ответы Re: Use of multipart index with "IN"
Список pgsql-performance
I have a reporting system that does regular queries on a table with a
multipart index.  I am running version 7.3.4.  Here is the table
definition:

               Table "public.ds_rec_fld"
    Column     |          Type           | Modifiers
---------------+-------------------------+-----------
 dsid          | character varying(20)   | not null
 recid         | integer                 | not null
 field_name    | character varying(20)   | not null
 option_tag    | character varying(10)   | not null
 option_value  | integer                 |
 field_text    | character varying(2000) |
 field_type_cd | character varying(8)    |
Indexes: ds_rf_ndx1 btree (recid, field_name, option_value)

Normally queries are done using recid and field_name, so Postgresql
returns rows very quickly as expected.  Here is a sample explain
analyze output for a typical query:

db=> explain analyze
db-> select field_name, option_tag from ds_rec_fld where recid = 3000
and field_name = 'Q3A1';
                                                       QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------
 Index Scan using ds_rf_ndx1 on ds_rec_fld  (cost=0.00..163.09 rows=40
width=38) (actual time=0.06..0.07 rows=1 loops=1)
   Index Cond: ((recid = 3001) AND (field_name = 'Q3A1'::character
varying))
 Total runtime: 0.12 msec
(3 rows)

The problem comes in when we are selecting multiple field_name values
in one query.  The normal SQL syntax we have been using is like this:

select field_name, option_tag from ds_rec_fld where recid = 3001 and
field_name in ('Q3A1', 'Q3A9');

This is just a simplified example, at times there can be a lot of
field_name values in one query in the "in" clause.  Here postgresql
refuses to use the full index, instead doing a filter based on part of
the first recid part of index.  Here is the explain analyze output:

 Index Scan using ds_rf_ndx1 on ds_rec_fld  (cost=0.00..30425.51
rows=80 width=38) (actual time=0.18..1.08 rows=2 loops=1)
   Index Cond: (recid = 3001)
   Filter: ((field_name = 'Q3A1'::character varying) OR (field_name =
'Q3A9'::character varying))
 Total runtime: 1.12 msec
(4 rows)

So, 10 times longer.  This is an issue because at times we are
iterating through thousands of recid values.

I did a vacuum analyze, adjusted random_page_cost, etc. all to no
avail.

I also noticed that the problem goes away when I reformat the query
like this:

select field_name, option_tag from ds_rec_fld where
(recid = 3001 and field_name = 'Q3A1') or
(recid = 3001 and field_name = 'Q3A9')

Here is the explain analyze output for this:

  Index Scan using ds_rf_ndx1, ds_rf_ndx1 on ds_rec_fld
(cost=0.00..326.57 rows=80 width=38) (actual time=0.07..0.10 rows=2
loops=1)
   Index Cond: (((recid = 3001) AND (field_name = 'Q3A1'::character
varying)) OR ((recid = 3001) AND (field_name = 'Q3A9'::character
varying)))
 Total runtime: 0.16 msec
(3 rows)

Much better.  So I have partially solved my own problem, but there are
other places that this is not this simple to fix.

Therefore, my question is, is there some way to force postgresql to use
the full index and still stick with the shorter "field_name in ('...',
'...')" syntax?

If anyone has any thoughts please let me know.  Also it strikes me that
perhaps the optimizer could be tweaked to treat the first case like the
second one.  Thanks in advance,

Rob

__________________________________
Do you Yahoo!?
The New Yahoo! Shopping - with improved product search
http://shopping.yahoo.com

В списке pgsql-performance по дате отправления:

Предыдущее
От: "John Pagakis"
Дата:
Сообщение: Performance Concern
Следующее
От: William Yu
Дата:
Сообщение: Re: Tuning for mid-size server