Re: Questions on 7.2.1 query plan choices

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Questions on 7.2.1 query plan choices
Дата
Msg-id 7082.1019108945@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Questions on 7.2.1 query plan choices  (Ed Loehr <pggeneral@bluepolka.net>)
Ответы Re: Questions on 7.2.1 query plan choices  (Curt Sampson <cjs@cynic.net>)
Список pgsql-general
Ed Loehr <pggeneral@bluepolka.net> writes:
> $ psql -c "explain select key, value from freetext where tobeindexed = 't'
> and isindexed = 'f'
> NOTICE:  QUERY PLAN:

> Seq Scan on freetext  (cost=0.00..102114.21 rows=296161 width=1138)

> $ psql -c "select count(key) from freetext"
>   count
> --------
>   728868
> (1 row)

> $ psql -c "select count(key) from freetext where tobeindexed = 't' and
> isindexed = 'f'"
>   count
> -------
>    1319
> (1 row)

The problem here is that the planner is estimating 296161 rows retrieved
instead of 1319.  If it were right, then a seqscan would be the right
choice.  My guess is that there is a strong correlation between the
tobeindexed and isindexed columns --- but the current statistical model
has no clue about cross-column correlations, so you get an estimate
that's just based on the product of the frequencies independently.

Curt Sampson's nearby remarks about partial indexes are not a bad
suggestion.  An even more direct attack is to combine these two
columns into a single column with four states (you could use smallint
or "char"-with-the-quotes).  The 7.2 planner *would* have a pretty
good idea about the relative frequencies of the different states, and
would make the right seqscan-vs-indexscan choice depending on which
state you were scanning for.

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Curt Sampson
Дата:
Сообщение: Re: Questions on 7.2.1 query plan choices
Следующее
От: Curt Sampson
Дата:
Сообщение: Re: Questions on 7.2.1 query plan choices