Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Дата
Msg-id 27971.967041030@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan  (Jules Bean <jules@jellybean.co.uk>)
Ответы Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan  (Tiago Antão <tra@fct.unl.pt>)
Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan  (Jules Bean <jules@jellybean.co.uk>)
Список pgsql-hackers
Jules Bean <jules@jellybean.co.uk> writes:
> I have in a table a 'category' column which takes a small number of
> (basically fixed) values.  Here by 'small', I mean ~1000, while the
> table itself has ~10 000 000 rows. Some categories have many, many
> more rows than others.  In particular, there's one category which hits
> over half the rows.  Because of this (AIUI) postgresql assumes
> that the query
>    select ... from thistable where category='something'
> is best served by a seqscan, even though there is an index on
> category.

Yes, we know about that one.  We have stats about the most common value
in a column, but no information about how the less-common values are
distributed.  We definitely need stats about several top values not just
one, because this phenomenon of a badly skewed distribution is pretty
common.

BTW, if your highly-popular value is actually a dummy value ('UNKNOWN'
or something like that), a fairly effective workaround is to replace the
dummy entries with NULL.  The system does account for NULLs separately
from real values, so you'd then get stats based on the most common
non-dummy value.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: New MAC OUI capabilities
Следующее
От: Tom Lane
Дата:
Сообщение: Re: analyze.c