Re: sequential scan on select distinct

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: sequential scan on select distinct
Дата
Msg-id 874ql7ztnb.fsf@stark.xeocode.com
обсуждение исходный текст
Ответ на Re: sequential scan on select distinct  (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>)
Ответы Re: sequential scan on select distinct  (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>)
Re: sequential scan on select distinct  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Pierre-Frédéric Caillaud <lists@boutiquenumerique.com> writes:

>     I don't know WHY (oh why) postgres does not use this kind of strategy
> when distinct'ing an indexed field... Anybody got an idea ?

Well there are two questions here. Why given the current plans available does
postgres choose a sequential scan instead of an index scan. And why isn't
there this kind of "skip index scan" available.

Postgres chooses a sequential scan with a sort (or hash aggregate) over an
index scan because it expects it to be faster. sequential scans are much
faster than random access scans of indexes, plus index scans need to read many
more blocks. If you're finding the index scan to be just as fast as sequential
scans you might consider lowering random_page_cost closer to 1.0. But note
that you may be getting fooled by a testing methodology where more things are
cached than would be in production.

why isn't a "skip index scan" plan available? Well, nobody's written the code
yet. It would part of the same code needed to get an index scan used for:

    select y,min(x) from bar group by y

And possibly also related to the TODO item:

    Use index to restrict rows returned by multi-key index when used with
    non-consecutive keys to reduce heap accesses

    For an index on col1,col2,col3, and a WHERE clause of col1 = 5 and col3 =
    9, spin though the index checking for col1 and col3 matches, rather than
    just col1


Note that the optimizer would have to make a judgement call based on the
expected number of distinct values. If you had much more than 256 distinct
values then the your plpgsql function wouldn't have performed well at all.

--
greg

В списке pgsql-performance по дате отправления:

Предыдущее
От: Ole Langbehn
Дата:
Сообщение: Re: sequential scan on select distinct
Следующее
От: SZUCS Gábor
Дата:
Сообщение: Re: Excessive context switching on SMP Xeons