Re: SELECT DISTINCT never uses an index?

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: SELECT DISTINCT never uses an index?
Дата	8 июля 2016 г. 01:53:40
Msg-id	1564.1467932013@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Re: SELECT DISTINCT never uses an index? (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

Robert Haas <robertmhaas@gmail.com> writes:
> The alternative worth considering is presumably something like:

> GroupAggregate
> -> Index Only Scan on grue_size

> Scanning an entire index in order is pretty expensive, but it seems
> possible that this could be faster than the Seq Scan, especially on a
> table with other wide columns, because then the index might be a lot
> smaller than the table.  Even if the index traversal generates some
> random I/O, if it's sufficiently smaller than the table you will still
> come out ahead.  I'm not positive that the planner will actually
> consider this plan,

Of course it does.  Simple example in the regression database:

regression=# explain select distinct unique1 from tenk1;                                        QUERY PLAN
                         
 
--------------------------------------------------------------------------------
------------Unique  (cost=0.29..295.29 rows=10000 width=4)  ->  Index Only Scan using tenk1_unique1 on tenk1
(cost=0.29..270.29rows=100
 
00 width=4)
(2 rows)

I think though that this depends on being an IOS, with a fairly wide and
all-all-visible table, in order for the cost estimate to come out cheaper
than a seqscan.  If you disable IOS then the planner's second choice is
a seqscan:

regression=# set enable_indexonlyscan to 0;
SET
regression=# explain select distinct unique1 from tenk1;                          QUERY PLAN

-----------------------------------------------------------------HashAggregate  (cost=483.00..583.00 rows=10000
width=4) Group Key: unique1  ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=4)
 
(3 rows)

A whole-table plain indexscan, or IOS with any significant number of heap
probes needed, is not going to be preferred over a seqscan because of the
amount of random I/O it implies.

> We're probably missing a few tricks on queries of this type. If the
> index-traversal machinery had a mechanism to skip quickly to the next
> distinct value, that could be used here:

Yeah, I suspect Bill was imagining that that sort of plan could be
used; but it requires execution machinery we have not got.
        regards, tom lane

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Simon Riggs
Дата: 08 июля 2016 г., 01:43:53
Сообщение: Re: MVCC overheads

Следующее

От: Simon Riggs
Дата: 08 июля 2016 г., 02:15:29
Сообщение: Re: A Modest Upgrade Proposal

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: SELECT DISTINCT never uses an index?

Предыдущее

Следующее