Re: select count(distinct ...) is slower than select distinct in about 5x

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: select count(distinct ...) is slower than select distinct in about 5x
Дата
Msg-id CAMkU=1wFdvb76NWVw=zLKF3H6-tn=+z9o_Nv2oioWfy6Z8Xo0A@mail.gmail.com
обсуждение исходный текст
Ответ на select count(distinct ...) is slower than select distinct in about 5x  (jacket41142 <jacket41142@gmail.com>)
Список pgsql-performance
On Tue, Dec 10, 2013 at 9:28 AM, jacket41142 <jacket41142@gmail.com> wrote:
 

test=> select distinct col_int from t1 group by col_int;
Time: 1177.936 ms

So the performance difference is not very large.
But when I do that:

test=> select count(distinct col_int) from t1;
 count
-------
  1025
(1 row)

Time: 7367.476 ms


count(distinct ...) always sorts, rather than using a hash, to do its work.  I don't think that there is any fundamental reason that it could not be changed to allow it to use hashing, it just hasn't been done yet.  It is complicated by the fact that you can have multiple count() expressions in the same query which demand sorting/grouping on different columns.

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: select count(distinct ...) is slower than select distinct in about 5x
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: Parallel Select query performance and shared buffers