Efficient count(distinct x) query question

From	Sefer Tov
Subject	Efficient count(distinct x) query question
Date	16 January 2011, 16:36:39
Msg-id	BAY150-w37665BB920D0A4FACDE22FA8F50@phx.gbl
List	pgsql-general

Hi,

I have several queries that perform something like:

select count(data) as count1, count(distinct data) as count2 from large_table group by user;

My problem is that this table contains about 500M records, and as soon as I add a "count(distinct ...)" the planner always resolves it by sorting (once the data is sorted, this clearly becomes an easy problem to solve).

I was wondering whether PostgreSQL will be introducing "hash of hashes" support for solving this (an outer hash for the "group by" HashAggregate, with an inner hash per group for tracking the distinct keys).
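
To make the "hash of hashes" idea concrete, here is a minimal Python sketch of it outside the database. It is only an illustration under assumptions, not how PostgreSQL's executor works: the function name count_and_count_distinct and the (user, data) row layout are hypothetical, and everything is kept in memory.

```python
from collections import defaultdict

def count_and_count_distinct(rows):
    """rows: iterable of (user, data) pairs.

    Returns {user: (count1, count2)}, mirroring
    count(data) and count(distinct data) grouped by user.
    """
    totals = defaultdict(int)    # outer hash: user -> number of non-NULL data values
    distinct = defaultdict(set)  # outer hash: user -> inner hash of distinct data values
    for user, data in rows:
        totals[user] += 0        # make sure the group exists even if all data is NULL
        if data is None:
            continue             # SQL count() skips NULLs in both forms
        totals[user] += 1
        distinct[user].add(data)
    return {user: (totals[user], len(distinct[user])) for user in totals}

# Hypothetical sample rows, one pass over the input and no sorting
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", None)]
result = count_and_count_distinct(rows)
```

Each row costs one probe of the outer hash and one of the inner hash, so the work grows linearly with the number of rows, at the price of holding every distinct (user, data) pair in memory or spilling it somewhere.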

When I consider sorting large volumes of data at "n*log(n)" cost using external disk storage, couldn't hashing be faster (even if the inner hashes also use some external storage)? When "n" is fairly large, the logarithmic factor becomes significant.
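
As a rough back-of-the-envelope check of that logarithmic factor, the sketch below compares an idealized comparison sort with one hash probe per row for the 500M rows mentioned above. It deliberately ignores constant factors, disk I/O, and memory limits, so it only shows the size of the log term, not real running times.

```python
import math

n = 500_000_000                     # row count quoted in the post

log_factor = math.log2(n)           # the log(n) term of an idealized comparison sort
sort_comparisons = n * log_factor   # about n*log2(n) comparisons
hash_probes = n                     # one probe per row, ignoring collisions and spills

print(f"log2(n) = {log_factor:.1f}")
print(f"sort comparisons ~ {sort_comparisons:.2e}")
print(f"hash probes      ~ {hash_probes:.2e}")
```

Under these assumptions the sort does roughly 29 times as many basic operations as the hash approach, although in practice the per-operation costs differ a lot between the two.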

I'd love to hear thoughts and ideas on that.

Thanks,

Sefer.
