Re: Faster distinct query?

Поиск
Список
Период
Сортировка
От hubert depesz lubaczewski
Тема Re: Faster distinct query?
Дата
Msg-id 20210923070435.GB8425@depesz.com
обсуждение исходный текст
Ответ на Faster distinct query?  (Israel Brewster <ijbrewster@alaska.edu>)
Ответы Re: Faster distinct query?  (Ryan Booz <ryan@timescale.com>)
Re: Faster distinct query?  (Israel Brewster <ijbrewster@alaska.edu>)
Список pgsql-general
On Wed, Sep 22, 2021 at 12:05:22PM -0800, Israel Brewster wrote:
> I was wondering if there was any way to improve the performance of this query:
> 
> SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station;
> 
> The explain execution plan can be found here:
> https://explain.depesz.com/s/mtxB#html <https://explain.depesz.com/s/mtxB#html>
> 
> and it looks pretty straight forward. It does an index_only scan, followed by an aggregate, to produce a result that
isa list of stations along with a list of channels associated with each (there can be anywhere from 1 to 3 channels
associatedwith each station). This query takes around 5 minutes to run.
 
> 
> To work around the issue, I created a materialized view that I can update periodically, and of course I can query
saidview in no time flat. However, I’m concerned that as the dataset grows, the time it takes to refresh the view will
alsogrow (correct me if I am wrong there).
 
> 
> This is running PostgreSQL 13, and the index referenced is a two-column index on data(station, channel)

It looks that there is ~ 170 stations, and ~ 800 million rows int he
table.

can you tell us how many rows has this:

select distinct station, channel from data;

If this is not huge, then you can make the query run much faster using
skip scan - recursive cte.

Best regards,

depesz




В списке pgsql-general по дате отправления:

Предыдущее
От: Mladen Gogala
Дата:
Сообщение: Re: Faster distinct query?
Следующее
От: Tobias Meyer
Дата:
Сообщение: Re: Remove duplicated row in pg_largeobject_metadata