Faster distinct query?

Поиск
Список
Период
Сортировка
От Israel Brewster
Тема Faster distinct query?
Дата
Msg-id 0BB059BF-54BD-46FA-8B26-28FEF9A04198@alaska.edu
обсуждение исходный текст
Ответы Re: Faster distinct query?  ("David G. Johnston" <david.g.johnston@gmail.com>)
Re: Faster distinct query?  (Michael Lewis <mlewis@entrata.com>)
Re: Faster distinct query?  (hubert depesz lubaczewski <depesz@depesz.com>)
Re: Faster distinct query?  (Geoff Winkless <pgsqladmin@geoff.dj>)
Список pgsql-general
I was wondering if there was any way to improve the performance of this query:

SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station;

The explain execution plan can be found here:

and it looks pretty straight forward. It does an index_only scan, followed by an aggregate, to produce a result that is a list of stations along with a list of channels associated with each (there can be anywhere from 1 to 3 channels associated with each station). This query takes around 5 minutes to run.

To work around the issue, I created a materialized view that I can update periodically, and of course I can query said view in no time flat. However, I’m concerned that as the dataset grows, the time it takes to refresh the view will also grow (correct me if I am wrong there).

This is running PostgreSQL 13, and the index referenced is a two-column index on data(station, channel)
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Remove duplicated row in pg_largeobject_metadata
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: Faster distinct query?