Re: Faster distinct query?
| От | hubert depesz lubaczewski |
|---|---|
| Тема | Re: Faster distinct query? |
| Дата | |
| Msg-id | 20210923070435.GB8425@depesz.com обсуждение |
| Ответ на | Faster distinct query? (Israel Brewster <ijbrewster@alaska.edu>) |
| Ответы |
Re: Faster distinct query?
Re: Faster distinct query? |
| Список | pgsql-general |
On Wed, Sep 22, 2021 at 12:05:22PM -0800, Israel Brewster wrote: > I was wondering if there was any way to improve the performance of this query: > > SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP BY station; > > The explain execution plan can be found here: > https://explain.depesz.com/s/mtxB#html <https://explain.depesz.com/s/mtxB#html> > > and it looks pretty straight forward. It does an index_only scan, followed by an aggregate, to produce a result that isa list of stations along with a list of channels associated with each (there can be anywhere from 1 to 3 channels associatedwith each station). This query takes around 5 minutes to run. > > To work around the issue, I created a materialized view that I can update periodically, and of course I can query saidview in no time flat. However, I’m concerned that as the dataset grows, the time it takes to refresh the view will alsogrow (correct me if I am wrong there). > > This is running PostgreSQL 13, and the index referenced is a two-column index on data(station, channel) It looks that there is ~ 170 stations, and ~ 800 million rows int he table. can you tell us how many rows has this: select distinct station, channel from data; If this is not huge, then you can make the query run much faster using skip scan - recursive cte. Best regards, depesz
В списке pgsql-general по дате отправления: