Re: very slow query to summarize data for a month ...

From: Greg Stark
Subject: Re: very slow query to summarize data for a month ...
Date: 12 November 2003, 04:53:18
Msg-id: 87oevii15f.fsf@stark.dyndns.tv
In reply to: Re: very slow query to summarize data for a month ... ("Marc G. Fournier" <scrappy@postgresql.org>)
Responses: Re: very slow query to summarize data for a month ...
List: pgsql-performance

"Marc G. Fournier" <scrappy@postgresql.org> writes:

> Just as a side note, just doing a straight scan for the records, with no
> SUM()/GROUP BY involved, with the month_trunc() index is still >8k msec:

Well, so the problem isn't the query at all; you just have too much data to
massage online. You can preprocess the data offline into a more manageable
amount of data for your online reports.

What I used to do for a similar situation was to do hourly queries sort of
like this:

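insert into data_aggregate (day, hour, company_id, total_bytes)
 (select date_trunc('day',  now() - '1 hour'::interval),  -- day of the summarized hour
         date_trunc('hour', now() - '1 hour'::interval),  -- the hour just completed
         company_id, sum(bytes)
    from raw_data
   -- half-open range over the previous full hour, so rows on the boundary count once
   where time >= date_trunc('hour', now()) - '1 hour'::interval
     and time <  date_trunc('hour', now())
   group by company_id
 );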

[This was actually on Oracle and the data looked kind of different; I'm making
this up as I go along.]

Then later the reports could run quickly based on data_aggregate instead of
slowly based on the much larger data set accumulated by the minute. Once I had
this schema set up it was easy to follow it for all of the rapidly growing
data tables.
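
To make the rollup-then-report idea concrete, here is a minimal sketch. It is not the original Oracle setup: SQLite via Python's standard sqlite3 module stands in for the real database, the table and column names follow the query above, and the sample rows and the helper names rollup_hour and daily_report are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for the production database; the schema is
# modeled on the names used in the message, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table raw_data (time text, company_id integer, bytes integer);
    create table data_aggregate (
        day text, hour text, company_id integer, total_bytes integer,
        primary key (hour, company_id)
    );
""")
conn.executemany("insert into raw_data values (?, ?, ?)", [
    ("2003-11-11 10:05:00", 1, 100),
    ("2003-11-11 10:40:00", 1, 250),
    ("2003-11-11 10:59:59", 2, 70),
    ("2003-11-11 11:00:00", 1, 999),  # belongs to the next hour
])

def rollup_hour(conn, hour_start, hour_end):
    """Summarize one completed hour [hour_start, hour_end) per company."""
    conn.execute("""
        insert into data_aggregate (day, hour, company_id, total_bytes)
        select date(:start), :start, company_id, sum(bytes)
          from raw_data
         where time >= :start and time < :end
         group by company_id
    """, {"start": hour_start, "end": hour_end})

def daily_report(conn, day):
    """The report reads only the small aggregate table, never raw_data."""
    return conn.execute("""
        select company_id, sum(total_bytes)
          from data_aggregate
         where day = ?
         group by company_id
         order by company_id
    """, (day,)).fetchall()

rollup_hour(conn, "2003-11-11 10:00:00", "2003-11-11 11:00:00")
report = daily_report(conn, "2003-11-11")
```

The primary key on (hour, company_id) is an extra assumption: it makes an accidental second run of the same hour fail loudly instead of double counting.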

Now in my situation I had thousands of records accumulating per second, so
hourly was already a big win. I originally chose hourly because I thought I
might want time-of-day reports but that never panned out. On the other hand it
was a win when the system broke once because I could easily see that and fix
it before midnight when it would have actually mattered. Perhaps in your
situation you would want daily aggregates or something else.

One of the other advantages of these aggregate tables was that we could purge
the old data much sooner with much less resistance from the business, since
the reports were all still available and a lot of ad-hoc queries could still
be done without the raw data anyway.
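
A purge along those lines might look like the sketch below. Again SQLite through Python's sqlite3 module is only a stand-in, the rows are invented, and purge_raw is a hypothetical helper. The guard that deletes only hours already present in data_aggregate is my own assumption about how you would want to do this safely, not something from the original system.

```python
import sqlite3

# Assumption: same illustrative schema as in the message, with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table raw_data (time text, company_id integer, bytes integer);
    create table data_aggregate (
        day text, hour text, company_id integer, total_bytes integer
    );
""")
conn.executemany("insert into raw_data values (?, ?, ?)", [
    ("2003-11-11 10:05:00", 1, 100),
    ("2003-11-11 10:40:00", 1, 250),
    ("2003-11-11 11:20:00", 1, 30),   # hour 11 has not been rolled up yet
])
conn.execute(
    "insert into data_aggregate values (?, ?, ?, ?)",
    ("2003-11-11", "2003-11-11 10:00:00", 1, 350),
)

def purge_raw(conn, cutoff):
    """Delete raw rows older than cutoff, but only for hours already rolled up."""
    cur = conn.execute("""
        delete from raw_data
         where time < :cutoff
           and strftime('%Y-%m-%d %H:00:00', time)
               in (select hour from data_aggregate)
    """, {"cutoff": cutoff})
    return cur.rowcount

deleted = purge_raw(conn, "2003-11-12 00:00:00")
remaining = conn.execute("select time from raw_data").fetchall()
total = conn.execute("select sum(total_bytes) from data_aggregate").fetchone()[0]
```

The aggregate rows are untouched by the purge, so reports built on data_aggregate keep returning the same numbers.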

Alternatively you can just give up on online reports. Eventually you'll have
some query that takes way more than 8s anyway. You can pregenerate the entire
report as a batch job instead. Either send it off as a nightly e-mail, store
it as an HTML or CSV file for the web server, or (my favourite) store the data
for the report as an SQL table and then have multiple front-ends that do a
simple "select *" to pull the data and format it.
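
For the last option, a rough sketch of the batch job plus one front-end follows. SQLite via Python's sqlite3 module is a stand-in, and the monthly_report table, the sample rows, and the helpers rebuild_report and as_csv are all hypothetical names chosen for illustration.

```python
import csv
import io
import sqlite3

# Assumption: data_aggregate as in the message; monthly_report is an invented
# table that holds the finished report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table data_aggregate (
        day text, hour text, company_id integer, total_bytes integer
    );
    create table monthly_report (
        month text, company_id integer, total_bytes integer
    );
""")
conn.executemany("insert into data_aggregate values (?, ?, ?, ?)", [
    ("2003-11-10", "2003-11-10 09:00:00", 1, 400),
    ("2003-11-11", "2003-11-11 10:00:00", 1, 350),
    ("2003-11-11", "2003-11-11 10:00:00", 2, 70),
    ("2003-10-31", "2003-10-31 23:00:00", 2, 5),   # previous month, excluded
])

def rebuild_report(conn, month):
    """Batch job: replace the stored report for one month (format 'YYYY-MM')."""
    with conn:  # single transaction, so a concurrent reader would not see a half-built report
        conn.execute("delete from monthly_report")
        conn.execute("""
            insert into monthly_report (month, company_id, total_bytes)
            select :month, company_id, sum(total_bytes)
              from data_aggregate
             where substr(day, 1, 7) = :month
             group by company_id
        """, {"month": month})

def as_csv(conn):
    """One possible front-end: pull the stored report and format it as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["month", "company_id", "total_bytes"])
    writer.writerows(
        conn.execute("select * from monthly_report order by company_id")
    )
    return out.getvalue()

rebuild_report(conn, "2003-11")
csv_text = as_csv(conn)
```

An HTML or e-mail front-end would run the same "select *" and differ only in the formatting step, which is the point of keeping the expensive query in the batch job.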

--
greg
