Re: Optimizing tuning and table design for large analytics DB

From: John R Pierce
Subject: Re: Optimizing tuning and table design for large analytics DB
Date:
Msg-id: 4A032D9F.9080200@hogranch.com
In reply to: Optimizing tuning and table design for large analytics DB (Rob W <digital_illuminati@yahoo.com>)
List: pgsql-general
Rob W wrote:
> Can anyone point me towards good articles or books that would help a PostgreSQL novice (i.e. me) learn the optimal approaches to setting up a DB for analytics?
>
> In this particular case, I need to efficiently analyze approximately 300 million system log events (i.e. time series data). It's log data, so it's only appended to the table, not inserted, and is never modified. Only 90 days' worth of data will be retained, so old records need to be deleted periodically. Query performance will only be important for small subsets of the data (e.g. when analyzing a week or day's worth of data); the rest of the reports will be run in batch mode. There will likely only be one user at a time doing ad-hoc queries.
>
> This is a follow-up to the earlier suggestions that PostgreSQL will handle the volumes of data I plan to work with, so I figured I'd give it a shot.
>


one approach to speeding up the handling of time-expired data like this
is to partition it, maybe by week.   eg, you create a separate table for
each of 14 weeks, and have a view that UNIONs them all for doing
queries.  you insert your new records into the latest week's table, then
each week truncate the oldest week's table and switch to using that one
for the new inserts....   this is much more efficient than having one
large table and deleting individual rows.
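a minimal sketch of that rotation scheme (table and column names here are
made up for illustration):

```sql
-- one table per week; events_w01 .. events_w14
CREATE TABLE events_w01 (
    event_time  timestamptz NOT NULL,
    event_class text        NOT NULL,
    message     text
);
-- ... repeat for events_w02 through events_w14 ...

-- a view that UNIONs the per-week tables for querying
CREATE VIEW events AS
    SELECT * FROM events_w01
    UNION ALL
    SELECT * FROM events_w02;
    -- ... UNION ALL the remaining week tables ...

-- weekly rotation: empty the oldest table, then point the
-- loader's inserts at it for the new week
TRUNCATE TABLE events_w01;
```

TRUNCATE reclaims the whole table at once instead of marking millions of
individual rows dead, which is why it beats bulk DELETEs here.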

you can speed up the inserts some by doing them in batches, for
instance, collecting a few minutes' worth of new records and inserting
them all as one transaction.   depending on how many fields of these
tables are indexed, this can greatly reduce the overhead of maintaining
those indices.
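for example (hypothetical table and column names again):

```sql
-- insert a few minutes of collected records in one transaction,
-- so the commit and index-maintenance overhead is paid once per
-- batch rather than once per row
BEGIN;
INSERT INTO events_w14 (event_time, event_class, message) VALUES
    ('2009-05-07 12:00:01', 'auth',   'login ok'),
    ('2009-05-07 12:00:03', 'kernel', 'oom warning'),
    ('2009-05-07 12:00:05', 'auth',   'login failed');
-- ... more rows ...
COMMIT;
```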

see http://www.postgresql.org/docs/current/static/ddl-partitioning.html
for more on this sort of partitioning.   above, I mentioned using a view
to read the whole data set as a UNION; that page discusses using table
inheritance instead, which has advantages (for one, constraint exclusion
lets the planner skip partitions whose date range can't match a query).
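a rough sketch of the inheritance-based variant (names are illustrative):

```sql
-- parent table; queries against it automatically cover all
-- inheriting child (per-week) tables
CREATE TABLE events (
    event_time  timestamptz NOT NULL,
    event_class text        NOT NULL,
    message     text
);

-- one child per week, with a CHECK constraint on its date range
CREATE TABLE events_2009w19 (
    CHECK (event_time >= '2009-05-04' AND event_time < '2009-05-11')
) INHERITS (events);

-- with constraint exclusion on, the planner skips children whose
-- CHECK range can't satisfy the query's WHERE clause
SET constraint_exclusion = on;
SELECT count(*) FROM events
WHERE event_time >= '2009-05-05' AND event_time < '2009-05-06';
```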

if your reporting requirements include the sorts of statistics that can
be precomputed, it can be advantageous to keep a set of running tallies
in separate tables, like per-hour and per-day counts for each event
class, which can be used to reduce the amount of bulk querying required
to generate statistical count reports. of course, these tally tables
also need aging, but there's much MUCH less data in them, so conventional
row deletes are probably fine.
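one possible shape for such a tally table (again, names are made up):

```sql
-- running tally: one row per event class per hour
CREATE TABLE event_counts_hourly (
    hour        timestamptz NOT NULL,
    event_class text        NOT NULL,
    n           bigint      NOT NULL,
    PRIMARY KEY (hour, event_class)
);

-- roll a fully elapsed hour of raw events into the tallies
INSERT INTO event_counts_hourly (hour, event_class, n)
SELECT date_trunc('hour', event_time), event_class, count(*)
FROM events
WHERE event_time >= '2009-05-07 11:00'
  AND event_time <  '2009-05-07 12:00'
GROUP BY 1, 2;

-- aging: at ~24 rows per event class per day, plain deletes are cheap
DELETE FROM event_counts_hourly
WHERE hour < now() - interval '90 days';
```

count reports can then scan this small table instead of the raw log
partitions.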


