Re: Postgresql vs. aggregates
From: Scott Marlowe
Subject: Re: Postgresql vs. aggregates
Msg-id: 1086822759.23855.46.camel@localhost.localdomain
In reply to: Postgresql vs. aggregates (jao@geophile.com)
Responses: Re: Postgresql vs. aggregates
List: pgsql-general
On Wed, 2004-06-09 at 16:32, jao@geophile.com wrote:
> I have an application with a table that tracks objects with a "size"
> attribute. What we want to do is to periodically report on the number
> of these objects and the sum of the object sizes. The table will
> typically have 1-5 million rows.
>
> I know that the obvious "select count(*), sum(size) ..." is not a good
> idea, and we have seen the impact such a query has on performance.
>
> Then we tried creating triggers to maintain a table with object_count
> and sum_size columns:
>
>     create table summary(object_count bigint, sum_size bigint)
>
> We populate the table with exactly one row, initialized to (0, 0).
> A trigger runs an update like this:
>
>     update summary
>     set object_count = object_count + 1,
>         sum_size = sum_size + new.size
>
> The problem is that our application has periods where objects are
> being created at a rapid rate. Updates to the summary table slow down
> over time, and the slowdown is accompanied by very low CPU idle% as
> reported by vmstat.

I'm wondering if these data are time sensitive, and if so, how sensitive. If they could be up to, say, a minute old, using a materialized view of some sort might work well here. Or a temp table could hold all the incoming data until it needs to be dumped into the real table. Just a thought; it might not work for you.

Also, do you have a lot of parallel updates to the same rows, or are they all different rows? If you're doing a lot of updating of the same rows over and over, any database is going to be somewhat slow, either in vacuuming, as for PostgreSQL, or in locking, as in row-level-locking databases. If they're all different, then this should be no different for PostgreSQL than for a row-locking database, since each tuple will only have two or three versions, instead of a smaller percentage of rows having hundreds of versions.

Just wondering.
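For reference, the trigger-maintained summary table the original poster describes can be sketched end to end. This is a minimal illustration using SQLite so it is self-contained and runnable; the poster is on PostgreSQL, where the trigger body would live in a PL/pgSQL function instead, but the table shapes and the UPDATE follow the thread.

```python
import sqlite3

# Sketch of the thread's approach: a one-row summary table kept in sync
# by an insert trigger. SQLite stands in for PostgreSQL here; the
# "objects" table name is an assumption, the summary schema is from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id INTEGER PRIMARY KEY, size INTEGER NOT NULL);
CREATE TABLE summary (object_count INTEGER, sum_size INTEGER);
INSERT INTO summary VALUES (0, 0);

CREATE TRIGGER objects_ins AFTER INSERT ON objects
BEGIN
    UPDATE summary
    SET object_count = object_count + 1,
        sum_size = sum_size + NEW.size;
END;
""")

for size in (10, 20, 30):
    conn.execute("INSERT INTO objects (size) VALUES (?)", (size,))

count, total = conn.execute(
    "SELECT object_count, sum_size FROM summary").fetchone()
print(count, total)  # 3 60
```

Note that every insert rewrites the same summary row; under PostgreSQL's MVCC each such rewrite leaves a dead tuple behind, which is exactly the slowdown the poster observes.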
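Scott's batching suggestion (buffer incoming rows, then fold them into the summary periodically) can also be sketched. This is a hedged illustration, not code from the thread: the buffer stands in for his temp/staging table, and the flush interval is an arbitrary assumption; the point is that the hot summary row is updated once per batch rather than once per insert.

```python
import sqlite3

# Sketch of periodic batching: accumulate deltas in a buffer (stand-in
# for a temp table) and apply them to the summary row in one UPDATE.
# flush_every is an illustrative parameter, not from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE summary (object_count INTEGER, sum_size INTEGER);
INSERT INTO summary VALUES (0, 0);
""")
buffer = []

def flush():
    if not buffer:
        return
    conn.execute(
        "UPDATE summary SET object_count = object_count + ?, "
        "sum_size = sum_size + ?",
        (len(buffer), sum(buffer)),
    )
    buffer.clear()

def record(size, flush_every=1000):
    buffer.append(size)
    if len(buffer) >= flush_every:
        flush()

for size in range(1, 2001):   # sizes 1..2000
    record(size)
flush()                        # drain any remainder

row = conn.execute("SELECT object_count, sum_size FROM summary").fetchone()
print(row)  # (2000, 2001000)
```

With a batch size of 1000, two million inserts would touch the summary row two thousand times instead of two million, so far fewer dead row versions accumulate for VACUUM to clean up.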