Discussion: Trigger or Function
Hello, I'm a postgres newbie and am wondering what's the best way to do this.

I am gathering some data and will be inserting it into a table once daily.
The table is quite simple, but I want the updates to be as efficient as
possible since this db is part of a big data project.

Say I have a table with these columns:
| Date | Hostname | DayVal | WeekAvg | MonthAvg |

When I insert a new row I have the values for Date, Hostname, and DayVal.
Is it possible to define the table in such a way that the WeekAvg and
MonthAvg are automatically updated as follows?
  WeekAvg  = current row's DayVal plus the sum of DayVal for the previous 6 rows.
  MonthAvg = current row's DayVal plus the sum of DayVal for the previous 29 rows.

Should I place the logic in a Trigger or in a Function?
Does someone have an example or a link showing how I could set this up?

Regards,
Alan
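For reference, a minimal sketch of the kind of table being described; the table
name, column names, and types below are assumptions for illustration, not taken
from the thread:

CREATE TABLE daily_stats (            -- hypothetical name; Alan later calls his table "daily_vals"
    day      date    NOT NULL,        -- "Date" in the description above
    hostname text    NOT NULL,        -- "Hostname"
    dayval   numeric NOT NULL,        -- "DayVal"
    weekavg  numeric,                 -- would hold the 7-day rolling figure
    monthavg numeric,                 -- would hold the 30-day rolling figure
    PRIMARY KEY (day, hostname)
);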
On Tue, Jul 12, 2011 at 9:41 AM, alan <alan.miller3@gmail.com> wrote:
> Is it possible to define the table in such a way that the WeekAvg and
> MonthAvg are automatically updated as follows?
>   WeekAvg  = current row's DayVal plus the sum of DayVal for the previous 6 rows.
>   MonthAvg = current row's DayVal plus the sum of DayVal for the previous 29 rows.
>
> Should I place the logic in a Trigger or in a Function?
> Does someone have an example or a link showing how I could set this up?

IMHO that design does not fit the relational model well because you are
trying to store multirow aggregate values in individual rows. For example,
your values will be wrong if you insert rows in the wrong order (i.e. today's
data before yesterday's data).

My first approach would be to remove WeekAvg and MonthAvg from the table and
create a view which calculates appropriate values. If that proves too
inefficient (e.g. because the data set is too huge and too much data is
queried for individual queries) we can start optimizing.

One approach to optimizing would be to have secondary tables

| Week | Hostname | WeekAvg |
| Month | Hostname | MonthAvg |

and update them with an insert trigger and probably also with an update and
delete trigger.

If you actually need increasing values (i.e. running totals) you can use
windowing functions (analytic SQL in Oracle). View definitions then of course
need to change.

http://www.postgresql.org/docs/9.0/interactive/queries-table-expressions.html#QUERIES-WINDOW

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
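To make the secondary-table idea concrete, here is one possible sketch of an
insert trigger maintaining a per-week summary. All names (weekly_summary,
update_weekly_summary, daily_stats) are invented for illustration; this is not
code from the thread:

-- hypothetical summary table: one row per host and calendar week
CREATE TABLE weekly_summary (
    week     date    NOT NULL,        -- start of the week, e.g. date_trunc('week', day)
    hostname text    NOT NULL,
    weekavg  numeric NOT NULL,
    ndays    integer NOT NULL,
    PRIMARY KEY (week, hostname)
);

CREATE OR REPLACE FUNCTION update_weekly_summary() RETURNS trigger AS $$
BEGIN
    -- 9.0 has no INSERT ... ON CONFLICT, so try an UPDATE first
    -- and fall back to an INSERT when no summary row exists yet
    UPDATE weekly_summary
       SET weekavg = (weekavg * ndays + NEW.dayval) / (ndays + 1),
           ndays   = ndays + 1
     WHERE week = date_trunc('week', NEW.day)::date
       AND hostname = NEW.hostname;
    IF NOT FOUND THEN
        INSERT INTO weekly_summary (week, hostname, weekavg, ndays)
        VALUES (date_trunc('week', NEW.day)::date, NEW.hostname, NEW.dayval, 1);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER daily_stats_weekly_summary
AFTER INSERT ON daily_stats
FOR EACH ROW EXECUTE PROCEDURE update_weekly_summary();

Note that this maintains a running mean per calendar week rather than the
7-day rolling window Alan asked about, so it only illustrates the trigger
mechanics; the view/window-function approach discussed below handles the
rolling case more naturally.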
> My first approach would be to remove WeekAvg and MonthAvg from the
> table and create a view which calculates appropriate values.
Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
features.
Here is how I set it up. If anyone sees an issue, please let me know.
I'm new to postgres.
Basically, my "daily_vals" table contains HOST, DATE, & VALUE columns.
What I wanted was a way to automatically populate a 4th column
called "rolling_average", which would be the sum of <n> preceding
rows.
testdb=# select * from daily_vals;
rid | date | host | value
-----+------------+--------+-------------
1 | 2011-07-01 | hosta | 100.0000
2 | 2011-07-02 | hosta | 200.0000
3 | 2011-07-03 | hosta | 400.0000
4 | 2011-07-04 | hosta | 500.0000
5 | 2011-07-05 | hosta | 100.0000
6 | 2011-07-06 | hosta | 700.0000
7 | 2011-07-07 | hosta | 200.0000
8 | 2011-07-08 | hosta | 100.0000
9 | 2011-07-09 | hosta | 100.0000
10 | 2011-07-10 | hosta | 100.0000
11 | 2011-07-01 | hostb | 5.7143
12 | 2011-07-02 | hostb | 8.5714
13 | 2011-07-03 | hostb | 11.4286
14 | 2011-07-04 | hostb | 8.5714
15 | 2011-07-05 | hostb | 2.8571
16 | 2011-07-06 | hostb | 1.4286
17 | 2011-07-07 | hostb | 1.4286
I created a view called weekly_average using this VIEW statement.
CREATE OR REPLACE
VIEW weekly_average
AS SELECT *, sum(value) OVER (PARTITION BY host
ORDER BY rid
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_average FROM daily_vals;
Then I query the view just like a regular table.
The rolling average is calculated from the current row and the previous 6
rows (for each host).
testdb=# select * from weekly_average;
rid | date | host | value | rolling_average
-----+------------+--------+----------+------------------
1 | 2011-07-01 | hosta | 100.0000 | 100.0000
2 | 2011-07-02 | hosta | 200.0000 | 300.0000
3 | 2011-07-03 | hosta | 400.0000 | 700.0000
4 | 2011-07-04 | hosta | 500.0000 | 1200.0000
5 | 2011-07-05 | hosta | 100.0000 | 1300.0000
6 | 2011-07-06 | hosta | 700.0000 | 2000.0000
7 | 2011-07-07 | hosta | 200.0000 | 1400.0000
8 | 2011-07-08 | hosta | 100.0000 | 1400.0000
9 | 2011-07-09 | hosta | 100.0000 | 1200.0000
10 | 2011-07-10 | hosta | 100.0000 | 600.0000
11 | 2011-07-01 | hostb | 5.7143 | 5.7143
12 | 2011-07-02 | hostb | 8.5714 | 14.2857
13 | 2011-07-03 | hostb | 11.4286 | 25.7143
14 | 2011-07-04 | hostb | 8.5714 | 34.2857
15 | 2011-07-05 | hostb | 2.8571 | 37.1428
16 | 2011-07-06 | hostb | 1.4286 | 38.5714
17 | 2011-07-07 | hostb | 1.4286 | 40.0000
Alan
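As a usage note (not from the original mail): since weekly_average is an
ordinary view, it can be filtered and ordered like a table. For example, a
query along these lines should return the latest rolling value per host,
assuming the view defined above:

SELECT DISTINCT ON (host) host, date, rolling_average
FROM weekly_average
ORDER BY host, rid DESC;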
On 24/07/11 03:58, alan wrote:
>> My first approach would be to remove WeekAvg and MonthAvg from the
>> table and create a view which calculates appropriate values.
> Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
> features.
> Here is how I set it up. If anyone sees an issue, please let me know.
> I'm new to postgres.
>
> Basically, my "daily_vals" table contains HOST, DATE, & VALUE columns.
> What I wanted was a way to automatically populate a 4th column
> called "rolling_average", which would be the sum of <n> preceding
> columns.
>
> testdb=# select * from daily_vals;
> [sample data snipped]
>
> I created a view called weekly_average using this VIEW statement.
>
> CREATE OR REPLACE
> VIEW weekly_average
> AS SELECT *, sum(value) OVER (PARTITION BY host
> ORDER BY rid
> ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
> ) as rolling_average FROM daily_vals;
>
>
> Then I query the view just like a regular table.
> The rolling average is calculated from the current row and the previous 6
> rows (for each host).
>
> testdb=# select * from weekly_average;
> [query output snipped]
>
> Alan
>
The above gives just the rolling sum; you need to divide by the number
of rows in the sum to get the average (I assume you want the arithmetic
mean, as there are many types of average!).
CREATE OR REPLACE
VIEW weekly_average
AS SELECT
    *,
    -- divide by the number of rows actually in the frame
    -- (up to 7: the 6 preceding rows plus the current row)
    round((sum(value) OVER mywindow
           / LEAST(7, row_number() OVER mywindow)), 4) AS rolling_average
FROM daily_vals
WINDOW mywindow AS
(
    PARTITION BY host
    ORDER BY rid
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
);
Cheers,
Gavin
On Sat, Jul 30, 2011 at 3:01 AM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> On 24/07/11 03:58, alan wrote:
>>>
>>> My first approach would be to remove WeekAvg and MonthAvg from the
>>> table and create a view which calculates appropriate values.
>>
>> Thanks Robert, I had to upgrade to 9.0.4 to use the extended windowing
>> features.
>> Here is how I set it up. If anyone sees an issue, please let me know.
>> I'm new to postgres.
>>
>> Basically, my "daily_vals" table contains HOST, DATE, & VALUE columns.
>> What I wanted was a way to automatically populate a 4th column
>> called "rolling_average", which would be the sum of <n> preceding
>> columns.
There seems to be a contradiction in the naming here. Did you mean "avg
of <n> preceding columns"?
>> I created a view called weekly_average using this VIEW statement.
>>
>> CREATE OR REPLACE
>> VIEW weekly_average
>> AS SELECT *, sum(value) OVER (PARTITION BY host
>> ORDER BY rid
>> ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>> ) as rolling_average FROM daily_vals;
> The above gives just the rolling sum; you need to divide by the number of
> rows in the sum to get the average (I assume you want the arithmetic mean,
> as there are many types of average!).
>
> CREATE OR REPLACE
> VIEW weekly_average
> AS SELECT
> *,
> round((sum(value) OVER mywindow / LEAST(7, (row_number() OVER
> mywindow))), 4) AS rolling_average
> FROM daily_vals
> WINDOW mywindow AS
> (
> PARTITION BY host
> ORDER BY rid
> ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
> );
Why not
CREATE OR REPLACE
VIEW weekly_average
AS SELECT *, avg(value) OVER (PARTITION BY host
ORDER BY rid
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_average FROM daily_vals;
What did I miss?
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
On 01/08/11 19:18, Robert Klemme wrote:
> On Sat, Jul 30, 2011 at 3:01 AM, Gavin Flower
> <GavinFlower@archidevsys.co.nz> wrote:
>> The above gives just the rolling sum; you need to divide by the number of
>> rows in the sum to get the average (I assume you want the arithmetic mean,
>> as there are many types of average!).
>>
>> [Gavin's view definition snipped]
>
> Why not
>
> CREATE OR REPLACE
> VIEW weekly_average
> AS SELECT *, avg(value) OVER (PARTITION BY host
>                               ORDER BY rid
>                               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
>                              ) as rolling_average FROM daily_vals;
>
> What did I miss?
>
> Kind regards
>
> robert

<Chuckle> Your fix is much more elegant and efficient, though both approaches work!
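For completeness, a sketch of how the MonthAvg from the original question
could presumably be handled with the same pattern and a wider frame; this
mirrors Robert's view rather than anything actually posted in the thread:

CREATE OR REPLACE
VIEW monthly_average
AS SELECT *, avg(value) OVER (PARTITION BY host
                              ORDER BY rid
                              ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
                             ) as rolling_average FROM daily_vals;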