Обсуждение: graphing time series data
Hi,

I have time series data in a 'price' table:

    price(id_price, price, id_product, created, modified)

Prices are polled daily and a new 'price' row is created only if the price of id_product changes; otherwise 'modified' is updated to now().

Now, I'd like to make a graph of average prices per week, per id_product. As some prices don't vary much, the distribution would not be ideal if I simply 'group by extract(week from p.modified)'. Ideally I'd generate_series() a list of weeks between min(p.created) and max(p.modified) and then average prices 'group by p.modified < week'.

What would be the best way to tackle this?

Thanks,
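A minimal sketch of that idea, assuming the schema above; the one-week step and the exact overlap condition are my assumptions, not something confirmed in the thread:

```sql
-- Build one row per week between the earliest 'created' and the latest
-- 'modified', then average every price that was current during that week.
-- A price counts as "current" for a week when its validity interval
-- [created, modified] overlaps that week.
SELECT w.week,
       p.id_product,
       avg(p.price) AS avg_price
FROM (SELECT generate_series(min(created), max(modified), '1 week') AS week
      FROM price) AS w
JOIN price p
  ON p.created < w.week + interval '1 week'
 AND p.modified >= w.week
GROUP BY w.week, p.id_product
ORDER BY w.week, p.id_product;
```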
On Wed, Apr 14, 2010 at 7:54 AM, Louis-David Mitterrand
<vindex+lists-pgsql-sql@apartia.org> wrote:
> Now, I'd like to make a graph of average prices per week, per
> id_product. As some prices don't vary much, distribution would not be
> ideal if I simply 'group by extract(week from p.modified)'.
I created a view for a similar problem that I had, only I was
calculating counts per day. This query could be adapted to work
for you.

CREATE OR REPLACE VIEW opendiscrepencydailycounts AS
 WITH opendays(day) AS (
         SELECT gs.day::date AS day
           FROM generate_series((( SELECT min(discrepencylist.discstartdt) AS min
                   FROM discrepencylist))::timestamp without time zone,
                'now'::text::date::timestamp without time zone,
                '1 day'::interval) gs(day)
        )
 SELECT opendays.day, ds.resolvingparty, count(opendays.day) AS opendiscrepancies
   FROM discrepencylist ds, opendays
  WHERE opendays.day >= ds.discstartdt
    AND opendays.day <= LEAST('now'::text::date, ds.resolutiondate)
  GROUP BY opendays.day, ds.resolvingparty
  ORDER BY opendays.day, ds.resolvingparty;
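For illustration, the view can then be queried like any table, e.g. to chart one party's daily backlog (the party value here is made up, not from the thread):

```sql
-- Hypothetical usage: daily open-discrepancy counts for one resolving party
SELECT day, opendiscrepancies
FROM opendiscrepencydailycounts
WHERE resolvingparty = 'engineering'  -- example value only
ORDER BY day;
```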
--
Regards,
Richard Broersma Jr.
Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
On Wed, Apr 14, 2010 at 08:46:13AM -0700, Richard Broersma wrote:
> On Wed, Apr 14, 2010 at 7:54 AM, Louis-David Mitterrand
> <vindex+lists-pgsql-sql@apartia.org> wrote:
>
> > Now, I'd like to make a graph of average prices per week, per
> > id_product. As some prices don't vary much, distribution would not be
> > ideal if I simply 'group by extract(week from p.modified)'.
>
> I created a view for a similar problem that I had. Only I was
> calculating counts per day. This query could be adapted to work
> for you.
>
> CREATE OR REPLACE VIEW opendiscrepencydailycounts AS
> WITH opendays(day) AS (
> SELECT gs.day::date AS day
> FROM generate_series((( SELECT
> min(discrepencylist.discstartdt) AS min
> FROM discrepencylist))::timestamp without time
> zone, 'now'::text::date::timestamp without time zone, '1
> day'::interval) gs(day)
> )
> SELECT opendays.day, ds.resolvingparty, count(opendays.day) AS
> opendiscrepancies
> FROM discrepencylist ds, opendays
> WHERE opendays.day >= ds.discstartdt AND opendays.day <=
> LEAST('now'::text::date, ds.resolutiondate)
> GROUP BY opendays.day, ds.resolvingparty
> ORDER BY opendays.day, ds.resolvingparty;
You confirm my modus operandi. I tried the following, which seems to give
me an optimal price distribution:

select w.week, count(p.id_price)
from (select generate_series(min(p.created_on), max(p.modified_on), '1 week') as week
      from price p) as w
join price p on (p.created_on < w.week + '7 days' and p.modified_on > w.week + '7 days')
group by w.week
order by w.week

          week          | count
------------------------+--------
 2010-02-10 15:32:18+01 | 125369
 2010-02-17 15:32:18+01 | 126882
 2010-02-24 15:32:18+01 | 128307
 2010-03-03 15:32:18+01 | 126742
 2010-03-10 15:32:18+01 | 133596
 2010-03-17 15:32:18+01 | 149019
 2010-03-24 15:32:18+01 | 149908
 2010-03-31 15:32:18+02 | 147617

The rest should be easy from there!
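"The rest" would presumably be swapping the count for an average and grouping by product as well; a sketch under that assumption (untested, using the column names from the query above):

```sql
-- Same week series as above, but averaging each product's current
-- prices per week instead of counting matching rows.
select w.week, p.id_product, avg(p.price) as avg_price
from (select generate_series(min(p.created_on), max(p.modified_on), '1 week') as week
      from price p) as w
join price p on (p.created_on < w.week + '7 days'
             and p.modified_on > w.week + '7 days')
group by w.week, p.id_product
order by w.week, p.id_product;
```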
Thanks for your input,
2010/4/14 Louis-David Mitterrand <vindex+lists-pgsql-sql@apartia.org>:
> On Wed, Apr 14, 2010 at 08:46:13AM -0700, Richard Broersma wrote:
>> On Wed, Apr 14, 2010 at 7:54 AM, Louis-David Mitterrand
>> <vindex+lists-pgsql-sql@apartia.org> wrote:
>>
>> > Now, I'd like to make a graph of average prices per week, per
>> > id_product. As some prices don't vary much, distribution would not be
>> > ideal if I simply 'group by extract(week from p.modified)'.
>>
>> I created a view for a similar problem that I had. Only I was
>> calculating counts per day. This query could be adapted to work
>> for you.
>>
>> CREATE OR REPLACE VIEW opendiscrepencydailycounts AS
>> WITH opendays(day) AS (
>> SELECT gs.day::date AS day
>> FROM generate_series((( SELECT
>> min(discrepencylist.discstartdt) AS min
>> FROM discrepencylist))::timestamp without time
>> zone, 'now'::text::date::timestamp without time zone, '1
>> day'::interval) gs(day)
>> )
>> SELECT opendays.day, ds.resolvingparty, count(opendays.day) AS
>> opendiscrepancies
>> FROM discrepencylist ds, opendays
>> WHERE opendays.day >= ds.discstartdt AND opendays.day <=
>> LEAST('now'::text::date, ds.resolutiondate)
>> GROUP BY opendays.day, ds.resolvingparty
>> ORDER BY opendays.day, ds.resolvingparty;
>
> You confirm my modus operandi. I tried the following which seems to give
> me optimal price distribution:
>
> select w.week,count( p.id_price) from
> (select generate_series(min(p.created_on),max(p.modified_on),'1
> week') as week from price p) as w join price p on (p.created_on <
> w.week + '7 days' and p.modified_on > w.week + '7 days') group by
> w.week order by w.week
>
> week | count
> ------------------------+--------
> 2010-02-10 15:32:18+01 | 125369
> 2010-02-17 15:32:18+01 | 126882
> 2010-02-24 15:32:18+01 | 128307
> 2010-03-03 15:32:18+01 | 126742
> 2010-03-10 15:32:18+01 | 133596
> 2010-03-17 15:32:18+01 | 149019
> 2010-03-24 15:32:18+01 | 149908
> 2010-03-31 15:32:18+02 | 147617
>
> The rest should be easy from there!
I don't quite understand. Why don't you use the date_trunc() function?

select date_trunc('week', created), count(*)
from price
group by date_trunc('week', created)

Regards
Pavel Stehule
>
> Thanks for your input,
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>
On Wed, Apr 14, 2010 at 06:06:59PM +0200, Pavel Stehule wrote:
> I don't quite understand. Why don't you use the date_trunc() function?
>
> select date_trunc('week', created), count(*)
> from price
> group by date_trunc('week', created)
Because if a price doesn't change for more than a week, then some weeks
will have a bad statistical distribution (i.e., they would not include
prices that only had their 'modified' updated). So I (think I) need to
(1) generate the weeks separately and (2) average the prices that are
current for each week.

But I could be missing something obvious.
2010/4/14 Louis-David Mitterrand <vindex+lists-pgsql-sql@apartia.org>:
> On Wed, Apr 14, 2010 at 06:06:59PM +0200, Pavel Stehule wrote:
>> I don't quite understand. Why don't you use the date_trunc() function?
>>
>> select date_trunc('week', created), count(*)
>> from price
>> group by date_trunc('week', created)
>
> Because if a price doesn't change for more than a week, then some weeks
> will have bad statistical distribution (i.e. not including prices which
> only have their 'modified' updated). So I (think I) need to (1) generate
> the weeks separately and (2) average prices that are current for each
> week.
>
> But I could be missing something obvious.
ok
Pavel