Re: Help needed structuring Postgresql correlation query

Поиск
Список
Период
Сортировка
От Tim Smith
Тема Re: Help needed structuring Postgresql correlation query
Дата
Msg-id CA+HuS5FQLZKBFhoBenaK8Zd7perw5azjpLX16yvgOZf414V70g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Help needed structuring Postgresql correlation query  (Alban Hertroys <haramrae@gmail.com>)
Список pgsql-general
Thanks for that, looks like something to sink my teeth into !

On 21 June 2016 at 13:29, Alban Hertroys <haramrae@gmail.com> wrote:
>
>> On 19 Jun 2016, at 10:58, Tim Smith <randomdev4+postgres@gmail.com> wrote:
>>
>> Hi,
>>
>> My postgresql-fu is not good enough to write a query to achieve this
>> (some may well say r is a better suited tool to achieve this !).
>>
>> I need to calculate what I would call a correlation window on a time
>> series of data, my table looks like this :
>>
>> create table data(data_date date,data_measurement numeric);
>> insert into data values('2016-01-01',16.23);
>> <etc>
>> insert into data values('2016-06-19',30.54);
>>
>> My "target sample" would be the N most recent samples in the table
>> (e.g. 20, the most recent 20 days)
>>
>> My "potential sample" would be a moving window of size N (the same
>> size N as above), starting at T0 (i.e. 2016-01-01 in this example) and
>> incrementing by one (i.e. 2016-01-01 + 20, then 2016-01-02+20 etc),
>> but the "target sample" would obviously be excluded.
>>
>> The output needs to display window date range (or at least the start
>> date of the "potential sample" window) and the result
>> corr(target,potential).
>>
>> Hope that makes sense
>
> Something like this could do the trick (untested):
>
>         with recursive sample (nr, start_date) as (
>                 select 1 as nr, data_date as start_date, SUM(data_measurement) as total
>                 from generate_series(0, 19) range(step)
>                 left join data on (data_date = start_date + range.step)
>
>                 union all
>
>                 select nr + 1, sample.start_date +1, SUM(data_measurement) as total
>                 from sample
>                 join generate_series(0, 19) range(step)
>                 left join data on (data_date = start_date +1 + range.step)
>                 where start_date +1 +19 <= (select MAX(data_date) from data)
>                 group by 1, 2
>         )
>         select * from sample where start_date >= '2016-01-01';
>
> Not sure how best to go about parameterising sample size N, a stored function seems like a good option.
>
>
> Another approach would be to move a (cumulative) window-function with 20 items over your data set and for each row
subtractthe first value of the previous window from the total of the current window (that is, assuming you're
calculatinga SUM of data_measurement for each window of 20 records). 
>
> Visually that looks something like this for sample size 4:
> sample 1: (A + B + C + D)
> sample 2: (A + B + C + D) + E - A = (B + C + D + E)
> sample 3: (B + C + D + E) + F - B = (C + D + E + F)
> etc.
>
> To accomplish this, you calculate two cumulative totals (often misnamed as running totals, but AFAIK that's something
different),one from the start, and one lagging N rows behind (you can use the lag() window function for that) and
subtractthe two. 
>
> Good luck!
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>


В списке pgsql-general по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: optimizing a query
Следующее
От: Jonathan Vanasco
Дата:
Сообщение: Re: optimizing a query