Обсуждение: range intervals in window function frames

Поиск
Список
Период
Сортировка

range intervals in window function frames

От
Daniel Popowich
Дата:
Hello all!

I need to do moving averages over time series data and was hoping
window functions could solve the problem for me, but it doesn't look
like 8.4 or even 9.0 implementations are quite there, yet.

Currently, if I have this table:

  create table sample (
      ts    timestamp,
      value integer
  );
  create index sample_ts on sample (ts);

and say I want a moving average of value over a fixed interval of five
minutes (note that this could mean varying numbers of records in each
"frame"), then I can do this:

  select *, avg_over_interval(ts, interval '5 min') from sample order by ts;

Where avg_over_interval() is defined like this:

  create or replace function avg_over_interval(timestamp, interval)
      returns numeric as $$
         select avg(value) from sample where (($1-$2) <= ts) and (ts <= $1);
      $$ language sql;

What I would LIKE to do is this:

  select *, avg(ts) over(order by ts range (interval '5 min') preceding)
         from sample order by ts;

Which is way cleaner and, I assume, more efficient.

Questions:

  1) Is there active work on window functions with frames over
     interval ranges?

  2) If not, how can I help with that?

  3) Until the functionality is in 9.x, can I make what I'm doing more
     efficient?  Is there a better way to do this without window
     functions?  (I tried to come up with a subquery in my
     select-list, but couldn't come up with legal syntax to reference
     the static value of ts for the current row in the subselect.)

Thanks all for you help.

Dan

Re: range intervals in window function frames

От
Vincent Veyron
Дата:
[sent you a personal email by mistake - slightly corrected query posted
to the list]

Le dimanche 12 décembre 2010 à 10:55 -0500, Daniel Popowich a écrit :
> Hello all!
>
> I need to do moving averages over time series data and was hoping
> window functions could solve the problem for me, but it doesn't look
> like 8.4 or even 9.0 implementations are quite there, yet.
>
> Currently, if I have this table:
>
>   create table sample (
>       ts    timestamp,
>       value integer
>   );
>   create index sample_ts on sample (ts);

> What I would LIKE to do is this:
>
>   select *, avg(ts) over(order by ts range (interval '5 min') preceding)
>          from sample order by ts;
>

This?


select t1.ts, t1.value, (select avg(t2.value) from (select value from
sample where (t1.ts-ts)::INTERVAL <= interval '5 minutes' and ts<t1.ts)
as t2) from sample t1;

--
Vincent Veyron
http://marica.fr/
Progiciel de gestion des dossiers de contentieux et d'assurance pour le service juridique




Re: range intervals in window function frames

От
Daniel Popowich
Дата:
Vincent Veyron writes:
> > What I would LIKE to do is this:
> >
> >   select *, avg(ts) over(order by ts range (interval '5 min') preceding)
> >          from sample order by ts;
> >
>
> This?
>
>
> select t1.ts, t1.value, (select avg(t2.value) from (select value from
> sample where (t1.ts-ts)::INTERVAL <= interval '5 minutes' and ts<t1.ts)
> as t2) from sample t1;

Close.  Your where clause needed to have (ts<=t1.ts).  It can also be
simplified to this:

select t1.ts, t1.value, (select avg(t2.value)
                                   from sample t2
                                   where (t1.ts - t2.ts) <= interval '5 min'
                                          and t2.ts <= t1.ts)
          from sample t1 order by t1.ts;

HOWEVER, the performance is horrible compared to using the
avg_over_interval() function!  On a table with 53,622 rows:

   EXPLAIN ANALYZE [the above select statement];

     ...
     Total runtime: 481235.867 ms

But this:

   EXPLAIN ANALYZE select ts, value,
                          avg_over_interval(ts, interval '5 min')
                          from sample order by ts;

     ...
     Total runtime: 3934.755 ms

8 minutes vs 4 seconds!

I'm imagining window functions would be even more efficient.  Yes?
Can anyone answer when range intervals will be implemented for window
functions, as in the quoted select at the top of this message?


Cheers,

Dan




Re: range intervals in window function frames

От
Tom Lane
Дата:
Daniel Popowich <danielpopowich@gmail.com> writes:
> Close.  Your where clause needed to have (ts<=t1.ts).  It can also be
> simplified to this:

> select t1.ts, t1.value, (select avg(t2.value)
>                                    from sample t2
>                                    where (t1.ts - t2.ts) <= interval '5 min'
>                                           and t2.ts <= t1.ts)
>           from sample t1 order by t1.ts;

> HOWEVER, the performance is horrible compared to using the
> avg_over_interval() function!

The reason for that is the WHERE clause got rewritten into a form that
can't be used efficiently with the index on t2.  Phrase it the same way
as in the function, ie

                                   where (t1.ts - interval '5 min') <= t2.ts
                                          and t2.ts <= t1.ts

and you'll probably get similar results.  Of course, since this isn't
anything except inlining the function into the query, it's probably not
all that exciting to you.

> Can anyone answer when range intervals will be implemented for window
> functions, as in the quoted select at the top of this message?

Nope.  There was a patch submitted, it was rejected on a couple of
grounds, and I don't know if anyone is actively working on the problem
or not.

            regards, tom lane

Re: range intervals in window function frames

От
Daniel Popowich
Дата:
Tom Lane writes:
> Daniel Popowich <danielpopowich@gmail.com> writes:
> > Close.  Your where clause needed to have (ts<=t1.ts).  It can also be
> > simplified to this:
>
> > select t1.ts, t1.value, (select avg(t2.value)
> >                                    from sample t2
> >                                    where (t1.ts - t2.ts) <= interval '5 min'
> >                                           and t2.ts <= t1.ts)
> >           from sample t1 order by t1.ts;
>
> > HOWEVER, the performance is horrible compared to using the
> > avg_over_interval() function!
>
> The reason for that is the WHERE clause got rewritten into a form that
> can't be used efficiently with the index on t2.  Phrase it the same way
> as in the function, ie
>
>                                    where (t1.ts - interval '5 min') <= t2.ts
>                                           and t2.ts <= t1.ts
>
> and you'll probably get similar results.

Thanks, Tom, that explains it.  EXPLAIN ANALYZE with the re-written
WHERE brings the inline version down to 8.5 seconds, still twice as
slow, but that's a heck of a lot better than 122 times as slow!  :)

> Of course, since this isn't anything except inlining the function
> into the query, it's probably not all that exciting to you.

Not terribly, but it's good to discover the function version is twice
as fast.  (not to mention that the function is much easier to read.)

> > Can anyone answer when range intervals will be implemented for window
> > functions, as in the quoted select at the top of this message?
>
> Nope.  There was a patch submitted, it was rejected on a couple of
> grounds, and I don't know if anyone is actively working on the problem
> or not.

Bummer.  I may go ask in hackers.

Thanks,

Dan