Обсуждение: range intervals in window function frames
Hello all! I need to do moving averages over time series data and was hoping window functions could solve the problem for me, but it doesn't look like 8.4 or even 9.0 implementations are quite there, yet. Currently, if I have this table: create table sample ( ts timestamp, value integer ); create index sample_ts on sample (ts); and say I want a moving average of value over a fixed interval of five minutes (note that this could mean varying numbers of records in each "frame"), then I can do this: select *, avg_over_interval(ts, interval '5 min') from sample order by ts; Where avg_over_interval() is defined like this: create or replace function avg_over_interval(timestamp, interval) returns numeric as $$ select avg(value) from sample where (($1-$2) <= ts) and (ts <= $1); $$ language sql; What I would LIKE to do is this: select *, avg(ts) over(order by ts range (interval '5 min') preceding) from sample order by ts; Which is way cleaner and, I assume, more efficient. Questions: 1) Is there active work on window functions with frames over interval ranges? 2) If not, how can I help with that? 3) Until the functionality is in 9.x, can I make what I'm doing more efficient? Is there a better way to do this without window functions? (I tried to come up with a subquery in my select-list, but couldn't come up with legal syntax to reference the static value of ts for the current row in the subselect.) Thanks all for you help. Dan
[sent you a personal email by mistake - slightly corrected query posted to the list] Le dimanche 12 décembre 2010 à 10:55 -0500, Daniel Popowich a écrit : > Hello all! > > I need to do moving averages over time series data and was hoping > window functions could solve the problem for me, but it doesn't look > like 8.4 or even 9.0 implementations are quite there, yet. > > Currently, if I have this table: > > create table sample ( > ts timestamp, > value integer > ); > create index sample_ts on sample (ts); > What I would LIKE to do is this: > > select *, avg(ts) over(order by ts range (interval '5 min') preceding) > from sample order by ts; > This? select t1.ts, t1.value, (select avg(t2.value) from (select value from sample where (t1.ts-ts)::INTERVAL <= interval '5 minutes' and ts<t1.ts) as t2) from sample t1; -- Vincent Veyron http://marica.fr/ Progiciel de gestion des dossiers de contentieux et d'assurance pour le service juridique
Vincent Veyron writes: > > What I would LIKE to do is this: > > > > select *, avg(ts) over(order by ts range (interval '5 min') preceding) > > from sample order by ts; > > > > This? > > > select t1.ts, t1.value, (select avg(t2.value) from (select value from > sample where (t1.ts-ts)::INTERVAL <= interval '5 minutes' and ts<t1.ts) > as t2) from sample t1; Close. Your where clause needed to have (ts<=t1.ts). It can also be simplified to this: select t1.ts, t1.value, (select avg(t2.value) from sample t2 where (t1.ts - t2.ts) <= interval '5 min' and t2.ts <= t1.ts) from sample t1 order by t1.ts; HOWEVER, the performance is horrible compared to using the avg_over_interval() function! On a table with 53,622 rows: EXPLAIN ANALYZE [the above select statement]; ... Total runtime: 481235.867 ms But this: EXPLAIN ANALYZE select ts, value, avg_over_interval(ts, interval '5 min') from sample order by ts; ... Total runtime: 3934.755 ms 8 minutes vs 4 seconds! I'm imagining window functions would be even more efficient. Yes? Can anyone answer when range intervals will be implemented for window functions, as in the quoted select at the top of this message? Cheers, Dan
Daniel Popowich <danielpopowich@gmail.com> writes: > Close. Your where clause needed to have (ts<=t1.ts). It can also be > simplified to this: > select t1.ts, t1.value, (select avg(t2.value) > from sample t2 > where (t1.ts - t2.ts) <= interval '5 min' > and t2.ts <= t1.ts) > from sample t1 order by t1.ts; > HOWEVER, the performance is horrible compared to using the > avg_over_interval() function! The reason for that is the WHERE clause got rewritten into a form that can't be used efficiently with the index on t2. Phrase it the same way as in the function, ie where (t1.ts - interval '5 min') <= t2.ts and t2.ts <= t1.ts and you'll probably get similar results. Of course, since this isn't anything except inlining the function into the query, it's probably not all that exciting to you. > Can anyone answer when range intervals will be implemented for window > functions, as in the quoted select at the top of this message? Nope. There was a patch submitted, it was rejected on a couple of grounds, and I don't know if anyone is actively working on the problem or not. regards, tom lane
Tom Lane writes: > Daniel Popowich <danielpopowich@gmail.com> writes: > > Close. Your where clause needed to have (ts<=t1.ts). It can also be > > simplified to this: > > > select t1.ts, t1.value, (select avg(t2.value) > > from sample t2 > > where (t1.ts - t2.ts) <= interval '5 min' > > and t2.ts <= t1.ts) > > from sample t1 order by t1.ts; > > > HOWEVER, the performance is horrible compared to using the > > avg_over_interval() function! > > The reason for that is the WHERE clause got rewritten into a form that > can't be used efficiently with the index on t2. Phrase it the same way > as in the function, ie > > where (t1.ts - interval '5 min') <= t2.ts > and t2.ts <= t1.ts > > and you'll probably get similar results. Thanks, Tom, that explains it. EXPLAIN ANALYZE with the re-written WHERE brings the inline version down to 8.5 seconds, still twice as slow, but that's a heck of a lot better than 122 times as slow! :) > Of course, since this isn't anything except inlining the function > into the query, it's probably not all that exciting to you. Not terribly, but it's good to discover the function version is twice as fast. (not to mention that the function is much easier to read.) > > Can anyone answer when range intervals will be implemented for window > > functions, as in the quoted select at the top of this message? > > Nope. There was a patch submitted, it was rejected on a couple of > grounds, and I don't know if anyone is actively working on the problem > or not. Bummer. I may go ask in hackers. Thanks, Dan