Обсуждение: selecting records X minutes apart
I have a table that, at a minimum, has ID and timestamp columns. Records are inserted into with random IDs and timestamps. Duplicate IDs are allowed. I want to select records grouped by ID, ordered by timestamp that are X minutes apart. In this case X is 5. Note, the intervals are not X minute wall clock intervals, they are X minute intervals from the last accepted record, per-id. For instance here is some sample input data: ID TS (HH:MM) ------------------- 0 20:00 1 20:03 1 20:04 0 20:05 1 20:05 0 20:08 1 20:09 0 20:10 I'd want the select to return: ID TS (HH:MM) ------------------- 0 20:00 0 20:05 0 20:10 1 20:03 1 20:09 Does my question make sense? Thanks in advance, Wayne
Will you be using a full timestamp with that or are you only concerned about hours and minutes? If you want a full timestamp do you care about the seconds? For example, do you want to be able to do this for '2011-06-01 23:59:04' and '2011-06-02 00:04:04'? On Fri, Jun 3, 2011 at 12:52 PM, <lists-pgsql@useunix.net> wrote: > I have a table that, at a minimum, has ID and timestamp columns. Records > are inserted into with random IDs and timestamps. Duplicate IDs are allowed. > > I want to select records grouped by ID, ordered by timestamp that are X minutes > apart. In this case X is 5. > > Note, the intervals are not X minute wall clock intervals, they are X minute > intervals from the last accepted record, per-id. > > For instance here is some sample input data: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 1 20:03 > 1 20:04 > 0 20:05 > 1 20:05 > 0 20:08 > 1 20:09 > 0 20:10 > > I'd want the select to return: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 0 20:05 > 0 20:10 > 1 20:03 > 1 20:09 > > > Does my question make sense? > > Thanks in advance, > Wayne > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >
On 2011-06-03, lists-pgsql@useunix.net <lists-pgsql@useunix.net> wrote: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 0 20:05 > 0 20:10 > 1 20:03 > 1 20:09 > > > Does my question make sense? no, why is (1,20:04) excluded, but (0,20:05) included? both records are 5 minutes from the newest. -- ⚂⚃ 100% natural
It looks like maybe he is trying to fetch records that either have no previous entries or have another record with a timestamp 5 minutes before them at the time they are inserted... On Sat, Jun 4, 2011 at 4:45 AM, Jasen Betts <jasen@xnet.co.nz> wrote: > On 2011-06-03, lists-pgsql@useunix.net <lists-pgsql@useunix.net> wrote: >> >> ID TS (HH:MM) >> ------------------- >> 0 20:00 >> 0 20:05 >> 0 20:10 >> 1 20:03 >> 1 20:09 >> >> >> Does my question make sense? > > no, why is (1,20:04) excluded, but (0,20:05) included? > both records are 5 minutes from the newest. > > -- > ⚂⚃ 100% natural > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >
The TS column type is actually a timestamp with out timezone and yes I want to take seconds into account so both of your entries would be included in the result. On Fri, Jun 03, 2011 at 06:01:53PM -0700, Kevin Crain wrote: > Will you be using a full timestamp with that or are you only concerned > about hours and minutes? If you want a full timestamp do you care > about the seconds? For example, do you want to be able to do this for > '2011-06-01 23:59:04' and '2011-06-02 00:04:04'? > > On Fri, Jun 3, 2011 at 12:52 PM, <lists-pgsql@useunix.net> wrote: > > I have a table that, at a minimum, has ID and timestamp columns. Records > > are inserted into with random IDs and timestamps. Duplicate IDs are allowed. > > > > I want to select records grouped by ID, ordered by timestamp that are X minutes > > apart. In this case X is 5. > > > > Note, the intervals are not X minute wall clock intervals, they are X minute > > intervals from the last accepted record, per-id. > > > > For instance here is some sample input data: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 1 20:03 > > 1 20:04 > > 0 20:05 > > 1 20:05 > > 0 20:08 > > 1 20:09 > > 0 20:10 > > > > I'd want the select to return: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 0 20:05 > > 0 20:10 > > 1 20:03 > > 1 20:09 > > > > > > Does my question make sense? > > > > Thanks in advance, > > Wayne > > > > -- > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-sql > >
On Sat, Jun 04, 2011 at 11:45:08AM +0000, Jasen Betts wrote: > On 2011-06-03, lists-pgsql@useunix.net <lists-pgsql@useunix.net> wrote: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 0 20:05 > > 0 20:10 > > 1 20:03 > > 1 20:09 > > > > > > Does my question make sense? > > no, why is (1,20:04) excluded, but (0,20:05) included? > both records are 5 minutes from the newest. Jasen, (1,20:04) is excluded because it's timestamp is less than 5 minutes from the previous record with the same ID (1,20:03), (0,20:05) is included for the opposite reason. Let me restate my requirement again with a little more detail. I want to select records grouped by ID, ordered by timestamp, in ascending order so I'm starting with the oldest, that are at least X minutes apart. I hope that helps. Thanks again, Wayne
On Sat, Jun 4, 2011 at 12:15 PM, <lists-pgsql@useunix.net> wrote: > I want to > select records grouped by ID, ordered by timestamp, in ascending order so I'm > starting with the oldest, that are at least X minutes apart. Here my guess: SELECT id, ts FROM Yourtable AS A AND NOT EXISTS ( SELECT * FROM Yourtable AS B WHEREB.id = A.id AND B.ts > A.ts - INTERVAL '5 MINUTES' AND B.tx < A.ts ) ORDER BY id, ts; -- Regards, Richard Broersma Jr.
Did you mean WHERE in place of your first AND? If so I already had something like this but it only returns one set, the oldest group of entries for each ID. On Sat, Jun 04, 2011 at 01:09:39PM -0700, Richard Broersma wrote: > On Sat, Jun 4, 2011 at 12:15 PM, <lists-pgsql@useunix.net> wrote: > > I want to > > select records grouped by ID, ordered by timestamp, in ascending order so I'm > > starting with the oldest, that are at least X minutes apart. > > > Here my guess: > > SELECT id, ts > FROM Yourtable AS A > AND NOT EXISTS ( SELECT * > FROM Yourtable AS B > WHERE B.id = A.id > AND B.ts > A.ts - INTERVAL '5 MINUTES' > AND B.tx < A.ts ) > > ORDER BY id, ts; > > -- > Regards, > Richard Broersma Jr.
Why is (0,20:10) listed in your expected results when there is a (0,20:08)? On Fri, Jun 3, 2011 at 12:52 PM, <lists-pgsql@useunix.net> wrote: > I have a table that, at a minimum, has ID and timestamp columns. Records > are inserted into with random IDs and timestamps. Duplicate IDs are allowed. > > I want to select records grouped by ID, ordered by timestamp that are X minutes > apart. In this case X is 5. > > Note, the intervals are not X minute wall clock intervals, they are X minute > intervals from the last accepted record, per-id. > > For instance here is some sample input data: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 1 20:03 > 1 20:04 > 0 20:05 > 1 20:05 > 0 20:08 > 1 20:09 > 0 20:10 > > I'd want the select to return: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 0 20:05 > 0 20:10 > 1 20:03 > 1 20:09 > > > Does my question make sense? > > Thanks in advance, > Wayne > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >
My approach would be to add a column for LAST_TS and place a trigger on insert that populates this new column. Then you have something you can put in your WHERE clause to test on. On Fri, Jun 3, 2011 at 12:52 PM, <lists-pgsql@useunix.net> wrote: > I have a table that, at a minimum, has ID and timestamp columns. Records > are inserted into with random IDs and timestamps. Duplicate IDs are allowed. > > I want to select records grouped by ID, ordered by timestamp that are X minutes > apart. In this case X is 5. > > Note, the intervals are not X minute wall clock intervals, they are X minute > intervals from the last accepted record, per-id. > > For instance here is some sample input data: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 1 20:03 > 1 20:04 > 0 20:05 > 1 20:05 > 0 20:08 > 1 20:09 > 0 20:10 > > I'd want the select to return: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 0 20:05 > 0 20:10 > 1 20:03 > 1 20:09 > > > Does my question make sense? > > Thanks in advance, > Wayne > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >
Let's a take a look at just the input set for ID 0. 0 20:00 0 20:05 0 20:08 0 20:10 I want records, starting from the oldest record (20:00), that are at least 5 minutes apart. So 20:00, 20:05, 20:10 but 20:08 - 20:05 is only 3 minutes so it is to be ignored. I was hoping to do this with a single SQL query that renders good runtime performance but it may not possible. But I'm by no means proficient in SQL. On Sat, Jun 04, 2011 at 05:51:18PM -0700, Kevin Crain wrote: > Why is (0,20:10) listed in your expected results when there is a (0,20:08)? > > > On Fri, Jun 3, 2011 at 12:52 PM, <lists-pgsql@useunix.net> wrote: > > I have a table that, at a minimum, has ID and timestamp columns. Records > > are inserted into with random IDs and timestamps. Duplicate IDs are allowed. > > > > I want to select records grouped by ID, ordered by timestamp that are X minutes > > apart. In this case X is 5. > > > > Note, the intervals are not X minute wall clock intervals, they are X minute > > intervals from the last accepted record, per-id. > > > > For instance here is some sample input data: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 1 20:03 > > 1 20:04 > > 0 20:05 > > 1 20:05 > > 0 20:08 > > 1 20:09 > > 0 20:10 > > > > I'd want the select to return: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 0 20:05 > > 0 20:10 > > 1 20:03 > > 1 20:09 > > > > > > Does my question make sense? > > > > Thanks in advance, > > Wayne > > > > -- > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-sql > > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql
On Sat, Jun 4, 2011 at 6:36 PM, <lists-pgsql@useunix.net> wrote: > 0 20:00 > 0 20:05 > 0 20:08 > 0 20:10 > > I want records, starting from the oldest record (20:00), that are at least 5 > minutes apart. So 20:00, 20:05, 20:10 but 20:08 - 20:05 is only 3 minutes so > it is to be ignored. Here is my next guess: --although I guess my use of WITH Recursive is correct WITH RECURSIVE Accum_ts( id, ts ) AS ( SELECT id, MIN( ts ) FROM Yourtable GROUP BY id ) UNION ALL ( SELECT B.id, MIN( B.ts ) FROM Yourtable AS B WHERE B.id = Accum_ts.id ANDB.ts >= Accum_ts.ts - INTERVAL '5 MINUTES' GROUP BY B.id HAVING MIN( B.ts ) <= MAX( B.ts )) SELECT id, ts FROM Accum_ts ORDER BY id, ts; -- Regards, Richard Broersma Jr.
On Fri, 3 Jun 2011 15:52:53 -0400 lists-pgsql@useunix.net wrote: I also think you might want to use WITH RECURSIVE clause. This SQL searches the case of an interval of 5 minutes or more, and sets a relationship between a parent to its child. CREATE TABLE tbl(id integer, ts time) ; INSERT INTO tbl VALUES (0, '20:00'), (0, '20:05'), (0, '20:08'), (0, '20:10'), (0, '20:11'), (1, '20:03'), (1, '20:04'), (1, '20:05'), (1, '20:09'), (1, '20:16'); SELECT * FROM tbl; -- WITH RECURSIVE rec(id , ts_p, ts_c) AS ( SELECT a1.id, min(a1.ts), min(b1.ts) FROM tbl AS a1, tbl AS b1 WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <= b1.ts GROUP BY a1.id UNION ALL SELECT t2.id, t2.ts_p, t2.ts_c FROM rec AS t1 INNER JOIN (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c FROM tblAS a2, tbl AS b2 WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts GROUP BY a2.id, a2.ts UNIONALL SELECT a3.id, a3.ts, null FROM tbl AS a3 ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p ) SELECT DISTINCT id, ts_p AS ts FROM rec ORDER BY 1,2; > I have a table that, at a minimum, has ID and timestamp columns. Records > are inserted into with random IDs and timestamps. Duplicate IDs are allowed. > > I want to select records grouped by ID, ordered by timestamp that are X minutes > apart. In this case X is 5. > > Note, the intervals are not X minute wall clock intervals, they are X minute > intervals from the last accepted record, per-id. > > For instance here is some sample input data: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 1 20:03 > 1 20:04 > 0 20:05 > 1 20:05 > 0 20:08 > 1 20:09 > 0 20:10 > > I'd want the select to return: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 0 20:05 > 0 20:10 > 1 20:03 > 1 20:09 > > > Does my question make sense? > > Thanks in advance, > Wayne > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql
On Fri, 3 Jun 2011 15:52:53 -0400 lists-pgsql@useunix.net wrote: I also think you might want to use WITH RECURSIVE clause. This SQL searches the case of an interval of 5 minutes or more, and sets a relationship between a parent to its child. CREATE TABLE tbl(id integer, ts time) ; INSERT INTO tbl VALUES (0, '20:00'), (0, '20:05'), (0, '20:08'), (0, '20:10'), (0, '20:11'), -- added as a sample. (1, '20:03'), (1, '20:04'), (1, '20:05'), (1, '20:09'), (1, '20:16'); -- added as a sample. SELECT * FROM tbl; -- WITH RECURSIVE rec(id , ts_p, ts_c) AS ( SELECT a1.id, min(a1.ts), min(b1.ts) FROM tbl AS a1, tbl AS b1 WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <= b1.ts GROUP BY a1.id UNION ALL SELECT t2.id, t2.ts_p, t2.ts_c FROM rec AS t1 INNER JOIN (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c FROM tblAS a2, tbl AS b2 WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts GROUP BY a2.id, a2.ts UNIONALL SELECT a3.id, a3.ts, null FROM tbl AS a3 ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p ) SELECT DISTINCT id, ts_p AS ts FROM rec ORDER BY 1,2; Regards, Masaru Sugawara > I have a table that, at a minimum, has ID and timestamp columns. Records > are inserted into with random IDs and timestamps. Duplicate IDs are allowed. > > I want to select records grouped by ID, ordered by timestamp that are X minutes > apart. In this case X is 5. > > Note, the intervals are not X minute wall clock intervals, they are X minute > intervals from the last accepted record, per-id. > > For instance here is some sample input data: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 1 20:03 > 1 20:04 > 0 20:05 > 1 20:05 > 0 20:08 > 1 20:09 > 0 20:10 > > I'd want the select to return: > > ID TS (HH:MM) > ------------------- > 0 20:00 > 0 20:05 > 0 20:10 > 1 20:03 > 1 20:09 > > > Does my question make sense? > > Thanks in advance, > Wayne > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql
Thank you all who replied!! It looks like Sugawara's recursive solution does the trick. Unfortunately performance is quite poor for the sample dataset I'm working with which is a table of about 50000 records. Indeed, there are indexes applied to the table. I believe the recursive select is being executed a great number of times causing the first part of the query to take a long time. The fastest solution I've come up with is a plpgsql procedure the loops over a select where the result is ordered by (id,tstamp) and examines the tstamp values and only returns rows that meet the interval criteria. This technique takes roughly 2 seconds to filter out records over my 50000 record sample set.... which is acceptable but not nearly as elegant as a single SQL statement. Again, thank you for all the replies. Wayne On Sun, Jun 05, 2011 at 08:52:30PM +0900, Masaru Sugawara wrote: > On Fri, 3 Jun 2011 15:52:53 -0400 > lists-pgsql@useunix.net wrote: > > > I also think you might want to use WITH RECURSIVE clause. > This SQL searches the case of an interval of 5 minutes or more, > and sets a relationship between a parent to its child. > > > CREATE TABLE tbl(id integer, ts time) ; > INSERT INTO tbl VALUES > (0, '20:00'), > (0, '20:05'), > (0, '20:08'), > (0, '20:10'), > (0, '20:11'), > (1, '20:03'), > (1, '20:04'), > (1, '20:05'), > (1, '20:09'), > (1, '20:16'); > > SELECT * FROM tbl; > > > -- > WITH RECURSIVE rec(id , ts_p, ts_c) AS ( > SELECT a1.id, min(a1.ts), min(b1.ts) > FROM tbl AS a1, tbl AS b1 > WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <= b1.ts > GROUP BY a1.id > UNION ALL > SELECT t2.id, t2.ts_p, t2.ts_c > FROM rec AS t1 INNER JOIN > (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c > FROM tbl AS a2, tbl AS b2 > WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts > GROUP BY a2.id, a2.ts > UNION ALL > SELECT a3.id, a3.ts, null > FROM tbl AS a3 > ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p > ) > SELECT DISTINCT id, ts_p AS ts FROM rec > ORDER BY 1,2; > > > > > > > > I have a table that, at a minimum, has ID and timestamp columns. Records > > are inserted into with random IDs and timestamps. Duplicate IDs are allowed. > > > > I want to select records grouped by ID, ordered by timestamp that are X minutes > > apart. In this case X is 5. > > > > Note, the intervals are not X minute wall clock intervals, they are X minute > > intervals from the last accepted record, per-id. > > > > For instance here is some sample input data: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 1 20:03 > > 1 20:04 > > 0 20:05 > > 1 20:05 > > 0 20:08 > > 1 20:09 > > 0 20:10 > > > > I'd want the select to return: > > > > ID TS (HH:MM) > > ------------------- > > 0 20:00 > > 0 20:05 > > 0 20:10 > > 1 20:03 > > 1 20:09 > > > > > > Does my question make sense? > > > > Thanks in advance, > > Wayne > > > > -- > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-sql > > > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql
How about this (that does not require special functions nor triggers: DROP TABLE IF EXISTS val; CREATE TABLE val ( id int, ts timestamp ); INSERT INTO val VALUES (0, '1-Jan-2010 20:00'), (1, '1-Jan-2010 20:03'), (1, '1-Jan-2010 20:04'), (0, '1-Jan-201020:05'), (1, '1-Jan-2010 20:05'), (0, '1-Jan-2010 20:08'), (1, '1-Jan-2010 20:09'), (0, '1-Jan-201020:10'); WITH val_first AS ( SELECT id, min(ts) AS ts FROM val GROUP BY id ) SELECT v.id, v.ts::time FROM val v, val_first vf WHERE v.id = vf.id AND EXTRACT(EPOCH FROM v.ts - vf.ts)::int % 300 = 0 ORDER BY id, ts;