Обсуждение: selecting records X minutes apart

Поиск
Список
Период
Сортировка

selecting records X minutes apart

От
lists-pgsql@useunix.net
Дата:
I have a table that, at a minimum, has ID and timestamp columns.  Records
are inserted into with random IDs and timestamps.  Duplicate IDs are allowed. 

I want to select records grouped by ID, ordered by timestamp that are X minutes
apart. In this case X is 5.

Note, the intervals are not X minute wall clock intervals, they are X minute
intervals from the last accepted record, per-id.

For instance here is some sample input data:

ID    TS (HH:MM)
-------------------
0    20:00
1    20:03
1    20:04
0    20:05
1    20:05
0    20:08
1    20:09
0    20:10

I'd want the select to return:

ID    TS (HH:MM)
-------------------
0    20:00
0    20:05
0    20:10
1    20:03
1    20:09


Does my question make sense?

Thanks in advance,
Wayne


Re: selecting records X minutes apart

От
Kevin Crain
Дата:
Will you be using a full timestamp with that or are you only concerned
about hours and minutes? If you want a full timestamp do you care
about the seconds? For example, do you want to be able to do this for
'2011-06-01 23:59:04' and '2011-06-02 00:04:04'?

On Fri, Jun 3, 2011 at 12:52 PM,  <lists-pgsql@useunix.net> wrote:
> I have a table that, at a minimum, has ID and timestamp columns.  Records
> are inserted into with random IDs and timestamps.  Duplicate IDs are allowed.
>
> I want to select records grouped by ID, ordered by timestamp that are X minutes
> apart. In this case X is 5.
>
> Note, the intervals are not X minute wall clock intervals, they are X minute
> intervals from the last accepted record, per-id.
>
> For instance here is some sample input data:
>
> ID      TS (HH:MM)
> -------------------
> 0       20:00
> 1       20:03
> 1       20:04
> 0       20:05
> 1       20:05
> 0       20:08
> 1       20:09
> 0       20:10
>
> I'd want the select to return:
>
> ID      TS (HH:MM)
> -------------------
> 0       20:00
> 0       20:05
> 0       20:10
> 1       20:03
> 1       20:09
>
>
> Does my question make sense?
>
> Thanks in advance,
> Wayne
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: selecting records X minutes apart

От
Jasen Betts
Дата:
On 2011-06-03, lists-pgsql@useunix.net <lists-pgsql@useunix.net> wrote:
>
> ID    TS (HH:MM)
> -------------------
> 0    20:00
> 0    20:05
> 0    20:10
> 1    20:03
> 1    20:09
>
>
> Does my question make sense?

no, why is (1,20:04) excluded, but (0,20:05) included?
both records are 5 minutes from the newest.

-- 
⚂⚃ 100% natural



Re: selecting records X minutes apart

От
Kevin Crain
Дата:
It looks like maybe he is trying to fetch records that either have no
previous entries or have another record with a timestamp 5 minutes
before them at the time they are inserted...


On Sat, Jun 4, 2011 at 4:45 AM, Jasen Betts <jasen@xnet.co.nz> wrote:
> On 2011-06-03, lists-pgsql@useunix.net <lists-pgsql@useunix.net> wrote:
>>
>> ID    TS (HH:MM)
>> -------------------
>> 0     20:00
>> 0     20:05
>> 0     20:10
>> 1     20:03
>> 1     20:09
>>
>>
>> Does my question make sense?
>
> no, why is (1,20:04) excluded, but (0,20:05) included?
> both records are 5 minutes from the newest.
>
> --
> ⚂⚃ 100% natural
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: selecting records X minutes apart

От
lists-pgsql@useunix.net
Дата:
The TS column type is actually a timestamp with out timezone and yes I want to
take seconds into account so both of your entries would be included in the
result.

On Fri, Jun 03, 2011 at 06:01:53PM -0700, Kevin Crain wrote:
> Will you be using a full timestamp with that or are you only concerned
> about hours and minutes? If you want a full timestamp do you care
> about the seconds? For example, do you want to be able to do this for
> '2011-06-01 23:59:04' and '2011-06-02 00:04:04'?
> 
> On Fri, Jun 3, 2011 at 12:52 PM,  <lists-pgsql@useunix.net> wrote:
> > I have a table that, at a minimum, has ID and timestamp columns.  Records
> > are inserted into with random IDs and timestamps.  Duplicate IDs are allowed.
> >
> > I want to select records grouped by ID, ordered by timestamp that are X minutes
> > apart. In this case X is 5.
> >
> > Note, the intervals are not X minute wall clock intervals, they are X minute
> > intervals from the last accepted record, per-id.
> >
> > For instance here is some sample input data:
> >
> > ID      TS (HH:MM)
> > -------------------
> > 0       20:00
> > 1       20:03
> > 1       20:04
> > 0       20:05
> > 1       20:05
> > 0       20:08
> > 1       20:09
> > 0       20:10
> >
> > I'd want the select to return:
> >
> > ID      TS (HH:MM)
> > -------------------
> > 0       20:00
> > 0       20:05
> > 0       20:10
> > 1       20:03
> > 1       20:09
> >
> >
> > Does my question make sense?
> >
> > Thanks in advance,
> > Wayne
> >
> > --
> > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-sql
> >


Re: selecting records X minutes apart

От
lists-pgsql@useunix.net
Дата:
On Sat, Jun 04, 2011 at 11:45:08AM +0000, Jasen Betts wrote:
> On 2011-06-03, lists-pgsql@useunix.net <lists-pgsql@useunix.net> wrote:
> >
> > ID    TS (HH:MM)
> > -------------------
> > 0    20:00
> > 0    20:05
> > 0    20:10
> > 1    20:03
> > 1    20:09
> >
> >
> > Does my question make sense?
> 
> no, why is (1,20:04) excluded, but (0,20:05) included?
> both records are 5 minutes from the newest.

Jasen,

(1,20:04) is excluded because it's timestamp is less than 5 minutes from the
previous record with the same ID (1,20:03), (0,20:05) is included for the
opposite reason.

Let me restate my requirement again with a little more detail.  I want to
select records grouped by ID, ordered by timestamp, in ascending order so I'm
starting with the oldest, that are at least X minutes apart.

I hope that helps.

Thanks again,
Wayne


Re: selecting records X minutes apart

От
Richard Broersma
Дата:
On Sat, Jun 4, 2011 at 12:15 PM,  <lists-pgsql@useunix.net> wrote:
>  I want to
> select records grouped by ID, ordered by timestamp, in ascending order so I'm
> starting with the oldest, that are at least X minutes apart.


Here my guess:

SELECT id, ts FROM Yourtable AS A  AND NOT EXISTS ( SELECT *                     FROM Yourtable AS B
WHEREB.id = A.id                      AND B.ts > A.ts - INTERVAL '5 MINUTES'                      AND B.tx < A.ts ) 

ORDER BY id, ts;

--
Regards,
Richard Broersma Jr.


Re: selecting records X minutes apart

От
lists-pgsql@useunix.net
Дата:
Did you mean WHERE in place of your first AND?  If so I already had something
like this but it only returns one set, the oldest group of entries for each
ID.


On Sat, Jun 04, 2011 at 01:09:39PM -0700, Richard Broersma wrote:
> On Sat, Jun 4, 2011 at 12:15 PM,  <lists-pgsql@useunix.net> wrote:
> >  I want to
> > select records grouped by ID, ordered by timestamp, in ascending order so I'm
> > starting with the oldest, that are at least X minutes apart.
> 
> 
> Here my guess:
> 
> SELECT id, ts
>   FROM Yourtable AS A
>    AND NOT EXISTS ( SELECT *
>                       FROM Yourtable AS B
>                      WHERE B.id = A.id
>                        AND B.ts > A.ts - INTERVAL '5 MINUTES'
>                        AND B.tx < A.ts )
> 
> ORDER BY id, ts;
> 
> -- 
> Regards,
> Richard Broersma Jr.


Re: selecting records X minutes apart

От
Kevin Crain
Дата:
Why is (0,20:10) listed in your expected results when there is a (0,20:08)?


On Fri, Jun 3, 2011 at 12:52 PM,  <lists-pgsql@useunix.net> wrote:
> I have a table that, at a minimum, has ID and timestamp columns.  Records
> are inserted into with random IDs and timestamps.  Duplicate IDs are allowed.
>
> I want to select records grouped by ID, ordered by timestamp that are X minutes
> apart. In this case X is 5.
>
> Note, the intervals are not X minute wall clock intervals, they are X minute
> intervals from the last accepted record, per-id.
>
> For instance here is some sample input data:
>
> ID      TS (HH:MM)
> -------------------
> 0       20:00
> 1       20:03
> 1       20:04
> 0       20:05
> 1       20:05
> 0       20:08
> 1       20:09
> 0       20:10
>
> I'd want the select to return:
>
> ID      TS (HH:MM)
> -------------------
> 0       20:00
> 0       20:05
> 0       20:10
> 1       20:03
> 1       20:09
>
>
> Does my question make sense?
>
> Thanks in advance,
> Wayne
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: selecting records X minutes apart

От
Kevin Crain
Дата:
My approach would be to add a column for LAST_TS and place a trigger
on insert that populates this new column.  Then you have something you
can put in your WHERE clause to test on.

On Fri, Jun 3, 2011 at 12:52 PM,  <lists-pgsql@useunix.net> wrote:
> I have a table that, at a minimum, has ID and timestamp columns.  Records
> are inserted into with random IDs and timestamps.  Duplicate IDs are allowed.
>
> I want to select records grouped by ID, ordered by timestamp that are X minutes
> apart. In this case X is 5.
>
> Note, the intervals are not X minute wall clock intervals, they are X minute
> intervals from the last accepted record, per-id.
>
> For instance here is some sample input data:
>
> ID      TS (HH:MM)
> -------------------
> 0       20:00
> 1       20:03
> 1       20:04
> 0       20:05
> 1       20:05
> 0       20:08
> 1       20:09
> 0       20:10
>
> I'd want the select to return:
>
> ID      TS (HH:MM)
> -------------------
> 0       20:00
> 0       20:05
> 0       20:10
> 1       20:03
> 1       20:09
>
>
> Does my question make sense?
>
> Thanks in advance,
> Wayne
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: selecting records X minutes apart

От
lists-pgsql@useunix.net
Дата:
Let's a take a look at just the input set for ID 0.

0       20:00
0       20:05
0       20:08
0       20:10

I want records, starting from the oldest record (20:00), that are at least 5
minutes apart.  So 20:00, 20:05, 20:10 but 20:08 - 20:05 is only 3 minutes so
it is to be ignored.

I was hoping to do this with a single SQL query that renders good runtime
performance but it may not possible. But I'm by no means proficient in SQL.

On Sat, Jun 04, 2011 at 05:51:18PM -0700, Kevin Crain wrote:
> Why is (0,20:10) listed in your expected results when there is a (0,20:08)?
> 
> 
> On Fri, Jun 3, 2011 at 12:52 PM,  <lists-pgsql@useunix.net> wrote:
> > I have a table that, at a minimum, has ID and timestamp columns.  Records
> > are inserted into with random IDs and timestamps.  Duplicate IDs are allowed.
> >
> > I want to select records grouped by ID, ordered by timestamp that are X minutes
> > apart. In this case X is 5.
> >
> > Note, the intervals are not X minute wall clock intervals, they are X minute
> > intervals from the last accepted record, per-id.
> >
> > For instance here is some sample input data:
> >
> > ID      TS (HH:MM)
> > -------------------
> > 0       20:00
> > 1       20:03
> > 1       20:04
> > 0       20:05
> > 1       20:05
> > 0       20:08
> > 1       20:09
> > 0       20:10
> >
> > I'd want the select to return:
> >
> > ID      TS (HH:MM)
> > -------------------
> > 0       20:00
> > 0       20:05
> > 0       20:10
> > 1       20:03
> > 1       20:09
> >
> >
> > Does my question make sense?
> >
> > Thanks in advance,
> > Wayne
> >
> > --
> > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-sql
> >
> 
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql


Re: selecting records X minutes apart

От
Richard Broersma
Дата:
On Sat, Jun 4, 2011 at 6:36 PM,  <lists-pgsql@useunix.net> wrote:

> 0       20:00
> 0       20:05
> 0       20:08
> 0       20:10
>
> I want records, starting from the oldest record (20:00), that are at least 5
> minutes apart.  So 20:00, 20:05, 20:10 but 20:08 - 20:05 is only 3 minutes so
> it is to be ignored.

Here is my next guess: --although I guess my use of WITH Recursive is correct

WITH RECURSIVE Accum_ts( id, ts ) AS    ( SELECT id, MIN( ts )        FROM Yourtable    GROUP BY id )
UNION ALL ( SELECT B.id, MIN( B.ts )             FROM Yourtable AS B            WHERE B.id = Accum_ts.id
ANDB.ts >= Accum_ts.ts - INTERVAL '5 MINUTES'         GROUP BY B.id             HAVING MIN( B.ts ) <= MAX( B.ts )) 
 SELECT id, ts   FROM Accum_ts
ORDER BY id, ts;



--
Regards,
Richard Broersma Jr.


Re: selecting records X minutes apart

От
Masaru Sugawara
Дата:
On Fri, 3 Jun 2011 15:52:53 -0400
lists-pgsql@useunix.net wrote:


I also think you might want to use WITH RECURSIVE clause. 
This SQL searches the case of an interval of 5 minutes or more, 
and sets a relationship between a parent to its child.


CREATE TABLE tbl(id integer, ts time) ;
INSERT INTO tbl VALUES
(0, '20:00'),
(0, '20:05'),
(0, '20:08'),
(0, '20:10'),
(0, '20:11'),
(1, '20:03'),
(1, '20:04'),
(1, '20:05'),
(1, '20:09'),
(1, '20:16');

SELECT * FROM tbl;


--
WITH RECURSIVE rec(id , ts_p, ts_c) AS (
SELECT a1.id, min(a1.ts), min(b1.ts)  FROM tbl AS a1, tbl AS b1   WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <=
b1.ts  GROUP BY a1.id
 
UNION ALL
SELECT t2.id, t2.ts_p, t2.ts_c  FROM rec AS t1 INNER JOIN   (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c      FROM
tblAS a2, tbl AS b2       WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts       GROUP BY a2.id, a2.ts
UNIONALL   SELECT a3.id, a3.ts, null      FROM tbl AS a3         ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p
 
)
SELECT DISTINCT id, ts_p AS ts FROM rec
ORDER BY 1,2;






> I have a table that, at a minimum, has ID and timestamp columns.  Records
> are inserted into with random IDs and timestamps.  Duplicate IDs are allowed. 
> 
> I want to select records grouped by ID, ordered by timestamp that are X minutes
> apart. In this case X is 5.
> 
> Note, the intervals are not X minute wall clock intervals, they are X minute
> intervals from the last accepted record, per-id.
> 
> For instance here is some sample input data:
> 
> ID    TS (HH:MM)
> -------------------
> 0    20:00
> 1    20:03
> 1    20:04
> 0    20:05
> 1    20:05
> 0    20:08
> 1    20:09
> 0    20:10
> 
> I'd want the select to return:
> 
> ID    TS (HH:MM)
> -------------------
> 0    20:00
> 0    20:05
> 0    20:10
> 1    20:03
> 1    20:09
> 
> 
> Does my question make sense?
> 
> Thanks in advance,
> Wayne
> 
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql





Re: selecting records X minutes apart

От
Masaru Sugawara
Дата:
On Fri, 3 Jun 2011 15:52:53 -0400
lists-pgsql@useunix.net wrote:


I also think you might want to use WITH RECURSIVE clause. 
This SQL searches the case of an interval of 5 minutes or more, 
and sets a relationship between a parent to its child.


CREATE TABLE tbl(id integer, ts time) ;
INSERT INTO tbl VALUES
(0, '20:00'),
(0, '20:05'),
(0, '20:08'),
(0, '20:10'),
(0, '20:11'),   -- added as a sample. 
(1, '20:03'),
(1, '20:04'),
(1, '20:05'),
(1, '20:09'),
(1, '20:16');   -- added as a sample. 

SELECT * FROM tbl;


--
WITH RECURSIVE rec(id , ts_p, ts_c) AS (
SELECT a1.id, min(a1.ts), min(b1.ts)  FROM tbl AS a1, tbl AS b1   WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <=
b1.ts  GROUP BY a1.id
 
UNION ALL
SELECT t2.id, t2.ts_p, t2.ts_c  FROM rec AS t1 INNER JOIN   (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c      FROM
tblAS a2, tbl AS b2       WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts       GROUP BY a2.id, a2.ts
UNIONALL   SELECT a3.id, a3.ts, null      FROM tbl AS a3         ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p
 
)
SELECT DISTINCT id, ts_p AS ts FROM rec
ORDER BY 1,2;




Regards,
Masaru Sugawara


> I have a table that, at a minimum, has ID and timestamp columns.  Records
> are inserted into with random IDs and timestamps.  Duplicate IDs are allowed. 
> 
> I want to select records grouped by ID, ordered by timestamp that are X minutes
> apart. In this case X is 5.
> 
> Note, the intervals are not X minute wall clock intervals, they are X minute
> intervals from the last accepted record, per-id.
> 
> For instance here is some sample input data:
> 
> ID    TS (HH:MM)
> -------------------
> 0    20:00
> 1    20:03
> 1    20:04
> 0    20:05
> 1    20:05
> 0    20:08
> 1    20:09
> 0    20:10
> 
> I'd want the select to return:
> 
> ID    TS (HH:MM)
> -------------------
> 0    20:00
> 0    20:05
> 0    20:10
> 1    20:03
> 1    20:09
> 
> 
> Does my question make sense?
> 
> Thanks in advance,
> Wayne
> 
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql





Re: selecting records X minutes apart

От
lists-pgsql@useunix.net
Дата:
Thank you all who replied!!  It looks like Sugawara's recursive solution does
the trick.  Unfortunately performance is quite poor for the sample dataset I'm
working with which is a table of about 50000 records.  Indeed, there are
indexes applied to the table.  I believe the recursive select is being
executed a great number of times causing the first part of the query to take a
long time.

The fastest solution I've come up with is a plpgsql procedure the loops over a
select where the result is ordered by (id,tstamp) and examines the tstamp
values and only returns rows that meet the interval criteria. This technique
takes roughly 2 seconds to filter out records over my 50000 record sample
set.... which is acceptable but not nearly as elegant as a single SQL
statement.

Again, thank you for all the replies.

Wayne

On Sun, Jun 05, 2011 at 08:52:30PM +0900, Masaru Sugawara wrote:
> On Fri, 3 Jun 2011 15:52:53 -0400
> lists-pgsql@useunix.net wrote:
> 
> 
> I also think you might want to use WITH RECURSIVE clause. 
> This SQL searches the case of an interval of 5 minutes or more, 
> and sets a relationship between a parent to its child.
> 
> 
> CREATE TABLE tbl(id integer, ts time) ;
> INSERT INTO tbl VALUES
> (0, '20:00'),
> (0, '20:05'),
> (0, '20:08'),
> (0, '20:10'),
> (0, '20:11'),
> (1, '20:03'),
> (1, '20:04'),
> (1, '20:05'),
> (1, '20:09'),
> (1, '20:16');
> 
> SELECT * FROM tbl;
> 
> 
> --
> WITH RECURSIVE rec(id , ts_p, ts_c) AS (
> SELECT a1.id, min(a1.ts), min(b1.ts)
>    FROM tbl AS a1, tbl AS b1 
>    WHERE a1.id=b1.id AND a1.ts + interval'5 minute' <= b1.ts 
>    GROUP BY a1.id
> UNION ALL
> SELECT t2.id, t2.ts_p, t2.ts_c
>    FROM rec AS t1 INNER JOIN 
>    (SELECT a2.id, a2.ts as ts_p, min(b2.ts) AS ts_c
>        FROM tbl AS a2, tbl AS b2 
>        WHERE a2.id = b2.id AND a2.ts + interval'5 minute' <= b2.ts 
>        GROUP BY a2.id, a2.ts
>     UNION ALL
>     SELECT a3.id, a3.ts, null
>        FROM tbl AS a3      
>     ) AS t2 ON t1.id = t2.id AND t1.ts_c=t2.ts_p
> )
> SELECT DISTINCT id, ts_p AS ts FROM rec
> ORDER BY 1,2;
> 
> 
> 
> 
> 
> 
> > I have a table that, at a minimum, has ID and timestamp columns.  Records
> > are inserted into with random IDs and timestamps.  Duplicate IDs are allowed. 
> > 
> > I want to select records grouped by ID, ordered by timestamp that are X minutes
> > apart. In this case X is 5.
> > 
> > Note, the intervals are not X minute wall clock intervals, they are X minute
> > intervals from the last accepted record, per-id.
> > 
> > For instance here is some sample input data:
> > 
> > ID    TS (HH:MM)
> > -------------------
> > 0    20:00
> > 1    20:03
> > 1    20:04
> > 0    20:05
> > 1    20:05
> > 0    20:08
> > 1    20:09
> > 0    20:10
> > 
> > I'd want the select to return:
> > 
> > ID    TS (HH:MM)
> > -------------------
> > 0    20:00
> > 0    20:05
> > 0    20:10
> > 1    20:03
> > 1    20:09
> > 
> > 
> > Does my question make sense?
> > 
> > Thanks in advance,
> > Wayne
> > 
> > -- 
> > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-sql
> 
> 
> 
> 
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql


Re: selecting records X minutes apart

От
Gavin Flower
Дата:
How about this (that does not require special functions nor triggers:


DROP TABLE IF EXISTS val;

CREATE TABLE val
(    id int,    ts timestamp
);

INSERT INTO val    VALUES    (0, '1-Jan-2010 20:00'),    (1, '1-Jan-2010 20:03'),    (1, '1-Jan-2010 20:04'),    (0,
'1-Jan-201020:05'),    (1, '1-Jan-2010 20:05'),    (0, '1-Jan-2010 20:08'),    (1, '1-Jan-2010 20:09'),    (0,
'1-Jan-201020:10');
 

WITH val_first AS
(    SELECT        id,        min(ts) AS ts    FROM        val    GROUP BY        id
)
SELECT    v.id,    v.ts::time
FROM    val v,    val_first vf
WHERE    v.id = vf.id AND    EXTRACT(EPOCH FROM v.ts - vf.ts)::int % 300 = 0
ORDER BY    id,    ts;