Обсуждение: SQL Help - Finding Next Lowest Value of Current Row Value

Поиск
Список
Период
Сортировка

SQL Help - Finding Next Lowest Value of Current Row Value

От
Jeff Adams
Дата:
Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff



Re: SQL Help - Finding Next Lowest Value of Current Row Value

От
Chris Travers
Дата:
On Sat, Oct 1, 2011 at 2:30 PM, Jeff Adams <Jeff.Adams@noaa.gov> wrote:
> Greetings,
>
> I have a large table (~19 million records). Records contains a field
> identifying a vessel and a field containing an time (epoch). Using the
> current rows vessel and time values, I need to be able to find the next
> lowest time value for the vessel and use it to compute how much time has
> elapsed between the records. I have tried a scalar subquery in the SELECT,
> which works, but it runs quite slowly. Is there an alternative approach that
> might perform better for this type of query. Any information would be
> greatly appreciated. Thanks...
>
How many rows per vessel?

Either way it seems to me we are talking about selecting the current
record, and then selecting the max of another record within a
timeframe.  I would probably use a plpgsql function that would do this
in two stages, ideally being able to do an index scan twice....

Best Wishes,
Chris Travers

Re: SQL Help - Finding Next Lowest Value of Current Row Value

От
Chris Curvey
Дата:
On Sat, Oct 1, 2011 at 5:30 PM, Jeff Adams <Jeff.Adams@noaa.gov> wrote:
Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff



Would a self-join with a MAX() help, like this?  (Where "v" is your vessel_id and "e" is your time value?)

create table stuff
(  v int
,  e timestamp
);

insert into stuff (v, e) values (1, '1/1/2011');
insert into stuff (v, e) values (1, '1/2/2011');
insert into stuff (v, e) values (1, '1/3/2011');

insert into stuff (v, e) values (2, '2/1/2011');
insert into stuff (v, e) values (2, '2/2/2011');

select a.v, a.e, max(b.e), a.e - max(b.e)
from stuff a
join stuff b on a.v = b.v
where a.e > b.e
group by a.v, a.e
 
I don't have a multi-million row table handy, but I'd be interested to hear your results.

-- 
e-Mail is the equivalent of a postcard written in pencil.  This message may not have been sent by me, or intended for you.  It may have been read or even modified while in transit.  e-Mail disclaimers have the same force in law as a note passed in study hall.  If your corporate attorney says that you need an disclaimer in your signature, you need a new corporate attorney.

Re: SQL Help - Finding Next Lowest Value of Current Row Value

От
Thomas Kellerer
Дата:
Jeff Adams wrote on 01.10.2011 23:30:
> Greetings,
>
> I have a large table (~19 million records). Records contains a field
> identifying a vessel and a field containing an time (epoch). Using the
> current rows vessel and time values, I need to be able to find the next
> lowest time value for the vessel and use it to compute how much time has
> elapsed between the records. I have tried a scalar subquery in the SELECT,
> which works, but it runs quite slowly. Is there an alternative approach that
> might perform better for this type of query. Any information would be
> greatly appreciated. Thanks...
>
> Jeff

Something like:


SELECT vessel,
        time_column,
        time_column - lag(time_column) over (partition by vessel order by time_column) as diff
FROM your_table

No sure how good that performs though.



Re: SQL Help - Finding Next Lowest Value of Current Row Value

От
Jeff Adams
Дата:

Thanks for the suggestions Chris (and Chris). After a bit more investigation I stumbled upon the Window functions. The approach below turned out to be much more efficient that a function or self join approach. The SQL that I used is provided below (event_id and mmsi uniquely identify a vessel transit for which I wished to compute how much time had elapsed between successive records):

 

SELECT

  a.event_id,

  a.mmsi,

  (a.epoch - lag(epoch) OVER (PARTITION BY event_id, mmsi ORDER BY epoch ASC))/60 AS elapsed

FROM

   dmas_ais a

 

Jeff

 

From: ccurvey@gmail.com [mailto:ccurvey@gmail.com] On Behalf Of Chris Curvey
Sent: Saturday, October 01, 2011 10:55 PM
To: Jeff Adams
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] SQL Help - Finding Next Lowest Value of Current Row Value

 

On Sat, Oct 1, 2011 at 5:30 PM, Jeff Adams <Jeff.Adams@noaa.gov> wrote:

Greetings,

I have a large table (~19 million records). Records contains a field
identifying a vessel and a field containing an time (epoch). Using the
current rows vessel and time values, I need to be able to find the next
lowest time value for the vessel and use it to compute how much time has
elapsed between the records. I have tried a scalar subquery in the SELECT,
which works, but it runs quite slowly. Is there an alternative approach that
might perform better for this type of query. Any information would be
greatly appreciated. Thanks...

Jeff

 

Would a self-join with a MAX() help, like this?  (Where "v" is your vessel_id and "e" is your time value?)

 

create table stuff

(  v int

,  e timestamp

);

 

insert into stuff (v, e) values (1, '1/1/2011');

insert into stuff (v, e) values (1, '1/2/2011');

insert into stuff (v, e) values (1, '1/3/2011');

 

insert into stuff (v, e) values (2, '2/1/2011');

insert into stuff (v, e) values (2, '2/2/2011');

 

select a.v, a.e, max(b.e), a.e - max(b.e)

from stuff a

join stuff b on a.v = b.v

where a.e > b.e

group by a.v, a.e

 

I don't have a multi-million row table handy, but I'd be interested to hear your results.

 

-- 

e-Mail is the equivalent of a postcard written in pencil.  This message may not have been sent by me, or intended for you.  It may have been read or even modified while in transit.  e-Mail disclaimers have the same force in law as a note passed in study hall.  If your corporate attorney says that you need an disclaimer in your signature, you need a new corporate attorney.