Discussion: Picking out the most recent row using a time stamp column
Hi folks
This is an old chestnut which I've found a number of online threads for, and never seen a clever answer to. It seems a common enough idiom that there might be some slicker way to do it, so I thought I might inquire with this august group if such a clever answer exists ....
Consider the following table
create table data
(id_key int,
time_stamp timestamp without time zone,
value double precision);
create unique index data_idx on data (id_key, time_stamp);
with around 1m rows, with 3500 or so distinct values of id_key.
I need to find the most recent value for each distinct value of id_key. There is no elegant (that I know of) syntax for this, and there are two ways I've typically seen it done:
1. Use a dependent subquery to find the most recent time stamp, i.e.
select
a.id_key, a.time_stamp, a.value
from
data a
where
a.time_stamp=
(select max(time_stamp)
from data b
where a.id_key=b.id_key)
2. Define a temporary table / view with the most recent time stamp for each key, and join against it:
select
a.id_key, a.time_stamp, a.value
from
data a,
(select id_key, max(time_stamp) as mts
from data group by id_key) b
where
a.id_key=b.id_key and a.time_stamp=b.mts
I've found that for my data set, PG 8.4.2 selects the "obvious" / "do it as written" plan in each case, and that method 2. is much quicker (2.6 sec vs. 2 min on my laptop) ....
Is there a more elegant way to write this, perhaps using PG-specific extensions?
Cheers
Dave
On Thu, Feb 24, 2011 at 1:55 PM, Dave Crooke <dcrooke@gmail.com> wrote:

> I need to find the most recent value for each distinct value of id_key.
> [...]
> Is there a more elegant way to write this, perhaps using PG-specific
> extensions?

One pg-specific method that a lot of people overlook for this sort of
problem is custom aggregates.
create table foo(id int, t timestamptz default now());

create or replace function maxfoo(foo, foo) returns foo as $$
  select case when $1.t > $2.t then $1 else $2 end;
$$ language sql immutable;

create aggregate aggfoo(foo)
(
  sfunc=maxfoo,
  stype=foo
);

insert into foo values (1);
insert into foo values (1);

postgres=# select (f).* from (select aggfoo(foo) as f from foo group by id) q;
 id |             t
----+----------------------------
  1 | 2011-02-24 14:01:20.051-06
(1 row)

(Note: the table has to exist before the function, since the function's
arguments are of the table's composite type.)

Where this approach can be useful is when you have a very complicated
aggregation condition that can be awkward to express in a join.

merlin
Dave Crooke <dcrooke@gmail.com> wrote:

> create table data
> (id_key int,
> time_stamp timestamp without time zone,
> value double precision);
>
> create unique index data_idx on data (id_key, time_stamp);

> I need to find the most recent value for each distinct value of
> id_key.

Well, unless you use timestamp WITH time zone, you might not be able
to do that at all. There are very few places where timestamp WITHOUT
time zone actually makes sense.

> There is no elegant (that I know of) syntax for this

How about this?:

select distinct on (id_key) * from data order by id_key, time_stamp;

> select
> a.id_key, a.time_stamp, a.value
> from
> data a
> where
> a.time_stamp=
> (select max(time_stamp)
> from data b
> where a.id_key=b.id_key)

Rather than the above, I typically find this much faster:

select
  a.id_key, a.time_stamp, a.value
from
  data a
where not exists
  (select * from data b
   where b.id_key=a.id_key and b.time_stamp > a.time_stamp)

-Kevin
On Feb 24, 2011, at 14:55, Dave Crooke wrote:

> Is there a more elegant way to write this, perhaps using PG-specific
> extensions?

SELECT DISTINCT ON (data.id_key)
       data.id_key, data.time_stamp, data.value
  FROM data
 ORDER BY data.id_key, data.time_stamp DESC;

Michael Glaesemann
grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> wrote:

> SELECT DISTINCT ON (data.id_key)
>        data.id_key, data.time_stamp, data.value
>   FROM data
>  ORDER BY data.id_key, data.time_stamp DESC;

Dang! I forgot the DESC in my post! Thanks for showing the *correct*
version.

-Kevin
On Thu, Feb 24, 2011 at 2:18 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

> Rather than the above, I typically find this much faster:
>
> select
>   a.id_key, a.time_stamp, a.value
> from
>   data a
> where not exists
>   (select * from data b
>    where b.id_key=a.id_key and b.time_stamp > a.time_stamp)

Hm. Not only is it faster, but much more flexible... that's definitely
the way to go.

merlin
Thanks to all .... I had a tickling feeling at the back of my mind that there was a neater answer here. For the record, times (all from in-memory cached data, averaged over a bunch of runs):
Dependent subquery = 117.9 seconds
Join to temp table = 2.7 sec
DISTINCT ON = 2.7 sec
So the DISTINCT ON may not be quicker, but it sure is tidier.
Cheers
Dave
On 2/24/11 3:38 PM, Dave Crooke wrote:

> Thanks to all .... I had a tickling feeling at the back of my mind
> that there was a neater answer here. For the record, times (all from
> in-memory cached data, averaged over a bunch of runs):
>
> Dependent subquery = 117.9 seconds
> Join to temp table = 2.7 sec
> DISTINCT ON = 2.7 sec

But wait, there's more! You haven't tested the Windowing Function
solution. I'll bet it's even faster.

SELECT id_key, time_stamp, value
FROM (
  SELECT id_key, time_stamp, value,
         row_number()
           OVER ( PARTITION BY id_key
                  ORDER BY time_stamp DESC )
           as ranking
  FROM thetable
) as filtered_table
WHERE ranking = 1

--
-- Josh Berkus
   PostgreSQL Experts Inc.
   http://www.pgexperts.com
On 02/24/2011 06:20 PM, Josh Berkus wrote:

> SELECT id_key, time_stamp, value
> FROM (
>   SELECT id_key, time_stamp, value,
>          row_number()
>            OVER ( PARTITION BY id_key
>                   ORDER BY time_stamp DESC )
>            as ranking
>   FROM thetable
> ) as filtered_table
> WHERE ranking = 1

Why did you use row_number instead of rank?

I am now curious how the speed compares though. I still think the
DISTINCT ON will be faster, but it would be a great surprise if the
windowing version won out.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
> Why did you use row_number instead of rank?

Because I assumed he only wanted one row in the event of ties. Hmmm,
although with that schema, there won't be ties. So it's pretty much
arbitrary then.

> I am now curious how the speed compares though. I still think the
> DISTINCT ON will be faster, but it would be a great surprise.

Hopefully we'll find out! The windowing functions are usually much
faster for me. I think in 9.0 or 9.1 someone replumbed DISTINCT ON to
use a bunch of the window function internals, at which point it'll
cease to matter.

--
-- Josh Berkus
   PostgreSQL Experts Inc.
   http://www.pgexperts.com
On Thu, Feb 24, 2011 at 4:38 PM, Dave Crooke <dcrooke@gmail.com> wrote:
I'm using 8.3.3 and I have a similar sort of setup and just thought I'd add another point of reference, here's the timing from doing the same sort of queries on my dataset of ~700,000 records with ~10,000 unique "id_key"s.
I also added a 4th version that uses a permanent table that's auto-populated by a trigger with the rid of the most recent entry from the main table, so it's a simple join to get the latest entries.
Dependent subquery = (killed it after it ran for over 10 minutes)
Join on temp table = 1.5 seconds
DISTINCT ON = 2.9 seconds
Join on auto-populated table = 0.8 seconds
Dave
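[Editor's note: the thread never shows the DDL for the auto-populated "latest" table, so here is a minimal sketch of how such a trigger setup might look. The rid serial column, the latestdata table definition, and the trigger body are all illustrative assumptions, not taken from the original posts; the 8.3-era UPDATE-then-INSERT upsert is used since 8.3 has no native upsert.]

```sql
-- Hypothetical sketch: keep a one-row-per-key pointer table current
-- via a trigger on the main data table.
ALTER TABLE data ADD COLUMN rid serial;

CREATE TABLE latestdata (
    id_key int PRIMARY KEY,
    rid    int NOT NULL
);

CREATE OR REPLACE FUNCTION data_update_latest() RETURNS trigger AS $$
BEGIN
    -- 8.3-era upsert: try UPDATE first, fall back to INSERT
    UPDATE latestdata SET rid = NEW.rid WHERE id_key = NEW.id_key;
    IF NOT FOUND THEN
        INSERT INTO latestdata (id_key, rid) VALUES (NEW.id_key, NEW.rid);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_latest_trg
    AFTER INSERT ON data
    FOR EACH ROW EXECUTE PROCEDURE data_update_latest();

-- "Join on auto-populated table" then reduces to:
SELECT d.id_key, d.time_stamp, d.value
FROM data d
WHERE d.rid IN (SELECT rid FROM latestdata);
```

Note this sketch assumes rows arrive in timestamp order per key; if late-arriving rows are possible, the trigger would need to compare time_stamp before overwriting.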
Hi Dave
Yes, 100% the best solution .... I did the same thing a while back, I just have a separate copy of the data in a "latest" table and the Java code just runs a second SQL statement to update it when writing a new record (I've never been a trigger fan).
I found myself looking at the "find the latest" query again though in the process of building a "demo mode" into our application, which will replay a finite set of data on a rolling loop by moving it forward in time, and also has to simulate the continuous updating of the "latest" table so that the business logic will be appropriately fooled.
My next tweak will be to cache the "latest" table in the Java layer ;-)
Cheers
Dave
On Fri, Feb 25, 2011 at 1:45 PM, Dave Crooke <dcrooke@gmail.com> wrote:
> Yes, 100% the best solution .... I did the same thing a while back, I
> just have a separate copy of the data in a "latest" table [...] which
> will replay a finite set of data on a rolling loop by moving it
> forward in time [...]
Our application has what sounds like a similar functionality that we call "playback". The way that we did it was to have a schema called "playback" with identical tables to those that we want to have repopulated. All the other tables exist in only the "public" schema and then we don't have to do any duplication of that data. Then during playback it just runs a query to copy from the "public" table to the "playback" table and the trigger will populate the "latest" table in the "playback" schema automatically just like when the program is running normally and populating the "public" version.
The secret sauce comes in by setting "SET search_path TO playback, public;" because then your application runs all the same queries to get the data and doesn't have to know that anything different is going on other than the copy operation that it's doing. It's nice because it takes all of the data management burden off of the application and then allows the database to do the hard work for you. It's obviously not the perfect solution but it wasn't too hard to setup and we've really liked the way it works.
Dave
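[Editor's note: a rough sketch of the playback-schema setup described above. The exact DDL isn't shown in the thread, so table names and the replay query are illustrative assumptions; LIKE ... INCLUDING clauses are used here since they exist in the 8.3-era versions being discussed.]

```sql
-- Hypothetical sketch of the playback-schema trick.
CREATE SCHEMA playback;

-- Duplicate only the tables that get repopulated during playback;
-- everything else stays in public and is found via the search path.
CREATE TABLE playback.data
    (LIKE public.data INCLUDING DEFAULTS INCLUDING INDEXES);
CREATE TABLE playback.latestdata
    (LIKE public.latestdata INCLUDING DEFAULTS INCLUDING INDEXES);

-- The application session switches into demo mode:
SET search_path TO playback, public;

-- Unqualified table names now resolve to playback.* first, falling
-- back to public for tables that exist only there. Replaying a slice
-- of historical data, shifted forward in time:
INSERT INTO playback.data (id_key, time_stamp, value)
SELECT id_key, time_stamp + interval '30 days', value
FROM public.data
WHERE time_stamp BETWEEN '2011-01-01' AND '2011-01-02';
```

Any trigger defined on playback.data would fire as usual, keeping playback.latestdata current just as in normal operation.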
Dave,

Why not test the windowing version I posted?

--
-- Josh Berkus
   PostgreSQL Experts Inc.
   http://www.pgexperts.com
Unfortunately, I'm running 8.3.3 and to my knowledge the windowing stuff wasn't added til 8.4.
Dave
On Feb 26, 2011 2:06 PM, "Josh Berkus" <josh@agliodbs.com> wrote:
> Dave,
>
> Why not test the windowing version I posted?
>
> --
> -- Josh Berkus
> PostgreSQL Experts Inc.
> http://www.pgexperts.com
* Kevin Grittner:

> Well, unless you use timestamp WITH time zone, you might not be able
> to do that at all. There are very few places where timestamp WITHOUT
> time zone actually makes sense.

I don't think PostgreSQL keeps track of actual time zone values, just
as it doesn't keep track of the character encoding of TEXT columns.
Unless suppressed with WITHOUT TIME ZONE, PostgreSQL makes up some
time zone on demand. This makes TIMESTAMP WITH TIME ZONE not that
useful, and it's often better to use TIMESTAMP WITHOUT TIME ZONE with
times in UTC.
On Sat, Feb 26, 2011 at 2:38 PM, Dave Johansen <davejohansen@gmail.com> wrote:
>
> Unfortunately, I'm running 8.3.3 and to my knowledge the windowing
> stuff wasn't added til 8.4.

We finally have moved over to 8.4 and so I just wanted to post the
time comparison numbers to show the times on 8.4 as well. This is also
a newer data set with ~700k rows and ~4k distinct id_key values.

1) Dependent subquery

SELECT a.id_key, a.time_stamp, a.value FROM data AS a
WHERE a.time_stamp = (SELECT MAX(time_stamp) FROM data AS b
                      WHERE a.id_key = b.id_key);

8.3.3: Killed it after a few minutes
8.4.13: Killed it after a few minutes

2) Join against temporary table

SELECT a.id_key, a.time_stamp, a.value FROM data AS a
JOIN (SELECT id_key, MAX(time_stamp) AS max_time_stamp
      FROM data GROUP BY id_key) AS b
  ON a.id_key = b.id_key AND a.time_stamp = b.max_time_stamp;

8.3.3: 1.4 s
8.4.13: 0.5 s

3) DISTINCT ON

SELECT DISTINCT ON (id_key) id_key, time_stamp, value FROM data
ORDER BY id_key, time_stamp DESC;

Without index:
8.3.3: 34.1 s
8.4.13: 98.7 s
With index (data(id_key, time_stamp DESC)):
8.3.3: 3.4 s
8.4.13: 1.3 s

4) Auto-populated table

SELECT id_key, time_stamp, value FROM data
WHERE rid IN (SELECT rid FROM latestdata);

8.3.3: 0.2 s
8.4.13: 0.06 s

5) Windowing

SELECT id_key, time_stamp, value
FROM (SELECT id_key, time_stamp, value,
             row_number() OVER (PARTITION BY id_key
                                ORDER BY time_stamp DESC) AS ranking
      FROM data) AS a
WHERE ranking = 1;

8.3.3: N/A
8.4.13: 1.6 s

So the auto-populated table (#4) is the fastest by an order of
magnitude, but the join against the temporary table (#2) is the next
best option based on speed and doesn't require the extra multi-column
index that DISTINCT ON (#3) does.

On a related note though, is there a way to make the multi-column
index used in the DISTINCT ON more efficient? Based on the results, it
appears that the multi-column index is actually a single index with
the ordering of the tree based on the first value and then the second
value. Is there a way to make it be a "multi-level index"? What I mean
is that the first value is basically a tree/hash that then points to
the second index, because if that's possible then that would probably
make the DISTINCT ON (#3) version as fast as or faster than the
auto-populated table (#4). Is there a way to create an index like that
in postgres?

Thanks,
Dave
On Fri, Apr 5, 2013 at 11:54 AM, Dave Johansen <davejohansen@gmail.com> wrote:
> We finally have moved over to 8.4 and so I just wanted to post the
> time comparison numbers to show the times on 8.4 as well. This is
> also a newer data set with ~700k rows and ~4k distinct id_key values.

I would also test:

*) EXISTS()

SELECT a.id_key, a.time_stamp, a.value FROM data a
WHERE NOT EXISTS
(
  SELECT 1 FROM data b
  WHERE
    a.id_key = b.id_key
    and b.time_stamp > a.time_stamp
);

*) custom aggregate (this will not be the fastest option but is a good
technique to know -- it can be a real life saver when selection
criteria is complex)

CREATE FUNCTION agg_latest_data(data, data) returns data AS
$$
  SELECT CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
$$ LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE latest_data (
  SFUNC=agg_latest_data,
  STYPE=data
);

SELECT latest_data(d) FROM data d GROUP BY d.id_key;

The above returns the composite, not the fields, but that can be
worked around.

merlin
On Fri, Apr 5, 2013 at 11:40 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I would also test:
>
> *) EXISTS()
>
> SELECT a.id_key, a.time_stamp, a.value FROM data a
> WHERE NOT EXISTS
> (
>   SELECT 1 FROM data b
>   WHERE
>     a.id_key = b.id_key
>     and b.time_stamp > a.time_stamp
> );

I tried this and it was slow:
8.3.3: 674.4 s
8.4.13: 40.4 s

> *) custom aggregate (this will not be the fastest option but is a
> good technique to know -- it can be a real life saver when selection
> criteria is complex)
>
> SELECT latest_data(d) FROM data d group by d.id_key;
>
> the above returns the composite, not the fields, but that can be
> worked around.

My real table actually returns/needs all the values from the row so I
didn't feel like messing with aggregate stuff.

Thanks,
Dave
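[Editor's note: the workaround Merlin alludes to is the same `(composite).*` expansion he showed in the aggfoo example earlier in the thread; applied to the latest_data aggregate it might look like this.]

```sql
-- Expand the composite returned by the custom aggregate back into
-- individual columns (same trick as the aggfoo example above):
SELECT (f).id_key, (f).time_stamp, (f).value
FROM (SELECT latest_data(d) AS f FROM data d GROUP BY d.id_key) q;
```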