Discussion: Picking out the most recent row using a time stamp column
Hi folks
This is an old chestnut which I've found a number of online threads for, and never seen a clever answer to. It seems a common enough idiom that there might be some slicker way to do it, so I thought I might inquire with this august group if such a clever answer exists ....
Consider the following table
create table data
(id_key int,
time_stamp timestamp without time zone,
value double precision);
create unique index data_idx on data (id_key, time_stamp);
with around 1m rows, with 3500 or so distinct values of id_key.
I need to find the most recent value for each distinct value of id_key. There is no elegant (that I know of) syntax for this, and there are two ways I've typically seen it done:
1. Use a dependent subquery to find the most recent time stamp, i.e.
select
a.id_key, a.time_stamp, a.value
from
data a
where
a.time_stamp=
(select max(time_stamp)
from data b
where a.id_key=b.id_key)
2. Define a temporary table / view with the most recent time stamp for each key, and join against it:
select
a.id_key, a.time_stamp, a.value
from
data a,
(select id_key, max(time_stamp) as mts
from data group by id_key) b
where
a.id_key=b.id_key and a.time_stamp=b.mts
I've found that for my data set, PG 8.4.2 selects the "obvious" / "do it as written" plan in each case, and that method 2. is much quicker (2.6 sec vs. 2 min on my laptop) ....
Is there a more elegant way to write this, perhaps using PG-specific extensions?
Cheers
Dave
On Thu, Feb 24, 2011 at 1:55 PM, Dave Crooke <dcrooke@gmail.com> wrote:

> I need to find the most recent value for each distinct value of id_key.
> [...]
> Is there a more elegant way to write this, perhaps using PG-specific
> extensions?

One pg-specific method that a lot of people overlook for this sort of
problem is custom aggregates.
create table foo(id int, t timestamptz default now());

create or replace function maxfoo(foo, foo) returns foo as $$
  select case when $1.t > $2.t then $1 else $2 end;
$$ language sql immutable;

create aggregate aggfoo(foo)
(
  sfunc=maxfoo,
  stype=foo
);

insert into foo values (1);
insert into foo values (1);

postgres=# select (f).* from (select aggfoo(foo) as f from foo group by id) q;
 id |             t
----+----------------------------
  1 | 2011-02-24 14:01:20.051-06
(1 row)

(Note: the table has to exist before the function, since the function's
arguments are of the table's composite type.)

Where this approach can be useful is when you have a very complicated
aggregation condition that can be awkward to express in a join.

merlin
Dave Crooke <dcrooke@gmail.com> wrote:

> create table data
> (id_key int,
> time_stamp timestamp without time zone,
> value double precision);
>
> create unique index data_idx on data (id_key, time_stamp);

> I need to find the most recent value for each distinct value of
> id_key.

Well, unless you use timestamp WITH time zone, you might not be able
to do that at all. There are very few places where timestamp WITHOUT
time zone actually makes sense.

> There is no elegant (that I know of) syntax for this

How about this?:

select distinct on (id_key) * from data order by id_key, time_stamp;

> select
> a.id_key, a.time_stamp, a.value
> from
> data a
> where
> a.time_stamp=
> (select max(time_stamp)
> from data b
> where a.id_key=b.id_key)

Rather than the above, I typically find this much faster:

select
  a.id_key, a.time_stamp, a.value
from
  data a
where not exists
  (select * from data b
   where b.id_key=a.id_key and b.time_stamp > a.time_stamp)

-Kevin
On Feb 24, 2011, at 14:55, Dave Crooke wrote:

> Is there a more elegant way to write this, perhaps using PG-specific
> extensions?

SELECT DISTINCT ON (data.id_key)
       data.id_key, data.time_stamp, data.value
  FROM data
 ORDER BY data.id_key, data.time_stamp DESC;

Michael Glaesemann
grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> wrote:

> SELECT DISTINCT ON (data.id_key)
>        data.id_key, data.time_stamp, data.value
>   FROM data
>  ORDER BY data.id_key, data.time_stamp DESC;

Dang! I forgot the DESC in my post! Thanks for showing the *correct*
version.

-Kevin
On Thu, Feb 24, 2011 at 2:18 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

> Rather than the above, I typically find this much faster:
>
> select
>   a.id_key, a.time_stamp, a.value
> from
>   data a
> where not exists
>   (select * from data b
>    where b.id_key=a.id_key and b.time_stamp > a.time_stamp)

Hm. Not only is it faster, but much more flexible... that's definitely
the way to go.

merlin
Thanks to all .... I had a tickling feeling at the back of my mind that there was a neater answer here. For the record, times (all from in-memory cached data, averaged over a bunch of runs):
Dependent subquery = 117.9 seconds
Join to temp table = 2.7 sec
DISTINCT ON = 2.7 sec
So the DISTINCT ON may not be quicker, but it sure is tidier.
Cheers
Dave
On 2/24/11 3:38 PM, Dave Crooke wrote:

> Thanks to all .... I had a tickling feeling at the back of my mind
> that there was a neater answer here. For the record, times (all from
> in-memory cached data, averaged over a bunch of runs):
>
> Dependent subquery = 117.9 seconds
> Join to temp table = 2.7 sec
> DISTINCT ON = 2.7 sec

But wait, there's more! You haven't tested the Windowing Function
solution. I'll bet it's even faster.

SELECT id_key, time_stamp, value
FROM (
  SELECT id_key, time_stamp, value,
         row_number()
           OVER ( PARTITION BY id_key
                  ORDER BY time_stamp DESC )
           as ranking
  FROM thetable
) as filtered_table
WHERE ranking = 1

--
-- Josh Berkus
   PostgreSQL Experts Inc.
   http://www.pgexperts.com
On 02/24/2011 06:20 PM, Josh Berkus wrote:

> SELECT id_key, time_stamp, value
> FROM (
>   SELECT id_key, time_stamp, value,
>          row_number()
>            OVER ( PARTITION BY id_key
>                   ORDER BY time_stamp DESC )
>            as ranking
>   FROM thetable
> ) as filtered_table
> WHERE ranking = 1

Why did you use row_number instead of rank?

I am now curious how the speed compares though. I still think the
DISTINCT ON will be faster, but it would be a great surprise if the
windowing version won out.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com
> Why did you use row_number instead of rank?

Because I assumed he only wanted one row in the event of ties. Hmmm,
although with that schema, there won't be ties. So it's pretty much
arbitrary then.

> I am now curious how the speed compares though. I still think the
> DISTINCT ON will be faster, but it would be a great surprise.

Hopefully we'll find out! The windowing functions are usually much
faster for me. I think in 9.0 or 9.1 someone replumbed DISTINCT ON to
use a bunch of the window function internals, at which point it'll
cease to matter.

--
-- Josh Berkus
   PostgreSQL Experts Inc.
   http://www.pgexperts.com
On Thu, Feb 24, 2011 at 4:38 PM, Dave Crooke <dcrooke@gmail.com> wrote:
I'm using 8.3.3 and I have a similar sort of setup and just thought I'd add another point of reference, here's the timing from doing the same sort of queries on my dataset of ~700,000 records with ~10,000 unique "id_key"s.
I also added a 4th version that uses a permanent table that's auto-populated by a trigger with the rid of the most recent entry from the main table, so it's a simple join to get the latest entries.
Dependent subquery = (killed it after it ran for over 10 minutes)
Join on temp table = 1.5 seconds
DISTINCT ON = 2.9 seconds
Join on auto-populated table = 0.8 seconds
Dave
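[Editor's note: the thread never shows the DDL for the auto-populated "latest" table, so here is a minimal sketch of how such a trigger setup might look. The rid serial column, the latestdata table definition, and the trigger body are all illustrative assumptions, not taken from the original posts; the 8.3-era UPDATE-then-INSERT upsert is used since 8.3 has no native upsert.]

```sql
-- Hypothetical sketch: keep a one-row-per-key pointer table current
-- via a trigger on the main data table.
ALTER TABLE data ADD COLUMN rid serial;

CREATE TABLE latestdata (
    id_key int PRIMARY KEY,
    rid    int NOT NULL
);

CREATE OR REPLACE FUNCTION data_update_latest() RETURNS trigger AS $$
BEGIN
    -- 8.3-era upsert: try UPDATE first, fall back to INSERT
    UPDATE latestdata SET rid = NEW.rid WHERE id_key = NEW.id_key;
    IF NOT FOUND THEN
        INSERT INTO latestdata (id_key, rid) VALUES (NEW.id_key, NEW.rid);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_latest_trg
    AFTER INSERT ON data
    FOR EACH ROW EXECUTE PROCEDURE data_update_latest();

-- "Join on auto-populated table" then reduces to:
SELECT d.id_key, d.time_stamp, d.value
FROM data d
WHERE d.rid IN (SELECT rid FROM latestdata);
```

Note this sketch assumes rows arrive in timestamp order per key; if late-arriving rows are possible, the trigger would need to compare time_stamp before overwriting.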
Hi Dave
Yes, 100% the best solution .... I did the same thing a while back, I just have a separate copy of the data in a "latest" table and the Java code just runs a second SQL statement to update it when writing a new record (I've never been a trigger fan).
I found myself looking at the "find the latest" query again though in the process of building a "demo mode" into our application, which will replay a finite set of data on a rolling loop by moving it forward in time, and also has to simulate the continuous updating of the "latest" table so that the business logic will be appropriately fooled.
My next tweak will be to cache the "latest" table in the Java layer ;-)
Cheers
Dave
On Fri, Feb 25, 2011 at 1:45 PM, Dave Crooke <dcrooke@gmail.com> wrote:
> Yes, 100% the best solution .... I did the same thing a while back, I
> just have a separate copy of the data in a "latest" table [...] which
> will replay a finite set of data on a rolling loop by moving it
> forward in time [...]
Our application has what sounds like a similar functionality that we call "playback". The way that we did it was to have a schema called "playback" with identical tables to those that we want to have repopulated. All the other tables exist in only the "public" schema and then we don't have to do any duplication of that data. Then during playback it just runs a query to copy from the "public" table to the "playback" table and the trigger will populate the "latest" table in the "playback" schema automatically just like when the program is running normally and populating the "public" version.
The secret sauce comes in by setting "SET search_path TO playback, public;" because then your application runs all the same queries to get the data and doesn't have to know that anything different is going on other than the copy operation that it's doing. It's nice because it takes all of the data management burden off of the application and then allows the database to do the hard work for you. It's obviously not the perfect solution but it wasn't too hard to setup and we've really liked the way it works.
Dave
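[Editor's note: a rough sketch of the playback-schema setup described above. The exact DDL isn't shown in the thread, so table names and the replay query are illustrative assumptions; LIKE ... INCLUDING clauses are used here since they exist in the 8.3-era versions being discussed.]

```sql
-- Hypothetical sketch of the playback-schema trick.
CREATE SCHEMA playback;

-- Duplicate only the tables that get repopulated during playback;
-- everything else stays in public and is found via the search path.
CREATE TABLE playback.data
    (LIKE public.data INCLUDING DEFAULTS INCLUDING INDEXES);
CREATE TABLE playback.latestdata
    (LIKE public.latestdata INCLUDING DEFAULTS INCLUDING INDEXES);

-- The application session switches into demo mode:
SET search_path TO playback, public;

-- Unqualified table names now resolve to playback.* first, falling
-- back to public for tables that exist only there. Replaying a slice
-- of historical data, shifted forward in time:
INSERT INTO playback.data (id_key, time_stamp, value)
SELECT id_key, time_stamp + interval '30 days', value
FROM public.data
WHERE time_stamp BETWEEN '2011-01-01' AND '2011-01-02';
```

Any trigger defined on playback.data would fire as usual, keeping playback.latestdata current just as in normal operation.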
Dave,

Why not test the windowing version I posted?

--
-- Josh Berkus
   PostgreSQL Experts Inc.
   http://www.pgexperts.com
Unfortunately, I'm running 8.3.3 and to my knowledge the windowing stuff wasn't added til 8.4.
Dave
On Feb 26, 2011 2:06 PM, "Josh Berkus" <josh@agliodbs.com> wrote:
> Dave,
>
> Why not test the windowing version I posted?
>
> --
> -- Josh Berkus
> PostgreSQL Experts Inc.
> http://www.pgexperts.com
* Kevin Grittner:

> Well, unless you use timestamp WITH time zone, you might not be able
> to do that at all. There are very few places where timestamp WITHOUT
> time zone actually makes sense.

I don't think PostgreSQL keeps track of actual time zone values, just
as it doesn't keep track of the character encoding of TEXT columns.
Unless suppressed with WITHOUT TIME ZONE, PostgreSQL makes up some
time zone on demand. This makes TIMESTAMP WITH TIME ZONE not that
useful, and it's often better to use TIMESTAMP WITHOUT TIME ZONE with
times in UTC.
On Sat, Feb 26, 2011 at 2:38 PM, Dave Johansen <davejohansen@gmail.com> wrote:
>
> Unfortunately, I'm running 8.3.3 and to my knowledge the windowing
> stuff wasn't added til 8.4.

We finally have moved over to 8.4 and so I just wanted to post the
time comparison numbers to show the times on 8.4 as well. This is also
a newer data set with ~700k rows and ~4k distinct id_key values.

1) Dependent subquery

SELECT a.id_key, a.time_stamp, a.value FROM data AS a
WHERE a.time_stamp = (SELECT MAX(time_stamp) FROM data AS b
                      WHERE a.id_key = b.id_key);

8.3.3: Killed it after a few minutes
8.4.13: Killed it after a few minutes

2) Join against temporary table

SELECT a.id_key, a.time_stamp, a.value FROM data AS a
JOIN (SELECT id_key, MAX(time_stamp) AS max_time_stamp
      FROM data GROUP BY id_key) AS b
  ON a.id_key = b.id_key AND a.time_stamp = b.max_time_stamp;

8.3.3: 1.4 s
8.4.13: 0.5 s

3) DISTINCT ON

SELECT DISTINCT ON (id_key) id_key, time_stamp, value FROM data
ORDER BY id_key, time_stamp DESC;

Without index:
8.3.3: 34.1 s
8.4.13: 98.7 s
With index (data(id_key, time_stamp DESC)):
8.3.3: 3.4 s
8.4.13: 1.3 s

4) Auto-populated table

SELECT id_key, time_stamp, value FROM data
WHERE rid IN (SELECT rid FROM latestdata);

8.3.3: 0.2 s
8.4.13: 0.06 s

5) Windowing

SELECT id_key, time_stamp, value
FROM (SELECT id_key, time_stamp, value,
             row_number() OVER (PARTITION BY id_key
                                ORDER BY time_stamp DESC) AS ranking
      FROM data) AS a
WHERE ranking = 1;

8.3.3: N/A
8.4.13: 1.6 s

So the auto-populated table (#4) is the fastest by an order of
magnitude, but the join against the temporary table (#2) is the next
best option based on speed and doesn't require the extra multi-column
index that DISTINCT ON (#3) does.

On a related note though, is there a way to make the multi-column
index used in the DISTINCT ON more efficient? Based on the results, it
appears that the multi-column index is actually a single index with
the ordering of the tree based on the first value and then the second
value. Is there a way to make it be a "multi-level index"? What I mean
is that the first value is basically a tree/hash that then points to
the second index, because if that's possible then that would probably
make the DISTINCT ON (#3) version as fast as or faster than the
auto-populated table (#4). Is there a way to create an index like that
in postgres?

Thanks,
Dave
On Fri, Apr 5, 2013 at 11:54 AM, Dave Johansen <davejohansen@gmail.com> wrote:
> We finally have moved over to 8.4 and so I just wanted to post the
> time comparison numbers to show the times on 8.4 as well. This is
> also a newer data set with ~700k rows and ~4k distinct id_key values.

I would also test:

*) EXISTS()

SELECT a.id_key, a.time_stamp, a.value FROM data a
WHERE NOT EXISTS
(
  SELECT 1 FROM data b
  WHERE
    a.id_key = b.id_key
    and b.time_stamp > a.time_stamp
);

*) custom aggregate (this will not be the fastest option but is a good
technique to know -- it can be a real life saver when selection
criteria is complex)

CREATE FUNCTION agg_latest_data(data, data) returns data AS
$$
  SELECT CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
$$ LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE latest_data (
  SFUNC=agg_latest_data,
  STYPE=data
);

SELECT latest_data(d) FROM data d GROUP BY d.id_key;

The above returns the composite, not the fields, but that can be
worked around.

merlin
On Fri, Apr 5, 2013 at 11:40 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I would also test:
>
> *) EXISTS()
>
> SELECT a.id_key, a.time_stamp, a.value FROM data a
> WHERE NOT EXISTS
> (
>   SELECT 1 FROM data b
>   WHERE
>     a.id_key = b.id_key
>     and b.time_stamp > a.time_stamp
> );

I tried this and it was slow:
8.3.3: 674.4 s
8.4.13: 40.4 s

> *) custom aggregate (this will not be the fastest option but is a
> good technique to know -- it can be a real life saver when selection
> criteria is complex)
>
> SELECT latest_data(d) FROM data d group by d.id_key;
>
> the above returns the composite, not the fields, but that can be
> worked around.

My real table actually returns/needs all the values from the row so I
didn't feel like messing with aggregate stuff.

Thanks,
Dave
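[Editor's note: the workaround Merlin alludes to is the same `(composite).*` expansion he showed in the aggfoo example earlier in the thread; applied to the latest_data aggregate it might look like this.]

```sql
-- Expand the composite returned by the custom aggregate back into
-- individual columns (same trick as the aggfoo example above):
SELECT (f).id_key, (f).time_stamp, (f).value
FROM (SELECT latest_data(d) AS f FROM data d GROUP BY d.id_key) q;
```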