Обсуждение: Grabbing Newest Records From Duplicates
Given the following test table, I want to grab only the newest record and disregard any older duplicates based on name. Isthis the most efficient way to do it? Will the indicies even make a difference? The table below demonstrates a simple proofof concept. My final table will have millions of records; however, the test I'm doing now does not seem to use the indicies(probably because there are too few records?). <br /><br />optin=# \d test<br /> Table "public.test"<br /> Column| Type | Modifiers<br />--------+---------+-----------<br /> name | text |<br /> time | date |<br /> id | integer |<br />Indexes:<br /> "idx_name" btree (name) <br /> "idx_time" btree ("time")<br /><br />optin=#explain select * from test t1 where not exists (select 1 from test t2 where <a href="http://t2.name">t2.name</a>= <a href="http://t1.name">t1.name</a> and t2.time > t1.time);<br /><br />Thanks,<br/>Travis<br /><br />
"Travis Whitton" <tinymountain@gmail.com> writes: > Given the following test table, I want to grab only the newest record and > disregard any older duplicates based on name. Is this the most efficient way > to do it? No, it's gonna be pretty awful. The best way I know of involves DISTINCT ON (see the "weather reports" example in the SELECT reference page). Unfortunately that's a Postgres-only construct. If you want something portable then you'll need something messy with subqueries... regards, tom lane
One last question, using the "weather report" example, is it going to improve performance for the DISTINCT ON query to add an index to the location and time columns?
Thanks a lot,
Travis
Thanks a lot,
Travis
On 3/16/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Travis Whitton" <tinymountain@gmail.com> writes:
> Given the following test table, I want to grab only the newest record and
> disregard any older duplicates based on name. Is this the most efficient way
> to do it?
No, it's gonna be pretty awful. The best way I know of involves
DISTINCT ON (see the "weather reports" example in the SELECT reference
page). Unfortunately that's a Postgres-only construct. If you want
something portable then you'll need something messy with subqueries...
regards, tom lane
"Travis Whitton" <tinymountain@gmail.com> writes: > One last question, using the "weather report" example, is it going to > improve performance for the DISTINCT ON query to add an index to the > location and time columns? A two-column index matching the ORDER BY that you use might help, or then again might not --- for full-table scans it's often faster to sort than to try to use an index. regards, tom lane