Обсуждение: How to use indexes for GROUP BY
Hi folks, I have a table like so: create table tagRecord ( uid varchar(60) primary key, [bunch of other fields] location varchar(32), creationTS timestamp ); create index idx_tagdata_loc_creationTS on tagRecord(location, creationTS); The number of individual values in location is small (e.g. 2). I want to simply get the latest "creationTS" for each location, but that seems to result in a full table scan: tts_server_db=# explain analyze select location, max(creationTS) from tagrecord group by location; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=5330.53..5330.55 rows=2 width=18) (actual time=286.161..286.165 rows=3 loops=1) -> Seq Scan on tagrecord (cost=0.00..4771.35 rows=111835 width=18) (actual time=0.059..119.828 rows=111739 loops=1) Total runtime: 286.222 ms Now I have the idx_tagdata_loc_creationTS, and it seemed to me that it should be able to use it to quickly figure out the max creationTS for each location. Any way I can make this more efficient? BTW, I am using postgresql-server-8.1.22-1.el5_5.1 -- Dimi Paun <dimi@lattica.com> Lattica, Inc.
On 01/24/2011 12:29 PM, Dimi Paun wrote: > I want to simply get the latest "creationTS" for each location, > but that seems to result in a full table scan: > > tts_server_db=# explain analyze select location, max(creationTS) from > tagrecord group by location; Try this, it *might* work: select DISTINCT ON (location) location, creationTS from tagrecord order by location, creationTS DESC; Secondly... Postgresql 8.1? Really? If at all possible, upgrade. There is a lot you're missing from the last six years of PostgreSQL releases. For instance, your MAX means a reverse index scan for each location, which is far more expensive than an ordered index scan, so the planner may be ignoring it, if the planner in 8.1 is even that intelligent. If you were running 8.3, for instance, your index could be: create index idx_tagdata_loc_creationTS on tagRecord(location, creationTS DESC); And then suddenly it just has to use the first match for that index for each location. Older PG versions are... flaky when it comes to optimization. I'm not sure if 8.1 used MAX as an internal construct or treated it like a function. If it's the latter, it has to read every value to find out which is the "max", which is why using ORDER BY *may* fix your problem. -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@peak6.com ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
On Mon, Jan 24, 2011 at 11:29 AM, Dimi Paun <dimi@lattica.com> wrote: Two very quick points: > tts_server_db=# explain analyze select location, max(creationTS) from tagrecord group by location; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------- > HashAggregate (cost=5330.53..5330.55 rows=2 width=18) (actual time=286.161..286.165 rows=3 loops=1) > -> Seq Scan on tagrecord (cost=0.00..4771.35 rows=111835 width=18) (actual time=0.059..119.828 rows=111739 loops=1) > Total runtime: 286.222 ms Most of your run time is the hashaggregate running, not the seq scan > BTW, I am using postgresql-server-8.1.22-1.el5_5.1 As another poster observed, you're running an ancient version of pgsql from a performance perspective. Upgrading to 8.4 or 9.0 would make a huge difference in overall performance, not just with one or two queries.
On Mon, Jan 24, 2011 at 01:29:01PM -0500, Dimi Paun wrote: > Hi folks, > > I have a table like so: > > create table tagRecord ( > uid varchar(60) primary key, > [bunch of other fields] > location varchar(32), > creationTS timestamp > ); > create index idx_tagdata_loc_creationTS on tagRecord(location, creationTS); > > The number of individual values in location is small (e.g. 2). > > I want to simply get the latest "creationTS" for each location, > but that seems to result in a full table scan: > > tts_server_db=# explain analyze select location, max(creationTS) from tagrecord group by location; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------- > HashAggregate (cost=5330.53..5330.55 rows=2 width=18) (actual time=286.161..286.165 rows=3 loops=1) > -> Seq Scan on tagrecord (cost=0.00..4771.35 rows=111835 width=18) (actual time=0.059..119.828 rows=111739 loops=1) > Total runtime: 286.222 ms you can use technique described in here: http://www.depesz.com/index.php/2009/07/10/getting-list-of-unique-elements/ Best regards, depesz
On Mon, 2011-01-24 at 12:33 -0700, Scott Marlowe wrote: > As another poster observed, you're running an ancient version of pgsql > from a performance perspective. Upgrading to 8.4 or 9.0 would make a > huge difference in overall performance, not just with one or two > queries. Thanks for the trips. I'll try to first upgrade, and I'll report back if that doesn't help :) -- Dimi Paun <dimi@lattica.com> Lattica, Inc.