Discussion: Index on a function and SELECT DISTINCT
If I have this table, function and index in Postgres 7.3.6 ...
"""
CREATE TABLE news_stories (
    id serial primary key NOT NULL,
    pub_date timestamp with time zone NOT NULL,
    ...
);
CREATE OR REPLACE FUNCTION get_year_trunc(timestamp with time zone)
    RETURNS timestamp with time zone
    AS 'SELECT date_trunc(\'year\',$1);'
    LANGUAGE 'SQL' IMMUTABLE;
CREATE INDEX news_stories_pub_date_year_trunc
    ON news_stories( get_year_trunc(pub_date) );
"""
...why does this query not use the index?
db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM news_stories;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Unique  (cost=59597.31..61311.13 rows=3768 width=8)
   ->  Sort  (cost=59597.31..60454.22 rows=342764 width=8)
         Sort Key: date_trunc('year'::text, pub_date)
         ->  Seq Scan on news_stories  (cost=0.00..23390.55 rows=342764 width=8)
(4 rows)
The query is noticeably slow (2 seconds) on a database with 150,000+ records.
How can I speed it up?
Thanks,
Adrian
On Fri, 14 Jan 2005 12:32:12 -0600
Adrian Holovaty <postgresql@holovaty.com> wrote:
> ...why does this query not use the index?
>
> db=# EXPLAIN SELECT DISTINCT get_year_trunc(pub_date) FROM
> news_stories;
>
> The query is noticeably slow (2 seconds) on a database with 150,000+
> records. How can I speed it up?
It's doing a sequential scan because you're not limiting the query with
a WHERE clause. No point in using an index when you're asking for
the entire table. :)
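
For contrast, here is a rough sketch of the kind of query where the planner can at least consider the functional index: one whose WHERE clause restricts on the same expression the index was built on. The year boundary below is an arbitrary value picked for illustration, and whether an index scan is actually chosen depends on the table statistics and on how selective the condition is.

```sql
-- Assumption: '2004-01-01' is an illustrative value, not taken from the thread.
-- The WHERE clause uses the indexed expression get_year_trunc(pub_date),
-- so only a subset of the table is requested.
EXPLAIN
SELECT id
FROM news_stories
WHERE get_year_trunc(pub_date) = '2004-01-01 00:00:00'::timestamp with time zone;
```

Running ANALYZE on news_stories first gives the planner current statistics to base that choice on.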
---------------------------------
Frank Wiles <frank@wiles.org>
http://www.wiles.org
---------------------------------
Frank Wiles wrote:
> It's doing a sequential scan because you're not limiting the query with
> a WHERE clause. No point in using an index when you're asking for
> the entire table. :)
Ah, that makes sense. So is there a way to optimize SELECT DISTINCT queries
that have no WHERE clause?
Adrian
Try:
EXPLAIN SELECT get_year_trunc(pub_date) as foo FROM ... GROUP BY foo
Apart from that, you could use a materialized view...
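
PostgreSQL of this era has no built-in materialized views, so in practice this probably means a small summary table that you keep up to date yourself. A minimal sketch, assuming PL/pgSQL is installed in the database; the table, function and trigger names (news_story_years, news_story_years_add, news_stories_track_year) are made up for this example and do not come from the thread:

```sql
-- Hypothetical summary table: one row per distinct year in news_stories.
CREATE TABLE news_story_years (
    year_start timestamp with time zone PRIMARY KEY
);

-- Populate it once. This still scans the whole table, but only this one time.
INSERT INTO news_story_years (year_start)
SELECT get_year_trunc(pub_date)
FROM news_stories
GROUP BY get_year_trunc(pub_date);

-- Trigger function: record the year of each new or changed story
-- if that year is not in the summary table yet.
CREATE OR REPLACE FUNCTION news_story_years_add() RETURNS trigger AS '
BEGIN
    INSERT INTO news_story_years (year_start)
    SELECT get_year_trunc(NEW.pub_date)
    WHERE NOT EXISTS (
        SELECT 1 FROM news_story_years
        WHERE year_start = get_year_trunc(NEW.pub_date)
    );
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER news_stories_track_year
    AFTER INSERT OR UPDATE ON news_stories
    FOR EACH ROW EXECUTE PROCEDURE news_story_years_add();

-- The original question then becomes a scan of a handful of rows.
SELECT year_start FROM news_story_years ORDER BY year_start;
```

This sketch has known gaps: it never removes a year when the last story for that year is deleted or re-dated, and two sessions inserting the first story of a new year at the same moment could collide on the primary key. Both would need handling before relying on it.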