Обсуждение: can't win
I have a query that it seems is destined to be slow one way or another. I
have a table of around 30k articles, categorized by topic and ordered by date:
create table articles (
topic varchar(50),
created date,
data text
);
create index articles_topic_idx on articles(topic);
create index articles_created_idx on articles(created);
If I want to get the 5 most recent articles in a topic, I get a nice query
plan and a fast query:
# explain select * from articles where topic = 'Example' order by created desc
limit 5 ;
QUERY PLAN
-------------------------------------------------------------------------------
-----------------------------
Limit (cost=0.00..646.71 rows=5 width=828)
-> Index Scan Backward using articles_created_idx on articles
(cost=0.00..85202.16 rows=659 width=828)
Filter: (topic = 'Example'::character varying)
(3 rows)
# select * from articles where topic = 'Example' order by created desc limit 5
;
[.....]
Time: 18.42 ms
However, if the topic happens to not exist, this query takes a very long time:
# select * from articles where topic = 'NO-Example' order by created desc
limit 5 ;
[.....]
Time: 1075.36 ms
If I drop the date index or get more articles (so it doesn't do the backward
scan on articles_created_idx), then the situation is reversed: getting the
most recent articles for a topic that exists takes a fair amount of time,
while getting a topic that does not exist is nearly instantaneous.
Is there any way I can get the best of both worlds?
-J
It might be worthwhile to experiment with 2 new indexes: Create UNIQUE index articles_created_topic_idx on articles(created, topic); Create UNIQUE index articles_topic_created_idx on articles(topic, created); Probably, one of the two should become your primary key. That will give the optimizer some new choices for plans. > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jeff Rogers > Sent: Friday, June 18, 2004 12:28 PM > To: pgsql-general@postgresql.org > Subject: [GENERAL] can't win > > > > I have a query that it seems is destined to be slow one way > or another. I > have a table of around 30k articles, categorized by topic and > ordered by date: > > create table articles ( > topic varchar(50), > created date, > data text > ); > > create index articles_topic_idx on articles(topic); > create index articles_created_idx on articles(created); > > If I want to get the 5 most recent articles in a topic, I get > a nice query > plan and a fast query: > # explain select * from articles where topic = 'Example' > order by created desc > limit 5 ; > QUERY PLAN > > > -------------------------------------------------------------- > ----------------- > ----------------------------- > Limit (cost=0.00..646.71 rows=5 width=828) > -> Index Scan Backward using articles_created_idx on articles > (cost=0.00..85202.16 rows=659 width=828) > Filter: (topic = 'Example'::character varying) > (3 rows) > > # select * from articles where topic = 'Example' order by > created desc limit 5 > ; > [.....] > Time: 18.42 ms > > However, if the topic happens to not exist, this query takes > a very long time: # select * from articles where topic = > 'NO-Example' order by created desc > limit 5 ; > [.....] > Time: 1075.36 ms > > If I drop the date index or get more articles (so it doesn't > do the backward > scan on articles_created_idx), then the situation is > reversed: getting the > most recent articles for a topic that exists takes a fair > amount of time, > while getting a topic that does not exist is nearly instantaneous. > > Is there any way I can get the best of both worlds? > > -J > > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so > that your > message can get through to the mailing list cleanly >
On Fri, 18 Jun 2004, Jeff Rogers wrote: > create table articles ( > topic varchar(50), > created date, > data text > ); > > create index articles_topic_idx on articles(topic); > create index articles_created_idx on articles(created); > > If I want to get the 5 most recent articles in a topic, I get a nice query > plan and a fast query: > # explain select * from articles where topic = 'Example' order by created desc > limit 5 ; > QUERY PLAN > > ------------------------------------------------------------------------------- > ----------------------------- > Limit (cost=0.00..646.71 rows=5 width=828) > -> Index Scan Backward using articles_created_idx on articles > (cost=0.00..85202.16 rows=659 width=828) > Filter: (topic = 'Example'::character varying) > (3 rows) > > # select * from articles where topic = 'Example' order by created desc limit 5 > ; > [.....] > Time: 18.42 ms > > However, if the topic happens to not exist, this query takes a very long time: > # select * from articles where topic = 'NO-Example' order by created desc > limit 5 ; > [.....] > Time: 1075.36 ms > > If I drop the date index or get more articles (so it doesn't do the backward > scan on articles_created_idx), then the situation is reversed: getting the > most recent articles for a topic that exists takes a fair amount of time, > while getting a topic that does not exist is nearly instantaneous. > > Is there any way I can get the best of both worlds? Hmm, how about a multiple column index on (topic, created)? You might need to use select * from articles where topic='Example' order by topic desc, created desc limit 5; to get it to use the index though.
>> Is there any way I can get the best of both worlds? > > Hmm, how about a multiple column index on (topic, created)? You might > need to use > select * from articles where topic='Example' order by topic desc, created > desc limit 5; > to get it to use the index though. That works beautifully, thanks! -J