Обсуждение: Why those queries do not utilize indexes?

Поиск
Список
Период
Сортировка

Why those queries do not utilize indexes?

От
Artimenko Igor
Дата:
Hi everybody!

Here is my queries:

1. explain SELECT * FROM messageinfo WHERE user_id = CAST( 20000 AS BIGINT ) and msgstatus = CAST(
0 AS smallint );

2. explain SELECT * FROM messageinfo WHERE messageinfo.user_id = 20000::int8 and msgstatus =
0::smallint;

In both cases Explain command shows:
1. Sequential search and very high cost if set enable_seqscan to on;
Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 )

2. Index scan but even bigger cost if set enable_seqscan to off;
Index �messagesStatus� on messageinfo ( Cost=0.00..27220.72, rows=36802 )

messageinfo table has 200 records which meet this criteria and 662420 in total:

CREATE TABLE messageinfo
(
  user_id int8 NOT NULL,
  msgstatus int2 NOT NULL DEFAULT (0)::smallint,
  receivedtime timestamp NOT NULL DEFAULT now(),
  �
  msgread bool DEFAULT false,
  �
  CONSTRAINT "$1" FOREIGN KEY (user_id) REFERENCES users (id) ON UPDATE CASCADE ON DELETE CASCADE,
  )
WITH OIDS;

CREATE INDEX msgstatus
  ON messageinfo
  USING btree
  (user_id, msgstatus);

CREATE INDEX "messagesStatus"
  ON messageinfo
  USING btree
  (msgstatus);

CREATE INDEX msgread
  ON messageinfo
  USING btree
  (user_id, msgread);

CREATE INDEX "receivedTime"
  ON messageinfo
  USING btree
  (receivedtime);


MY QUESTIONS ARE:

1.    Should I afraid of high cost indexes? Or query will still be very efficient?

2.    Postgres does not use the index I need. For my data sets it�s always msgstatus index is
narrowest compare with �messagesStatus� one. Is any way to �enforce� to use a particular index?
What�s the logic when Postgres chooses one index compare with the other.

3.    I can change db structure to utilize Postgres specifics if you can tell them to me.

4.    Also, originally I had �messagesStatus� index having 2 components ( �msgstatus�, �user_id� ).
But query SELECT * FROM messageinfo WHERE msgstatus = 0 did not utilize indexes in this case. It
only worked if both index components are in WHERE part. So I have to remove 2-nd component
�user_id� from messagesStatus index even I wanted it. Is any way that where clause has only 1-st
component but index is utilized?

Igor Artimenko



__________________________________
Do you Yahoo!?
Yahoo! Mail Address AutoComplete - You start. We finish.
http://promotions.yahoo.com/new_mail

Re: Why those queries do not utilize indexes?

От
Dennis Bjorklund
Дата:
On Fri, 27 Aug 2004, Artimenko Igor wrote:

> 1. Sequential search and very high cost if set enable_seqscan to on;
> Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 )
>
> 2. Index scan but even bigger cost if set enable_seqscan to off;
> Index “messagesStatus” on messageinfo ( Cost=0.00..27220.72, rows=36802 )

So pg thinks that a sequential scan will be a little bit faster (The cost
is a little bit smaller). If you compare the actual runtimes maybe you
will see that pg was right. In this case the cost is almost the same so
the runtime is probably almost the same.

When you have more data pg will start to use the index since then it will
be faster to use an index compared to a seq. scan.

--
/Dennis Björklund


Re: Why those queries do not utilize indexes?

От
Artimenko Igor
Дата:
I could force Postgres to use the best index by removing condition "msgstatus = CAST( 0 AS
smallint );" from WHERE clause & set enable_seqscan to off;
Total runtime in this case dropped from 1883 ms ( sequential reads ) to 1.598 ms ( best index ).

But unfortunatelly It does not resolve my problem. I can not remove above condition. I need to
find a way to use whole condition "WHERE user_id = CAST( 20000 AS BIGINT ) and msgstatus = CAST( 0
AS smallint );" and still utilyze index.

Yes you are right. Using "messagesStatus" index is even worse for my data set then sequential
scan.

Igor Artimenko

--- Dennis Bjorklund <db@zigo.dhs.org> wrote:

> On Fri, 27 Aug 2004, Artimenko Igor wrote:
>
> > 1. Sequential search and very high cost if set enable_seqscan to on;
> > Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 )
> >
> > 2. Index scan but even bigger cost if set enable_seqscan to off;
> > Index �messagesStatus� on messageinfo ( Cost=0.00..27220.72, rows=36802 )
>
> So pg thinks that a sequential scan will be a little bit faster (The cost
> is a little bit smaller). If you compare the actual runtimes maybe you
> will see that pg was right. In this case the cost is almost the same so
> the runtime is probably almost the same.
>
> When you have more data pg will start to use the index since then it will
> be faster to use an index compared to a seq. scan.
>
> --
> /Dennis Bj�rklund
>
>




_______________________________
Do you Yahoo!?
Win 1 of 4,000 free domain names from Yahoo! Enter now.
http://promotions.yahoo.com/goldrush

Re: Why those queries do not utilize indexes?

От
Christopher Kings-Lynne
Дата:
First things first:  try vacuum full analyze on all the tables involved.

> 1.    Should I afraid of high cost indexes? Or query will still be very efficient?

Not necessarily.  However, EXPLAIN output is pretty much useless for us
for helping you.  You need to post EXPLAIN ANALYZE output.

Then, you need to use explain analyze to check the speed difference
between the index and seq scan versions.  Is the seq scan actually slower?

> 2.    Postgres does not use the index I need. For my data sets it’s always msgstatus index is
> narrowest compare with ‘messagesStatus’ one. Is any way to “enforce” to use a particular index?
> What’s the logic when Postgres chooses one index compare with the other.

It's complicated, but it's based on teh statistics in pg_statistic that
the vacuum analyze command gathers.

> 3.    I can change db structure to utilize Postgres specifics if you can tell them to me.

I avoid using int8 and int2 in the first place :)  In PostgreSQL 8.0,
they will be less troublesome, but I've never seen a need for them!

> 4.    Also, originally I had “messagesStatus” index having 2 components ( “msgstatus”, “user_id” ).
> But query SELECT * FROM messageinfo WHERE msgstatus = 0 did not utilize indexes in this case. It
> only worked if both index components are in WHERE part. So I have to remove 2-nd component
> “user_id” from messagesStatus index even I wanted it. Is any way that where clause has only 1-st
> component but index is utilized?

So long as your where clause matches a subset of the columns in the
index in left to right order, the index can be used.  For example, if
your index is over (a, b, c) then select * where a=1 and b=2; can use
the index.

Chris