Re: [HACKERS] Costs: Index vs Non-Index

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Costs: Index vs Non-Index
Дата
Msg-id 6112.947348116@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Costs: Index vs Non-Index  (The Hermit Hacker <scrappy@hub.org>)
Ответы Re: [HACKERS] Costs: Index vs Non-Index  (The Hermit Hacker <scrappy@hub.org>)
Список pgsql-hackers
The Hermit Hacker <scrappy@hub.org> writes:
> Hrmmm...if I'm reading this right, its more costly to create an index then
> to leave it as a sequential scan, but it returns more rows?  Yet, it
> returns, if I do the query with a count() around the return value, 288
> rows, not 334 or 1154...

This doesn't have anything to do with index vs sequential scan, but it
does have to do with whether you've done a VACUUM ANALYZE lately.
You haven't ;-)

> udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
> NOTICE:  QUERY PLAN:
> Seq Scan on url  (cost=43.00 rows=334 width=4)

IIRC, rows=334 is the default estimate of result rows you will get for
this query in the absence of any information whatever.  (Default table
size guess is 1000 rows, and default selectivity guess for <= is 1/3,
so...)  If you have not vacuumed, it's sheer coincidence that this is
even within hailing distance of the correct figure of 288.

> udmsearch=> create index url_next_index_time on url using btree ( next_index_time);
> CREATE
> udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
> NOTICE:  QUERY PLAN:
> Index Scan using url_next_index_time on url  (cost=271.68 rows=1154 width=4)

I believe that a side-effect of CREATE INDEX is to update the
number-of-pages-and-rows statistics in pg_class for the target table.
So after you do that, the optimizer has a correct idea of the table's
size, but still no more info about the selectivity of the WHERE clause.
(I infer that your table has size 1154*3 rows.)  If you now drop the
index and repeat EXPLAIN, it'll go back to a seq scan, but it will now
say 1154 rows --- and the cost estimate will be higher, too.

If you do VACUUM ANALYZE, then the optimizer will also know the min and
max values of next_index_time, and will have some shot at making a
correct estimate of the output row count.  I'd be interested to know
what it predicts then...
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [HACKERS] ECPG patch for exec sql ifdef etc.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Another index "buglet"?