Обсуждение: Costs: Index vs Non-Index

Поиск
Список
Период
Сортировка

Costs: Index vs Non-Index

От
The Hermit Hacker
Дата:
Hrmmm...if I'm reading this right, its more costly to create an index then
to leave it as a sequential scan, but it returns more rows?  Yet, it
returns, if I do the query with a count() around the return value, 288
rows, not 334 or 1154...

udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
NOTICE:  QUERY PLAN:

Seq Scan on url  (cost=43.00 rows=334 width=4)

EXPLAIN
udmsearch=> create index url_next_index_time on url using btree ( next_index_time);
CREATE
udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
NOTICE:  QUERY PLAN:

Index Scan using url_next_index_time on url  (cost=271.68 rows=1154 width=4)

EXPLAIN



Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Costs: Index vs Non-Index

От
Tom Lane
Дата:
The Hermit Hacker <scrappy@hub.org> writes:
> Hrmmm...if I'm reading this right, its more costly to create an index then
> to leave it as a sequential scan, but it returns more rows?  Yet, it
> returns, if I do the query with a count() around the return value, 288
> rows, not 334 or 1154...

This doesn't have anything to do with index vs sequential scan, but it
does have to do with whether you've done a VACUUM ANALYZE lately.
You haven't ;-)

> udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
> NOTICE:  QUERY PLAN:
> Seq Scan on url  (cost=43.00 rows=334 width=4)

IIRC, rows=334 is the default estimate of result rows you will get for
this query in the absence of any information whatever.  (Default table
size guess is 1000 rows, and default selectivity guess for <= is 1/3,
so...)  If you have not vacuumed, it's sheer coincidence that this is
even within hailing distance of the correct figure of 288.

> udmsearch=> create index url_next_index_time on url using btree ( next_index_time);
> CREATE
> udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
> NOTICE:  QUERY PLAN:
> Index Scan using url_next_index_time on url  (cost=271.68 rows=1154 width=4)

I believe that a side-effect of CREATE INDEX is to update the
number-of-pages-and-rows statistics in pg_class for the target table.
So after you do that, the optimizer has a correct idea of the table's
size, but still no more info about the selectivity of the WHERE clause.
(I infer that your table has size 1154*3 rows.)  If you now drop the
index and repeat EXPLAIN, it'll go back to a seq scan, but it will now
say 1154 rows --- and the cost estimate will be higher, too.

If you do VACUUM ANALYZE, then the optimizer will also know the min and
max values of next_index_time, and will have some shot at making a
correct estimate of the output row count.  I'd be interested to know
what it predicts then...
        regards, tom lane


Re: [HACKERS] Costs: Index vs Non-Index

От
The Hermit Hacker
Дата:
Okay, I had remembered to VACUUM, but I always forget to VACUUM ANALYZE :(
results come out much better now:

udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
NOTICE:  QUERY PLAN:

Seq Scan on url  (cost=3368.58 rows=12623 width=4)

EXPLAIN
udmsearch=> select (next_index_time) from url where next_index_time <= 947317073;                        
next_index_time
---------------
(0 rows)

udmsearch=> create index url_next_index_time on url using btree ( next_index_time);
CREATE
udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
NOTICE:  QUERY PLAN:

Index Scan using url_next_index_time on url  (cost=1364.10 rows=12623 width=4)

EXPLAIN



On Sat, 8 Jan 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > Hrmmm...if I'm reading this right, its more costly to create an index then
> > to leave it as a sequential scan, but it returns more rows?  Yet, it
> > returns, if I do the query with a count() around the return value, 288
> > rows, not 334 or 1154...
> 
> This doesn't have anything to do with index vs sequential scan, but it
> does have to do with whether you've done a VACUUM ANALYZE lately.
> You haven't ;-)
> 
> > udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
> > NOTICE:  QUERY PLAN:
> > Seq Scan on url  (cost=43.00 rows=334 width=4)
> 
> IIRC, rows=334 is the default estimate of result rows you will get for
> this query in the absence of any information whatever.  (Default table
> size guess is 1000 rows, and default selectivity guess for <= is 1/3,
> so...)  If you have not vacuumed, it's sheer coincidence that this is
> even within hailing distance of the correct figure of 288.
> 
> > udmsearch=> create index url_next_index_time on url using btree ( next_index_time);
> > CREATE
> > udmsearch=> explain select next_index_time from url where next_index_time <= 947317073;
> > NOTICE:  QUERY PLAN:
> > Index Scan using url_next_index_time on url  (cost=271.68 rows=1154 width=4)
> 
> I believe that a side-effect of CREATE INDEX is to update the
> number-of-pages-and-rows statistics in pg_class for the target table.
> So after you do that, the optimizer has a correct idea of the table's
> size, but still no more info about the selectivity of the WHERE clause.
> (I infer that your table has size 1154*3 rows.)  If you now drop the
> index and repeat EXPLAIN, it'll go back to a seq scan, but it will now
> say 1154 rows --- and the cost estimate will be higher, too.
> 
> If you do VACUUM ANALYZE, then the optimizer will also know the min and
> max values of next_index_time, and will have some shot at making a
> correct estimate of the output row count.  I'd be interested to know
> what it predicts then...
> 
>             regards, tom lane
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org