Sequential Scan with LIMIT

Поиск
Список
Период
Сортировка
От John Meinel
Тема Sequential Scan with LIMIT
Дата
Msg-id 417C023F.1080502@johnmeinel.com
обсуждение исходный текст
Ответы Re: Sequential Scan with LIMIT  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Sequential Scan with LIMIT  (Curt Sampson <cjs@cynic.net>)
Список pgsql-performance
I was looking into another problem, and I found something that surprised
me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.".
Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs
maybe 100,000 times. Without the LIMIT, this query should definitely do
a sequential scan.

But with the LIMIT, doesn't it know that it will return at max 1 value,
and thus be able to use the index?

It seems to be doing the LIMIT too late.

The real purpose of this query is to check to see if a value exists in
the column, so there might be a better way of doing it. Here is the demo
info:

# select count(*) from finst_t;
542315

# select count(*) from finst_t where store_id = 539960;
85076

# explain analyze select id from finst_t where store_id = 539960 limit 1;
                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..0.13 rows=1 width=4) (actual time=860.000..860.000
rows=1 loops=1)
     ->  Seq Scan on finst_t  (cost=0.00..11884.94 rows=88217 width=4)
(actual time=860.000..860.000 rows=1 loops=1)
           Filter: (store_id = 539960)
   Total runtime: 860.000 ms

Notice that the "actual rows=1", meaning it is aware of the limit as it
is going through the table. But for some reason the planner thinks it is
going to return 88,217 rows. (This is close to the reality of 85076 if
it actually had to find all of the rows).

Now, if I do a select on a value that *does* only have 1 value, it works
fine:

# explain analyze select id from finst_t where store_id = 9605 limit 1;
                                                               QUERY PLAN



------------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..3.96 rows=1 width=4) (actual time=0.000..0.000
rows=1 loops=1)
     ->  Index Scan using finst_t_store_id_idx on finst_t
(cost=0.00..3.96 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
           Index Cond: (store_id = 9605)
   Total runtime: 0.000 ms

And 1 further thing, I *can* force it to do a fast index scan if I
disable sequential scanning.

# set enable_seqscan to off;
# explain analyze select id from finst_t where store_id = 539960 limit 1;
                                                                   QUERY
PLAN



------------------------------------------------------------------------------------------------------------------------
   Limit  (cost=0.00..1.59 rows=1 width=4) (actual time=0.000..0.000
rows=1 loops=1)
     ->  Index Scan using finst_t_store_id_idx on finst_t
(cost=0.00..140417.22 rows=88217 width=4) (actual time=0.000..0.000
rows=1 loops=1)
           Index Cond: (store_id = 539960)
   Total runtime: 0.000 ms

Could being aware of LIMIT be added to the planner? Is there a better
way to check for existence?

John
=:->

PS> I'm using postgres 8.0-beta3 on win32 (the latest installer).


Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: John Meinel
Дата:
Сообщение: Re: Queries slow using stored procedures
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Sequential Scan with LIMIT