Using LIMIT changes index used by planner

Поиск
Список
Период
Сортировка
От Sven Willenberger
Тема Using LIMIT changes index used by planner
Дата
Msg-id 41BD3317.9090507@dmv.com
обсуждение исходный текст
Ответы Re: Using LIMIT changes index used by planner
Список pgsql-performance
I have a question regarding a serious performance hit taken when using a
LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB
of memory. The table in question contains some 25 million rows with a
bigserial primary key, orderdate index and a referrer index. The 2
select statements are as follow:

A) select storelocation,order_number from custacct where referrer = 1365
  and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid;

B) select storelocation,order_number from custacct where referrer = 1365
  and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid limit 10;

So the only difference is the use of the Limit, which, in theory, should
be quicker after custacctid is ordered.

Now the analyze results:

A) explain select storelocation,order_number from custacct where
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07
12:00:00' order by custacctid;

        QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=904420.55..904468.11 rows=19025 width=44)
    Sort Key: custacctid
    ->  Index Scan using orderdate_idx on custacct
(cost=0.00..903068.29 rows=19025 width=44)
          Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp
without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp
without time zone))
          Filter: (referrer = 1365)
(5 rows)

************************

B) explain select storelocation,order_number from custacct where
referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07
12:00:00' order by custacctid limit 10;

                  QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..33796.50 rows=10 width=44)
    ->  Index Scan using custacct2_pkey on custacct
(cost=0.00..64297840.86 rows=19025 width=44)
          Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
12:00:00'::timestamp without time zone))
(3 rows)

*******************

Notice the huge cost difference in the two plans: 904468 in the one
without LIMIT versus 64297840.86 for the index scan on custacct index.
Why would the planner switch from using the orderdate index to the
custacct index (which is a BIGSERIAL, btw)?

I can change that behavior (and speed up the resultant query) by using
the following subquery:

explain select foo.storelocation, foo.order_number from (select
storelocation,order_number from custacct where referrer = 1365  and
orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by
custacctid) as foo  limit 10;

              QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=904420.55..904420.67 rows=10 width=100)
    ->  Subquery Scan foo  (cost=904420.55..904658.36 rows=19025 width=100)
          ->  Sort  (cost=904420.55..904468.11 rows=19025 width=44)
                Sort Key: custacctid
                ->  Index Scan using orderdate_idx on custacct
(cost=0.00..903068.29 rows=19025 width=44)
                      Index Cond: ((orderdate >= '2004-12-07
00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
12:00:00'::timestamp without time zone))
                      Filter: (referrer = 1365)
(7 rows)

As a side note, when running query A, the query takes 1772.523 ms, when
running the subselect version to get the limit, it takes 1415.615 ms.
Running option B (with the other index being scanned) takes several
minutes (close to 10 minutes!). What am I missing about how the planner
views the LIMIT statement?

Sven

В списке pgsql-performance по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: Query is not using index when it should
Следующее
От: Andrew McMillan
Дата:
Сообщение: Re: Using LIMIT changes index used by planner