Re: Using LIMIT changes index used by planner

От: Tom Lane
Тема: Re: Using LIMIT changes index used by planner
Дата: ,
Msg-id: 28820.1102977787@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: Using LIMIT changes index used by planner  (Sven Willenberger)
Ответы: Re: Using LIMIT changes index used by planner  (Sven Willenberger)
Re: Using LIMIT changes index used by planner  (Pierre-Frédéric Caillaud<>)
Список: pgsql-performance

Скрыть дерево обсуждения

Using LIMIT changes index used by planner  (Sven Willenberger, )
 Re: Using LIMIT changes index used by planner  (Andrew McMillan, )
  Re: Using LIMIT changes index used by planner  (Sven Willenberger, )
   Re: Using LIMIT changes index used by planner  (Tom Lane, )
    Re: Using LIMIT changes index used by planner  (Sven Willenberger, )
     Re: Using LIMIT changes index used by planner  (Tom Lane, )
    Re: Using LIMIT changes index used by planner  (Pierre-Frédéric Caillaud<>, )

Sven Willenberger <> writes:
> explain analyze select storelocation,order_number from custacct where
> referrer = 1365  and orderdate between '2004-12-07' and '2004-12-07
> 12:00:00' order by custacctid limit 10;

>                   QUERY PLAN

>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=0.00..43065.76 rows=10 width=43) (actual
> time=1306957.216..1307072.111 rows=10 loops=1)
>     ->  Index Scan using custacct2_pkey on custacct
> (cost=0.00..92083209.38 rows=21382 width=43) (actual
> time=1306957.205..1307072.017 rows=10 loops=1)
>           Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07
> 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07
> 12:00:00'::timestamp without time zone))
>   Total runtime: 1307072.231 ms
> (4 rows)

I think this is the well-known issue of lack of cross-column correlation
statistics.  The planner is well aware that this indexscan will be
horridly expensive if run to completion --- but it's assuming that
stopping after 10 rows, or 10/21382 of the total scan, will cost only
about 10/21382 as much as the whole scan would.  This amounts to
assuming that the rows matching the filter condition are randomly
distributed among all the rows taken in custacctid order.  I suspect
that your test case actually has a great deal of correlation between
custacctid and referrer/orderdate, such that the indexscan in custacctid
order ends up fetching many more rows that fail the filter condition
than random chance would suggest, before it finally comes across 10 that
pass the filter.

There isn't any near-term fix in the wind for this, since storing
cross-column statistics is an expensive proposition that we haven't
decided how to handle.  Your workaround with separating the ORDER BY
from the LIMIT is a good one.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: "Iain"
Дата:
Сообщение: Re: pg_restore taking 4 hours!
От: Bruno Wolff III
Дата:
Сообщение: Re: Similar tables, different indexes performance