Re: Different execution plan between PostgreSQL 8.4 and 12.11

Поиск
Список
Период
Сортировка
От Klint Gore
Тема Re: Different execution plan between PostgreSQL 8.4 and 12.11
Дата
Msg-id MEYPR01MB7101CD5DA0A07C9DE2B74850A4239@MEYPR01MB7101.ausprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Different execution plan between PostgreSQL 8.4 and 12.11  (gzh <gzhcoder@126.com>)
Ответы Re: Different execution plan between PostgreSQL 8.4 and 12.11  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-general

From: Bruce Momjian <bruce@momjian.us> Sent: Wednesday, 12 October 2022 1:30 AM
 
>On Tue, Oct 11, 2022 at 09:59:43AM -0400, Tom Lane wrote:
>> David Rowley <dgrowleyml@gmail.com> writes:
>> > It feels like something is a bit lacking in our cost model here. I'm
>> > just not sure what that is.
>>
>> The example you show is the same old problem that we've understood for
>> decades: for cost-estimation purposes, we assume that matching rows
>> are more or less evenly distributed in the table.  Their actual
>> location doesn't matter that much if you're scanning the whole table;
>> but if you're hoping that a LIMIT will be able to stop after scanning
>> just a few rows, it does matter.
>
> We do have a correlation statistics value for each column but I am
> unclear if that would help here.

This might give someone an idea -  the best query I come up with was

   explain analyze select distinct 2 from tbl where (fld = 230) limit 1;


Limit  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.040 rows=1 loops=1)
  ->  Unique  (cost=0.56..28349.28 rows=1 width=4) (actual time=0.039..0.039 rows=1 loops=1)
        ->  Index Only Scan using idx on tbl  (cost=0.56..28349.28 rows=995241 width=4) (actual time=0.038..0.038 rows=1 loops=1)
              Index Cond: (fld = 230)
              Heap Fetches: 0
Planning Time: 0.066 ms
Execution Time: 0.047 ms

With the distinct and the limit, the planner somehow knows to push the either the distinct or the limit into the index only scan so the unique for distinct only had 1 row and the outer limit only had 1 row.  Without the limit, the distinct still does the index only scan but has to do the unique on the million rows and execution time goes to about 100ms.


fld is mostly ordered - it's a serial primary key in another table.  The cardinality of the 131 distinct values is an exponential distribution. Of the 20m rows,   the fld values ordered by count is 8m, 5m 2m, 1m, 1m, .... down to about 10k.  index is btree with stats target of 1000.  table is analyzed and vacuum frozen.  there is a "create statistics" on this table for n:1 relationship between another field and this one.


Without the distinct, choosing a different value with lower number of rows changed the plan to index only scan with limit somewhere between 3.7% and 4.7% of the table.  With a brin index on a similar size/distributed table that is in fld order, that changed to somewhere between 0.6% and 0.7%.



В списке pgsql-general по дате отправления:

Предыдущее
От: Guillaume Lelarge
Дата:
Сообщение: Re: Weird planner issue on a standby
Следующее
От: David Rowley
Дата:
Сообщение: Re: Different execution plan between PostgreSQL 8.4 and 12.11