Bad query plan decision when using multiple column index - postgresqluses only first column then filters

Поиск
Список
Период
Сортировка
От Cosmin Prund
Тема Bad query plan decision when using multiple column index - postgresqluses only first column then filters
Дата
Msg-id CAGU4dz_tvnjJB3OEgErzt6V1PHACH0WqxwFFoMm+5bCNvjrF4g@mail.gmail.com
обсуждение исходный текст
Ответы Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Hello List, I'm Cosmin. This is my first post and I'll get right down to the problem. I'm using Postgresql 10 (because that's what's installed by default on Ubuntu 18.04):

explain analyze 
   select R, C, V from LBD
   where Ver = 92 and Id in (10,11)

Index Scan using "IX_LBD_Ver_Id" on "LBD"  (cost=0.56..2.37 rows=1 width=13) (actual time=0.063..857.725 rows=2 loops=1)
  Index Cond: ("Ver" = 92)
  Filter: ("Id" = ANY ('{10,11}'::integer[]))
  Rows Removed by Filter: 1869178
Planning time: 0.170 ms
Execution time: 857.767 ms

The  IX_LBD_Ver_Id index is on two columns (Ver, Id) - it's not in "Ver" alone!

Somehow the query planner thinks that scanning the index on "Ver" alone should only return 1 record. The problem is that there are, on average, millions of records for each "Ver"!
The current query is not my real query: the original problem was with a JOIN. I boiled it down to this simple query because it shows the same problem: when dealing with more then one "Id" the planner scans on "Ver" and filters on "Id". Running the query with a single "Id" does use the index on both columns and the query finishes in only 0.7 ms (one thousand times faster)
The planner doesn't always get it this bad. The original JOIN usually runs instantaneously. Unless the planner gets into it's current funk and then the original JOIN never finishes.

- I've reindexed the whole database
- I ran ANALYZE on all tables
- I checked "pg_stats", here are the stats:

select attname, null_frac, avg_width, n_distinct, correlation from pg_stats where tablename = 'LBD' and (attname in ('Id', 'Ver'))
attname null_frac acg_width n_distinct correlation
Id           0            4                2029846  0.0631249
Ver         0            2                22            0.624823

According to data from "pg_stats" the query planner should know that scanning the "LBD" table has on average millions of records per "Ver".
The fact that this works right most of the time tells me I'm dealing with some kind of statistical information (something along the lines of n_distinct from pg_stat) and gives me hope. Once I know why the planner gets this wrong I should be able to make it right.

Please point me in the right direction. Where should I look, what should I try?

Thank you,
Cosmin

В списке pgsql-performance по дате отправления:

Предыдущее
От: Marcin Barczyński
Дата:
Сообщение: Queries in plpgsql are 6 times slower on partitioned tables
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters