Bad query plan decision when using multiple column index - postgresqluses only first column then filters

Поиск

Список

Период

Сортировка

От	Cosmin Prund
Тема	Bad query plan decision when using multiple column index - postgresqluses only first column then filters
Дата	16 января 2020 г. 17:06:06
Msg-id	CAGU4dz_tvnjJB3OEgErzt6V1PHACH0WqxwFFoMm+5bCNvjrF4g@mail.gmail.com обсуждение исходный текст
Ответы	Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-performance

Дерево обсуждения

Hello List, I'm Cosmin. This is my first post and I'll get right down to the problem. I'm using Postgresql 10 (because that's what's installed by default on Ubuntu 18.04):

explain analyze

select R, C, V from LBD

where Ver = 92 and Id in (10,11)

Index Scan using "IX_LBD_Ver_Id" on "LBD" (cost=0.56..2.37 rows=1 width=13) (actual time=0.063..857.725 rows=2 loops=1)

Index Cond: ("Ver" = 92)

Filter: ("Id" = ANY ('{10,11}'::integer[]))

Rows Removed by Filter: 1869178

Planning time: 0.170 ms

Execution time: 857.767 ms

The IX_LBD_Ver_Id index is on two columns (Ver, Id) - it's not in "Ver" alone!

Somehow the query planner thinks that scanning the index on "Ver" alone should only return 1 record. The problem is that there are, on average, millions of records for each "Ver"!

The current query is not my real query: the original problem was with a JOIN. I boiled it down to this simple query because it shows the same problem: when dealing with more then one "Id" the planner scans on "Ver" and filters on "Id". Running the query with a single "Id" does use the index on both columns and the query finishes in only 0.7 ms (one thousand times faster)

The planner doesn't always get it this bad. The original JOIN usually runs instantaneously. Unless the planner gets into it's current funk and then the original JOIN never finishes.

- I've reindexed the whole database

- I ran ANALYZE on all tables

- I checked "pg_stats", here are the stats:

select attname, null_frac, avg_width, n_distinct, correlation from pg_stats where tablename = 'LBD' and (attname in ('Id', 'Ver'))

attname null_frac acg_width n_distinct correlation

Id 0 4 2029846 0.0631249

Ver 0 2 22 0.624823

According to data from "pg_stats" the query planner should know that scanning the "LBD" table has on average millions of records per "Ver".

The fact that this works right most of the time tells me I'm dealing with some kind of statistical information (something along the lines of n_distinct from pg_stat) and gives me hope. Once I know why the planner gets this wrong I should be able to make it right.

Please point me in the right direction. Where should I look, what should I try?

Thank you,

Cosmin

В списке pgsql-performance по дате отправления:

Предыдущее

От: Marcin Barczyński
Дата: 16 января 2020 г., 16:21:57
Сообщение: Queries in plpgsql are 6 times slower on partitioned tables

Следующее

От: Tom Lane
Дата: 16 января 2020 г., 18:11:26
Сообщение: Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Bad query plan decision when using multiple column index - postgresqluses only first column then filters

Предыдущее

Следующее