Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

Поиск
Список
Период
Сортировка
От Michael Kolomeitsev
Тема Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Дата
Msg-id CAABbzO01zK4Ab=vd3UD1Jd++-Hju9bLerzS1+=JnpMsXCLtg8w@mail.gmail.com
обсуждение исходный текст
Ответы Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.  (Kevin Grittner <kgrittn@ymail.com>)
Список pgsql-performance
It seems postgresql is unable to choose correct index in such cases.
(my pg version is 9.3.2)

Let's see example:
create table t1 as select a.a, b.b from generate_series(1, 100) a(a), generate_series(1,500000) b(b);
create index t1_a_idx on t1(a);
create index t1_b_idx on t1(b);
create index t1_a_b_idx on t1(a,b);
create index t1_b_a_idx on t1(b,a);
alter table t1 alter a set statistics 10000;
alter table t1 alter b set statistics 10000;
analyze t1;

test=> explain select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Aggregate  (cost=46.62..46.63 rows=1 width=0)
   ->  Index Only Scan using t1_a_b_idx on t1  (cost=0.57..46.60 rows=7 width=0)
         Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 333333))
(3 rows)

Rows estimation is exact.
But I think using t1_a_b_idx index is not the best choice.
Let's check:
# drop pg and disc buffers/caches
systemctl stop postgresql.service ; echo 3 >/proc/sys/vm/drop_caches ; systemctl start postgresql.service ; sleep 2
# warm up pg and check the plan
{ echo '\\timing' && echo "explain select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;" ; } | psql test
# do the benchmark
{ echo '\\timing' && echo "select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;" ; } | psql test

I have 200-210ms timing for the last query and t1_a_b_idx is used always. I checked several times.

Ok. Now 'drop index t1_a_b_idx;' and check again.
Pg now uses t1_b_a_idx and I have 90-100ms for that control query. This is much better.

I took pageinspect contrib module, learnt btree structure and it is clear for me
why t1_b_a_idx is better. The question is: Is postgresql able to see that?

В списке pgsql-performance по дате отправления:

Предыдущее
От: Ronaldo Maia
Дата:
Сообщение: Possible regression (slow query on 9.2/9.3 when compared to 9.1)
Следующее
От: GR Vishwanath
Дата:
Сообщение: Does fsync on/off for wal AND Checkpoint?