Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Дата
Msg-id 19001.1388267802@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.  (Kevin Grittner <kgrittn@ymail.com>)
Ответы Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.  (Michael Kolomeitsev <mkolomeitsev@gmail.com>)
Список pgsql-performance
Kevin Grittner <kgrittn@ymail.com> writes:
> Michael Kolomeitsev <mkolomeitsev@gmail.com> wrote:
>> it is clear for me why t1_b_a_idx is better. The question is: Is
>> postgresql able to see that?

> For a number of reasons I never consider a bulk load complete until
> I run VACUUM FREEZE ANALYZE on the table(s) involved.� When I try
> your test case without that, I get the bad index choice.� When I
> then run VACUUM FREEZE ANALYZE on the database I get the good index
> choice.

I think that's just chance, because AFAICS the cost estimates are exactly
the same for both indexes, once you've done the vacuum to make all the
heap pages all-visible.  What's more, I'm not sure that that's wrong,
because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of
index pages are touched for either index.  So I think Michael's claim that
the one index is better is at best unproven.

regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
                                                        QUERY PLAN
   

---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32.12..32.13 rows=1 width=0) (actual time=0.097..0.098 rows=1 loops=1)
   Buffers: shared hit=30
   ->  Index Only Scan using t1_b_a_idx on t1  (cost=0.57..32.10 rows=7 width=0) (actual time=0.044..0.085 rows=7
loops=1)
         Index Cond: ((b = 333333) AND (a = ANY ('{1,9,17,26,35,41,50}'::integer[])))
         Heap Fetches: 0
         Buffers: shared hit=30
 Total runtime: 0.174 ms
(7 rows)

regression=# begin; drop index t1_b_a_idx;
BEGIN
DROP INDEX
regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
                                                        QUERY PLAN
   

---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32.12..32.13 rows=1 width=0) (actual time=0.110..0.110 rows=1 loops=1)
   Buffers: shared hit=30
   ->  Index Only Scan using t1_a_b_idx on t1  (cost=0.57..32.10 rows=7 width=0) (actual time=0.039..0.101 rows=7
loops=1)
         Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 333333))
         Heap Fetches: 0
         Buffers: shared hit=30
 Total runtime: 0.199 ms
(7 rows)

regression=# abort;
ROLLBACK

I grant the theory that the repeated index probes in t1_b_a_idx should be
more localized than those in t1_a_b_idx, but PG's optimizer doesn't
attempt to estimate such effects, and this example isn't doing much to
convince me that it'd be worth the trouble.

            regards, tom lane


В списке pgsql-performance по дате отправления:

Предыдущее
От: Gavin Flower
Дата:
Сообщение: Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Следующее
От: Michael Kolomeitsev
Дата:
Сообщение: Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.