Re: QUERY PLANNER - Indexe mono column VS composite Index

Поиск
Список
Период
Сортировка
От Nicolas Paris
Тема Re: QUERY PLANNER - Indexe mono column VS composite Index
Дата
Msg-id CA+ssMORBzDy65z7+0QT8Yq1yr_LuLMd0w4E8SgLZjALdoAucAA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: QUERY PLANNER - Indexe mono column VS composite Index  (Guillaume Lelarge <guillaume@lelarge.info>)
Ответы Re: QUERY PLANNER - Indexe mono column VS composite Index  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-performance
Ok, here is the problem (it's different than what I explained before)
==INDEX ==
CREATE INDEX of_idx_modifier
  ON i2b2data_multi_nomi.observation_fact
  USING btree
  (concept_cd COLLATE pg_catalog."default", modifier_cd COLLATE pg_catalog."default", valtype_cd COLLATE pg_catalog."default", tval_char COLLATE pg_catalog."default", nval_num);

==QUERY==

 EXPLAIN ANALYSE select  f.patient_num  
from i2b2data_multi_nomi.observation_fact f 
where  
f.concept_cd IN (select concept_cd from  i2b2data_multi_nomi.concept_dimension   where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')   
  AND  (  modifier_cd = '@'  AND  valtype_cd = 'T' AND tval_char   IN ('DP') ) 
group by  f.patient_num ;

                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=35153.99..35154.40 rows=41 width=4) (actual time=81.223..82.718 rows=5206 loops=1)
   Group Key: f.patient_num
   ->  Nested Loop  (cost=4740.02..35089.11 rows=25951 width=4) (actual time=45.393..76.893 rows=7359 loops=1)
         ->  HashAggregate  (cost=4739.45..4748.64 rows=919 width=10) (actual time=45.097..45.586 rows=925 loops=1)
               Group Key: (concept_dimension.concept_cd)::text
               ->  Seq Scan on concept_dimension  (cost=0.00..4734.73 rows=1892 width=10) (actual time=17.479..44.573 rows=925 loops=1)
                     Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
                     Rows Removed by Filter: 186413
         ->  Index Scan using of_idx_modifier on observation_fact f  (cost=0.56..32.86 rows=15 width=14) (actual time=0.025..0.031 rows=8 loops=925)
               Index Cond: (((concept_cd)::text = (concept_dimension.concept_cd)::text) AND ((modifier_cd)::text = '@'::text) AND ((valtype_cd)::text 
= 'T'::text) AND ((tval_char)::text = 'DP'::text))
 Planning time: 2.843 ms
 Execution time: 83.273 ms
(12 rows)



============2 : without 3 constraint that match index => seq scan=======================================================================

 EXPLAIN ANALYSE select  f.patient_num  
from i2b2data_multi_nomi.observation_fact f 
where  
f.concept_cd IN (select concept_cd from  i2b2data_multi_nomi.concept_dimension   where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')   
 -- AND  (  modifier_cd = '@'  AND  valtype_cd = 'T' AND tval_char   IN ('DP') ) 
group by  f.patient_num ;

                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=1345377.85..1346073.80 rows=69595 width=4) (actual time=18043.140..18048.741 rows=16865 loops=1)
   Group Key: f.patient_num
   ->  Hash Join  (cost=4760.13..1233828.53 rows=44619728 width=4) (actual time=17109.041..18027.763 rows=33835 loops=1)
         Hash Cond: ((f.concept_cd)::text = (concept_dimension.concept_cd)::text)
         ->  Seq Scan on observation_fact f  (cost=0.00..1057264.28 rows=44619728 width=14) (actual time=0.040..7918.984 rows=44619320 loops=1)
         ->  Hash  (cost=4748.64..4748.64 rows=919 width=10) (actual time=49.523..49.523 rows=925 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 39kB
               ->  HashAggregate  (cost=4739.45..4748.64 rows=919 width=10) (actual time=48.806..49.117 rows=925 loops=1)
                     Group Key: (concept_dimension.concept_cd)::text
                     ->  Seq Scan on concept_dimension  (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.828..48.191 rows=925 loops=1)
                           Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
                           Rows Removed by Filter: 186413
 Planning time: 2.588 ms
 Execution time: 18051.031 ms
(14 rows)


=========3: without a constraint on tval_char => seq scan========================================================================


 EXPLAIN ANALYSE select  f.patient_num  
from i2b2data_multi_nomi.observation_fact f 
where  
f.concept_cd IN (select concept_cd from  i2b2data_multi_nomi.concept_dimension   where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')   
  AND  (  modifier_cd = '@'  AND  valtype_cd = 'T'  ) 
group by  f.patient_num ;


                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=1305637.84..1305688.23 rows=5039 width=4) (actual time=22689.279..22694.583 rows=16865 loops=1)
   Group Key: f.patient_num
   ->  Hash Join  (cost=4760.13..1297561.67 rows=3230468 width=4) (actual time=12368.418..22674.145 rows=33835 loops=1)
         Hash Cond: ((f.concept_cd)::text = (concept_dimension.concept_cd)::text)
         ->  Seq Scan on observation_fact f  (cost=0.00..1280362.92 rows=3230468 width=14) (actual time=0.226..22004.808 rows=3195625 loops=1)
               Filter: (((modifier_cd)::text = '@'::text) AND ((valtype_cd)::text = 'T'::text))
               Rows Removed by Filter: 41423695
         ->  Hash  (cost=4748.64..4748.64 rows=919 width=10) (actual time=46.833..46.833 rows=925 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 39kB
               ->  HashAggregate  (cost=4739.45..4748.64 rows=919 width=10) (actual time=46.196..46.515 rows=925 loops=1)
                     Group Key: (concept_dimension.concept_cd)::text
                     ->  Seq Scan on concept_dimension  (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.899..45.800 rows=925 loops=1)
                           Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
                           Rows Removed by Filter: 186413
 Planning time: 1.940 ms
 Execution time: 22695.913 ms

What I would like is the planner allways hit of_idx_modifier 

Thanks !

2015-07-09 22:49 GMT+02:00 Guillaume Lelarge <guillaume@lelarge.info>:
2015-07-09 22:34 GMT+02:00 Nicolas Paris <niparisco@gmail.com>:
Hello,

My 9.4 database is used as datawharehouse. I can't change the queries generated.

first index  : INDEX COL (A,B,C,D,E)


In case of query based on COL A,  the query planner sometimes go to a seq scan instead of using the first composite index.

The solution is to add a second indexe (redondant)
second index : INDEX COL (A)

In case of query based on COL A, B, C, D, (without E) as well, it doesn't uses the first index and prefers a seq scan.

I could create a third indexe :
first index  : INDEX COL (A,B,C,D) 

But I hope there is an other solution for that (table is huge).

It seems that the malus for using composite indexes is high.

Question is : is there a way to make the composite index more attractive to query planner ? (idealy equivalent to mono column indexe)


There's no way we can answer that without seeing actual queries and query plans.


--

В списке pgsql-performance по дате отправления:

Предыдущее
От: Guillaume Lelarge
Дата:
Сообщение: Re: QUERY PLANNER - Indexe mono column VS composite Index
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: QUERY PLANNER - Indexe mono column VS composite Index