Обсуждение: QUERY PLANNER - Indexe mono column VS composite Index

Поиск
Список
Период
Сортировка

QUERY PLANNER - Indexe mono column VS composite Index

От
Nicolas Paris
Дата:
Hello,

My 9.4 database is used as datawharehouse. I can't change the queries generated.

first index  : INDEX COL (A,B,C,D,E)


In case of query based on COL A,  the query planner sometimes go to a seq scan instead of using the first composite index.

The solution is to add a second indexe (redondant)
second index : INDEX COL (A)

In case of query based on COL A, B, C, D, (without E) as well, it doesn't uses the first index and prefers a seq scan.

I could create a third indexe :
first index  : INDEX COL (A,B,C,D) 

But I hope there is an other solution for that (table is huge).

It seems that the malus for using composite indexes is high.

Question is : is there a way to make the composite index more attractive to query planner ? (idealy equivalent to mono column indexe)


Thanks by advance

Re: QUERY PLANNER - Indexe mono column VS composite Index

От
Guillaume Lelarge
Дата:
2015-07-09 22:34 GMT+02:00 Nicolas Paris <niparisco@gmail.com>:
Hello,

My 9.4 database is used as datawharehouse. I can't change the queries generated.

first index  : INDEX COL (A,B,C,D,E)


In case of query based on COL A,  the query planner sometimes go to a seq scan instead of using the first composite index.

The solution is to add a second indexe (redondant)
second index : INDEX COL (A)

In case of query based on COL A, B, C, D, (without E) as well, it doesn't uses the first index and prefers a seq scan.

I could create a third indexe :
first index  : INDEX COL (A,B,C,D) 

But I hope there is an other solution for that (table is huge).

It seems that the malus for using composite indexes is high.

Question is : is there a way to make the composite index more attractive to query planner ? (idealy equivalent to mono column indexe)


There's no way we can answer that without seeing actual queries and query plans.


--

Re: QUERY PLANNER - Indexe mono column VS composite Index

От
Nicolas Paris
Дата:
Ok, here is the problem (it's different than what I explained before)
==INDEX ==
CREATE INDEX of_idx_modifier
  ON i2b2data_multi_nomi.observation_fact
  USING btree
  (concept_cd COLLATE pg_catalog."default", modifier_cd COLLATE pg_catalog."default", valtype_cd COLLATE pg_catalog."default", tval_char COLLATE pg_catalog."default", nval_num);

==QUERY==

 EXPLAIN ANALYSE select  f.patient_num  
from i2b2data_multi_nomi.observation_fact f 
where  
f.concept_cd IN (select concept_cd from  i2b2data_multi_nomi.concept_dimension   where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')   
  AND  (  modifier_cd = '@'  AND  valtype_cd = 'T' AND tval_char   IN ('DP') ) 
group by  f.patient_num ;

                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=35153.99..35154.40 rows=41 width=4) (actual time=81.223..82.718 rows=5206 loops=1)
   Group Key: f.patient_num
   ->  Nested Loop  (cost=4740.02..35089.11 rows=25951 width=4) (actual time=45.393..76.893 rows=7359 loops=1)
         ->  HashAggregate  (cost=4739.45..4748.64 rows=919 width=10) (actual time=45.097..45.586 rows=925 loops=1)
               Group Key: (concept_dimension.concept_cd)::text
               ->  Seq Scan on concept_dimension  (cost=0.00..4734.73 rows=1892 width=10) (actual time=17.479..44.573 rows=925 loops=1)
                     Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
                     Rows Removed by Filter: 186413
         ->  Index Scan using of_idx_modifier on observation_fact f  (cost=0.56..32.86 rows=15 width=14) (actual time=0.025..0.031 rows=8 loops=925)
               Index Cond: (((concept_cd)::text = (concept_dimension.concept_cd)::text) AND ((modifier_cd)::text = '@'::text) AND ((valtype_cd)::text 
= 'T'::text) AND ((tval_char)::text = 'DP'::text))
 Planning time: 2.843 ms
 Execution time: 83.273 ms
(12 rows)



============2 : without 3 constraint that match index => seq scan=======================================================================

 EXPLAIN ANALYSE select  f.patient_num  
from i2b2data_multi_nomi.observation_fact f 
where  
f.concept_cd IN (select concept_cd from  i2b2data_multi_nomi.concept_dimension   where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')   
 -- AND  (  modifier_cd = '@'  AND  valtype_cd = 'T' AND tval_char   IN ('DP') ) 
group by  f.patient_num ;

                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=1345377.85..1346073.80 rows=69595 width=4) (actual time=18043.140..18048.741 rows=16865 loops=1)
   Group Key: f.patient_num
   ->  Hash Join  (cost=4760.13..1233828.53 rows=44619728 width=4) (actual time=17109.041..18027.763 rows=33835 loops=1)
         Hash Cond: ((f.concept_cd)::text = (concept_dimension.concept_cd)::text)
         ->  Seq Scan on observation_fact f  (cost=0.00..1057264.28 rows=44619728 width=14) (actual time=0.040..7918.984 rows=44619320 loops=1)
         ->  Hash  (cost=4748.64..4748.64 rows=919 width=10) (actual time=49.523..49.523 rows=925 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 39kB
               ->  HashAggregate  (cost=4739.45..4748.64 rows=919 width=10) (actual time=48.806..49.117 rows=925 loops=1)
                     Group Key: (concept_dimension.concept_cd)::text
                     ->  Seq Scan on concept_dimension  (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.828..48.191 rows=925 loops=1)
                           Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
                           Rows Removed by Filter: 186413
 Planning time: 2.588 ms
 Execution time: 18051.031 ms
(14 rows)


=========3: without a constraint on tval_char => seq scan========================================================================


 EXPLAIN ANALYSE select  f.patient_num  
from i2b2data_multi_nomi.observation_fact f 
where  
f.concept_cd IN (select concept_cd from  i2b2data_multi_nomi.concept_dimension   where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')   
  AND  (  modifier_cd = '@'  AND  valtype_cd = 'T'  ) 
group by  f.patient_num ;


                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=1305637.84..1305688.23 rows=5039 width=4) (actual time=22689.279..22694.583 rows=16865 loops=1)
   Group Key: f.patient_num
   ->  Hash Join  (cost=4760.13..1297561.67 rows=3230468 width=4) (actual time=12368.418..22674.145 rows=33835 loops=1)
         Hash Cond: ((f.concept_cd)::text = (concept_dimension.concept_cd)::text)
         ->  Seq Scan on observation_fact f  (cost=0.00..1280362.92 rows=3230468 width=14) (actual time=0.226..22004.808 rows=3195625 loops=1)
               Filter: (((modifier_cd)::text = '@'::text) AND ((valtype_cd)::text = 'T'::text))
               Rows Removed by Filter: 41423695
         ->  Hash  (cost=4748.64..4748.64 rows=919 width=10) (actual time=46.833..46.833 rows=925 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 39kB
               ->  HashAggregate  (cost=4739.45..4748.64 rows=919 width=10) (actual time=46.196..46.515 rows=925 loops=1)
                     Group Key: (concept_dimension.concept_cd)::text
                     ->  Seq Scan on concept_dimension  (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.899..45.800 rows=925 loops=1)
                           Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
                           Rows Removed by Filter: 186413
 Planning time: 1.940 ms
 Execution time: 22695.913 ms

What I would like is the planner allways hit of_idx_modifier 

Thanks !

2015-07-09 22:49 GMT+02:00 Guillaume Lelarge <guillaume@lelarge.info>:
2015-07-09 22:34 GMT+02:00 Nicolas Paris <niparisco@gmail.com>:
Hello,

My 9.4 database is used as datawharehouse. I can't change the queries generated.

first index  : INDEX COL (A,B,C,D,E)


In case of query based on COL A,  the query planner sometimes go to a seq scan instead of using the first composite index.

The solution is to add a second indexe (redondant)
second index : INDEX COL (A)

In case of query based on COL A, B, C, D, (without E) as well, it doesn't uses the first index and prefers a seq scan.

I could create a third indexe :
first index  : INDEX COL (A,B,C,D) 

But I hope there is an other solution for that (table is huge).

It seems that the malus for using composite indexes is high.

Question is : is there a way to make the composite index more attractive to query planner ? (idealy equivalent to mono column indexe)


There's no way we can answer that without seeing actual queries and query plans.


--

Re: QUERY PLANNER - Indexe mono column VS composite Index

От
Jeff Janes
Дата:
On Fri, Jul 10, 2015 at 2:34 AM, Nicolas Paris <niparisco@gmail.com> wrote:


=========3: without a constraint on tval_char => seq scan========================================================================


 EXPLAIN ANALYSE select  f.patient_num  
from i2b2data_multi_nomi.observation_fact f 
where  
f.concept_cd IN (select concept_cd from  i2b2data_multi_nomi.concept_dimension   where concept_path LIKE '\\i2b2\\cim10\\A00-B99\\%')   
  AND  (  modifier_cd = '@'  AND  valtype_cd = 'T'  ) 
group by  f.patient_num ;


                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=1305637.84..1305688.23 rows=5039 width=4) (actual time=22689.279..22694.583 rows=16865 loops=1)
   Group Key: f.patient_num
   ->  Hash Join  (cost=4760.13..1297561.67 rows=3230468 width=4) (actual time=12368.418..22674.145 rows=33835 loops=1)
         Hash Cond: ((f.concept_cd)::text = (concept_dimension.concept_cd)::text)
         ->  Seq Scan on observation_fact f  (cost=0.00..1280362.92 rows=3230468 width=14) (actual time=0.226..22004.808 rows=3195625 loops=1)
               Filter: (((modifier_cd)::text = '@'::text) AND ((valtype_cd)::text = 'T'::text))
               Rows Removed by Filter: 41423695
         ->  Hash  (cost=4748.64..4748.64 rows=919 width=10) (actual time=46.833..46.833 rows=925 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 39kB
               ->  HashAggregate  (cost=4739.45..4748.64 rows=919 width=10) (actual time=46.196..46.515 rows=925 loops=1)
                     Group Key: (concept_dimension.concept_cd)::text
                     ->  Seq Scan on concept_dimension  (cost=0.00..4734.73 rows=1892 width=10) (actual time=18.899..45.800 rows=925 loops=1)
                           Filter: ((concept_path)::text ~~ '\\i2b2\\cim10\\A00-B99\\%'::text)
                           Rows Removed by Filter: 186413
 Planning time: 1.940 ms
 Execution time: 22695.913 ms

What I would like is the planner allways hit of_idx_modifier 

What does the above explain analyze query give when you have an index on just modifier_cd, or maybe on both (modifier_cd, valtype_cd)?

Your original email said it uses the index in that case, but we would need to see the numbers in the query plan in order to figure out why it is doing that. 

It seems like that the "tval_char   IN ('DP')" part of the restriction is very selective, while the other two restrictions are not.

Cheers,

Jeff