Обсуждение: PG using index+filter instead only use index
Hi, PostgreSQL 8.4.2 / default_statistics_target = 300 I have a strange problem for a bad choose of indexes. client=# \d ct13t Table "public.ct13t" Column | Type | Modifiers ------------+--------------+----------- ct12emp04 | integer | not null ct03emp01 | integer | not null ct03tradut | integer | not null ct07emp01 | integer | not null ct07c_cust | integer | not null ct13dtlanc | date | not null ct12numlot | integer | not null ct12numlan | integer | not null ct13emptr1 | integer | ct13tradu1 | integer | ct13empcc1 | integer | ct13ccust1 | integer | ct13duoc | character(1) | Indexes: "ct13t_pkey" PRIMARY KEY, btree (ct12emp04, ct03emp01, ct03tradut, ct07emp01, ct07c_cust, ct13dtlanc, ct12numlot, ct12numlan) CLUSTER "ict13t1" btree (ct12emp04, ct12numlot, ct12numlan) "ict13t2" btree (ct07emp01, ct07c_cust) "ict13t3" btree (ct13empcc1, ct13ccust1) "ict13t4" btree (ct03emp01, ct03tradut) "ict13t5" btree (ct13emptr1, ct13tradu1) "uct13t" btree (ct12emp04, ct13dtlanc) client=# explain analyze SELECT ct12emp04, ct03emp01, ct03tradut, ct07emp01, ct07c_cust, ct13dtlanc, ct12numlot, ct12numlan FROM CT13T WHERE ct12emp04 = '2' AND ct03emp01 = '2' AND ct03tradut = '60008' AND ct07emp01 = '2' AND ct07c_cust = '0' AND ct13dtlanc = '2005-01-28'::date AND ct12numlot = '82050128' AND ct12numlan = '123'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using ict13t2 on ct13t (cost=0.00..5.69 rows=1 width=32) (actual time=288.687..288.687 rows=0 loops=1) Index Cond: ((ct07emp01 = 2) AND (ct07c_cust = 0)) Filter: ((ct12emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut = 60008) AND (ct13dtlanc = '2005-01-28'::date) AND (ct12numlot = 82050128) AND (ct12numlan = 123)) Total runtime: 288.735 ms (4 rows) client=# create table ad_ct13t as select * from ct13t; SELECT client=# alter table ad_ct13t add primary key (ct12emp04, ct03emp01, ct03tradut, ct07emp01, ct07c_cust, ct13dtlanc, ct12numlot, ct12numlan); NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "ad_ct13t_pkey" for table "ad_ct13t" ALTER TABLE client=# explain analyze SELECT ct12emp04, ct03emp01, ct03tradut, ct07emp01, ct07c_cust, ct13dtlanc, ct12numlot, ct12numlan FROM AD_CT13T WHERE ct12emp04 = '2' AND ct03emp01 = '2' AND ct03tradut = '60008' AND ct07emp01 = '2' AND ct07c_cust = '0' AND ct13dtlanc = '2005-01-28'::date AND ct12numlot = '82050128' AND ct12numlan = '123'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- - Index Scan using ad_ct13t_pkey on ad_ct13t (cost=0.00..5.66 rows=1 width=32) (actual time=0.090..0.090 rows=0 loops=1) Index Cond: ((ct12emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut = 60008) AND (ct07emp01 = 2) AND (ct07c_cust = 0) AND (ct13dtlanc = '2005-01-28'::date) AND (ct12numlot = 82050128) AND (ct12numlan = 123)) Total runtime: 0.146 ms (3 rows) My question: if the cost is exactly the same, why PG choose the index ict13t2 on ct13t and apply a filter instead use the primary key ? In one query, it's ok. But this routine execute millions times this query. Thanks for any help, Alexandre
Alexandre de Arruda Paes <adaldeia@gmail.com> writes: > My question: if the cost is exactly the same, why PG choose the index > ict13t2 on ct13t and apply a filter instead use the primary key ? Why shouldn't it, if the estimated costs are the same? You didn't actually demonstrate they're the same though. The cost estimates look a bit unusual to me; are you using nondefault cost parameters, and if so what are they? regards, tom lane
Hi Tom, 2010/3/19 Tom Lane <tgl@sss.pgh.pa.us>: > Alexandre de Arruda Paes <adaldeia@gmail.com> writes: >> My question: if the cost is exactly the same, why PG choose the index >> ict13t2 on ct13t and apply a filter instead use the primary key ? > > Why shouldn't it, if the estimated costs are the same? You didn't > actually demonstrate they're the same though. > > The cost estimates look a bit unusual to me; are you using nondefault > cost parameters, and if so what are they? > > regards, tom lane > The non default value in cost parameters is different only in random_page_cost that are set to 2.5 and default_statistics_target set to 300. I set this parameters to defaults (4 and 100) and re-analyze the tables but results are the same. Some more info on another table with the same behavior (ANALYZE ok in all tables): client=# \d ct14t Table "public.ct14t" Column | Type | Modifiers ------------+---------------+----------- ct14emp04 | integer | not null ct03emp01 | integer | not null ct03tradut | integer | not null ct07emp01 | integer | not null ct07c_cust | integer | not null ct14ano | integer | not null ct14mes | integer | not null ct14debito | numeric(14,2) | ct14credit | numeric(14,2) | ct14orcado | numeric(14,2) | Indexes: "ct14t_pkey" PRIMARY KEY, btree (ct14emp04, ct03emp01, ct03tradut, ct07emp01, ct07c_cust, ct14ano, ct14mes) CLUSTER "ad_ict14t" btree (ct14emp04, ct03emp01, ct03tradut, ct07emp01, ct07c_cust, ct14ano, ct14mes) WHERE ct14emp04 = 2 AND ct03emp01 = 2 AND ct07emp01 = 2 "ict14t1" btree (ct07emp01, ct07c_cust) "ict14t2" btree (ct03emp01, ct03tradut) client=# select ct07c_cust,count(*) from ct14t group by ct07c_cust order by count(*) DESC; ct07c_cust | count ------------+------- 0 | 55536 99 | 14901 107 | 3094 800 | 1938 (...) If I use any different value from '0' in the ct07c_cust field, the planner choose the 'right' index: client=# explain analyze SELECT ct14mes, ct14ano, ct07c_cust, ct07emp01, ct03tradut, ct03emp01, ct14emp04, ct14debito, ct14credit FROM ad_CT14T WHERE ct14emp04 = '2' AND ct03emp01 = '2' AND ct03tradut = '14930' AND ct07emp01 = '2' AND ct07c_cust = '99' AND ct14ano = '2003' AND ct14mes = '4'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using ad_ict14t_1 on ad_ct14t (cost=0.00..5.28 rows=1 width=42) (actual time=5.504..5.504 rows=0 loops=1) Index Cond: ((ct14emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut = 14930) AND (ct07emp01 = 2) AND (ct07c_cust = 99) AND (ct14ano = 2003) AND (ct14mes = 4)) Total runtime: 5.548 ms (3 rows) With '0' in the ct07c_cust field, they choose a more slow way: client=# explain analyze SELECT ct14mes, ct14ano, ct07c_cust, ct07emp01, ct03tradut, ct03emp01, ct14emp04, ct14debito, ct14credit FROM CT14T WHERE ct14emp04 = '2' AND ct03emp01 = '2' AND ct03tradut = '57393' AND ct07emp01 = '2' AND ct07c_cust = '0' AND ct14ano = '2002' AND ct14mes = '5'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Index Scan using ict14t1 on ct14t (cost=0.00..5.32 rows=1 width=42) (actual time=211.007..211.007 rows=0 loops=1) Index Cond: ((ct07emp01 = 2) AND (ct07c_cust = 0)) Filter: ((ct14emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut = 57393) AND (ct14ano = 2002) AND (ct14mes = 5)) Total runtime: 211.062 ms (4 rows) Again, if I create a table for test from this table (AD_CT14T) and only create the index used in the first query plan, the results are ok (ct07c_cust=0 / same query above): client=# create table ad_ct14t as select * from ct14t; SELECT client=# create index ad_ict14t_abc on ad_ct14t(ct14emp04, ct03emp01, ct03tradut, ct07emp01, ct07c_cust, ct14ano, ct14mes) where ct14emp04 = '2' AND ct03emp01 = '2' AND ct07emp01 = '2'; CREATE client=# explain analyze SELECT ct14mes, ct14ano, ct07c_cust, ct07emp01, ct03tradut, ct03emp01, ct14emp04, ct14debito, ct14credit FROM AD_CT14T WHERE ct14emp04 = '2' AND ct03emp01 = '2' AND ct03tradut = '57393' AND ct07emp01 = '2' AND ct07c_cust = '0' AND ct14ano = '2002' AND ct14mes = '5'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using ad_ict14t_abc on ad_ct14t (cost=0.00..5.28 rows=1 width=42) (actual time=0.043..0.043 rows=0 loops=1) Index Cond: ((ct14emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut = 57393) AND (ct07emp01 = 2) AND (ct07c_cust = 0) AND (ct14ano = 2002) AND (ct14mes = 5)) Total runtime: 0.091 ms (3 rows) I don't know why the planner prefer to use a less specific index (ict14t1) and do a filter than use an index that matches with the WHERE parameter... Best regards, Alexandre
Alexandre de Arruda Paes <adaldeia@gmail.com> writes: > 2010/3/19 Tom Lane <tgl@sss.pgh.pa.us>: >> The cost estimates look a bit unusual to me; are you using nondefault >> cost parameters, and if so what are they? > The non default value in cost parameters is different only in > random_page_cost that are set to 2.5 and default_statistics_target set > to 300. Okay, so with random_page_cost = 2.5, those cost estimates definitely indicate that it's expecting only one heap tuple to be visited, for either choice of index. > I don't know why the planner prefer to use a less specific index > (ict14t1) and do a filter than use an index that matches with the > WHERE parameter... The cost estimate formulas bias the system against using a larger index when a smaller one will do. That seven-column index is probably at least twice as large as the two-column index, so it's hardly unreasonable to assume that scanning it will take more I/O and cache space and CPU time than using a smaller index, if all else is equal. Now of course all else is not equal if the smaller index is less selective than the larger one, but the cost estimates indicate that the planner thinks the two-column index condition is sufficient to narrow things down to only one heap tuple anyway. The fact that the smaller index is actually slower indicates that this estimate is off, ie (ct07emp01 = 2) AND (ct07c_cust = 0) actually selects more than one heap tuple. It's hard to speculate about why that estimate is wrong on the basis of the information you've shown us though. Perhaps there is a strong correlation between the values of those two columns? regards, tom lane