Обсуждение: PG using index+filter instead only use index

Поиск
Список
Период
Сортировка

PG using index+filter instead only use index

От
Alexandre de Arruda Paes
Дата:
Hi,

PostgreSQL 8.4.2 / default_statistics_target = 300

I have a strange problem for a bad choose of indexes.

client=# \d ct13t
         Table "public.ct13t"
   Column   |     Type     | Modifiers
------------+--------------+-----------
 ct12emp04  | integer      | not null
 ct03emp01  | integer      | not null
 ct03tradut | integer      | not null
 ct07emp01  | integer      | not null
 ct07c_cust | integer      | not null
 ct13dtlanc | date         | not null
 ct12numlot | integer      | not null
 ct12numlan | integer      | not null
 ct13emptr1 | integer      |
 ct13tradu1 | integer      |
 ct13empcc1 | integer      |
 ct13ccust1 | integer      |
 ct13duoc   | character(1) |
Indexes:
    "ct13t_pkey" PRIMARY KEY, btree (ct12emp04, ct03emp01, ct03tradut,
ct07emp01, ct07c_cust, ct13dtlanc, ct12numlot, ct12numlan) CLUSTER
    "ict13t1" btree (ct12emp04, ct12numlot, ct12numlan)
    "ict13t2" btree (ct07emp01, ct07c_cust)
    "ict13t3" btree (ct13empcc1, ct13ccust1)
    "ict13t4" btree (ct03emp01, ct03tradut)
    "ict13t5" btree (ct13emptr1, ct13tradu1)
    "uct13t" btree (ct12emp04, ct13dtlanc)


client=# explain analyze SELECT ct12emp04, ct03emp01, ct03tradut,
ct07emp01, ct07c_cust, ct13dtlanc, ct12numlot, ct12numlan FROM CT13T
WHERE ct12emp04 = '2' AND ct03emp01 = '2' AND ct03tradut = '60008' AND
ct07emp01 = '2' AND ct07c_cust = '0' AND ct13dtlanc =
'2005-01-28'::date AND ct12numlot = '82050128' AND ct12numlan = '123';

     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ict13t2 on ct13t  (cost=0.00..5.69 rows=1 width=32)
(actual time=288.687..288.687 rows=0 loops=1)
   Index Cond: ((ct07emp01 = 2) AND (ct07c_cust = 0))
   Filter: ((ct12emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut =
60008) AND (ct13dtlanc = '2005-01-28'::date) AND (ct12numlot =
82050128) AND (ct12numlan = 123))
 Total runtime: 288.735 ms
(4 rows)

client=# create table ad_ct13t as select * from ct13t;
SELECT
client=# alter table ad_ct13t add primary key (ct12emp04, ct03emp01,
ct03tradut, ct07emp01, ct07c_cust, ct13dtlanc, ct12numlot,
ct12numlan);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"ad_ct13t_pkey" for table "ad_ct13t"
ALTER TABLE
client=# explain analyze SELECT ct12emp04, ct03emp01, ct03tradut,
ct07emp01, ct07c_cust, ct13dtlanc, ct12numlot, ct12numlan FROM
AD_CT13T WHERE ct12emp04 = '2' AND ct03emp01 = '2' AND ct03tradut =
'60008' AND ct07emp01 = '2' AND ct07c_cust = '0' AND ct13dtlanc =
'2005-01-28'::date AND ct12numlot = '82050128' AND ct12numlan = '123';

                            QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
 Index Scan using ad_ct13t_pkey on ad_ct13t  (cost=0.00..5.66 rows=1
width=32) (actual time=0.090..0.090 rows=0 loops=1)
   Index Cond: ((ct12emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut =
60008) AND (ct07emp01 = 2) AND (ct07c_cust = 0) AND (ct13dtlanc =
'2005-01-28'::date) AND (ct12numlot = 82050128) AND (ct12numlan =
123))
 Total runtime: 0.146 ms
(3 rows)

My question: if the cost is exactly the same, why PG choose the index
ict13t2 on ct13t and apply a filter instead use the primary key ?
In one query, it's ok. But this routine execute millions times this query.

Thanks for any help,

Alexandre

Re: PG using index+filter instead only use index

От
Tom Lane
Дата:
Alexandre de Arruda Paes <adaldeia@gmail.com> writes:
> My question: if the cost is exactly the same, why PG choose the index
> ict13t2 on ct13t and apply a filter instead use the primary key ?

Why shouldn't it, if the estimated costs are the same?  You didn't
actually demonstrate they're the same though.

The cost estimates look a bit unusual to me; are you using nondefault
cost parameters, and if so what are they?

            regards, tom lane

Re: PG using index+filter instead only use index

От
Alexandre de Arruda Paes
Дата:
Hi Tom,

2010/3/19 Tom Lane <tgl@sss.pgh.pa.us>:
> Alexandre de Arruda Paes <adaldeia@gmail.com> writes:
>> My question: if the cost is exactly the same, why PG choose the index
>> ict13t2 on ct13t and apply a filter instead use the primary key ?
>
> Why shouldn't it, if the estimated costs are the same?  You didn't
> actually demonstrate they're the same though.
>
> The cost estimates look a bit unusual to me; are you using nondefault
> cost parameters, and if so what are they?
>
>                        regards, tom lane
>

The non default value in cost parameters is different only in
random_page_cost that are set to 2.5 and default_statistics_target set
to 300.
I set this parameters to defaults (4 and 100) and re-analyze the
tables but results are the same.

Some more info on another table with the same behavior (ANALYZE ok in
all tables):

client=# \d ct14t
          Table "public.ct14t"
   Column   |     Type      | Modifiers
------------+---------------+-----------
 ct14emp04  | integer       | not null
 ct03emp01  | integer       | not null
 ct03tradut | integer       | not null
 ct07emp01  | integer       | not null
 ct07c_cust | integer       | not null
 ct14ano    | integer       | not null
 ct14mes    | integer       | not null
 ct14debito | numeric(14,2) |
 ct14credit | numeric(14,2) |
 ct14orcado | numeric(14,2) |
Indexes:
    "ct14t_pkey" PRIMARY KEY, btree (ct14emp04, ct03emp01, ct03tradut,
ct07emp01, ct07c_cust, ct14ano, ct14mes) CLUSTER
    "ad_ict14t" btree (ct14emp04, ct03emp01, ct03tradut, ct07emp01,
ct07c_cust, ct14ano, ct14mes) WHERE ct14emp04 = 2 AND ct03emp01 = 2
AND ct07emp01 = 2
    "ict14t1" btree (ct07emp01, ct07c_cust)
    "ict14t2" btree (ct03emp01, ct03tradut)

client=# select ct07c_cust,count(*) from ct14t group by ct07c_cust
order by count(*) DESC;
 ct07c_cust | count
------------+-------
          0 | 55536
         99 | 14901
        107 |  3094
        800 |  1938
(...)


If I use any different value from '0' in the ct07c_cust field, the
planner choose the 'right' index:

client=# explain analyze SELECT ct14mes, ct14ano, ct07c_cust,
ct07emp01, ct03tradut, ct03emp01, ct14emp04, ct14debito, ct14credit
FROM ad_CT14T WHERE ct14emp04 = '2' AND ct03emp01 = '2' AND ct03tradut
= '14930' AND ct07emp01 = '2' AND ct07c_cust = '99' AND ct14ano =
'2003' AND ct14mes = '4';

   QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ad_ict14t_1 on ad_ct14t  (cost=0.00..5.28 rows=1
width=42) (actual time=5.504..5.504 rows=0 loops=1)
   Index Cond: ((ct14emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut =
14930) AND (ct07emp01 = 2) AND (ct07c_cust = 99) AND (ct14ano = 2003)
AND (ct14mes = 4))
 Total runtime: 5.548 ms
(3 rows)



With '0' in the ct07c_cust field, they choose a more slow way:


client=# explain analyze SELECT ct14mes, ct14ano, ct07c_cust,
ct07emp01, ct03tradut, ct03emp01, ct14emp04, ct14debito, ct14credit
FROM CT14T WHERE ct14emp04 = '2' AND ct03emp01 = '2' AND ct03tradut =
'57393' AND ct07emp01 = '2' AND ct07c_cust = '0' AND ct14ano = '2002'
AND ct14mes = '5';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using ict14t1 on ct14t  (cost=0.00..5.32 rows=1 width=42)
(actual time=211.007..211.007 rows=0 loops=1)
   Index Cond: ((ct07emp01 = 2) AND (ct07c_cust = 0))
   Filter: ((ct14emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut =
57393) AND (ct14ano = 2002) AND (ct14mes = 5))
 Total runtime: 211.062 ms
(4 rows)


Again, if I create a table for test from this table (AD_CT14T) and
only create the index used in the first query plan, the results are ok
(ct07c_cust=0 / same query above):

client=# create table ad_ct14t as select * from ct14t;
SELECT
client=# create index ad_ict14t_abc on ad_ct14t(ct14emp04, ct03emp01,
ct03tradut, ct07emp01, ct07c_cust, ct14ano, ct14mes) where ct14emp04 =
'2' AND ct03emp01 = '2' AND ct07emp01 = '2';
CREATE
client=# explain analyze SELECT ct14mes, ct14ano, ct07c_cust,
ct07emp01, ct03tradut, ct03emp01, ct14emp04, ct14debito, ct14credit
FROM AD_CT14T WHERE ct14emp04 = '2' AND ct03emp01 = '2' AND ct03tradut
= '57393' AND ct07emp01 = '2' AND ct07c_cust = '0' AND ct14ano =
'2002' AND ct14mes = '5';

   QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ad_ict14t_abc on ad_ct14t  (cost=0.00..5.28 rows=1
width=42) (actual time=0.043..0.043 rows=0 loops=1)
   Index Cond: ((ct14emp04 = 2) AND (ct03emp01 = 2) AND (ct03tradut =
57393) AND (ct07emp01 = 2) AND (ct07c_cust = 0) AND (ct14ano = 2002)
AND (ct14mes = 5))
 Total runtime: 0.091 ms
(3 rows)



I don't know why the planner prefer to use a less specific index
(ict14t1) and do a filter than use an index that matches with the
WHERE parameter...

Best regards,

Alexandre

Re: PG using index+filter instead only use index

От
Tom Lane
Дата:
Alexandre de Arruda Paes <adaldeia@gmail.com> writes:
> 2010/3/19 Tom Lane <tgl@sss.pgh.pa.us>:
>> The cost estimates look a bit unusual to me; are you using nondefault
>> cost parameters, and if so what are they?

> The non default value in cost parameters is different only in
> random_page_cost that are set to 2.5 and default_statistics_target set
> to 300.

Okay, so with random_page_cost = 2.5, those cost estimates definitely
indicate that it's expecting only one heap tuple to be visited, for
either choice of index.

> I don't know why the planner prefer to use a less specific index
> (ict14t1) and do a filter than use an index that matches with the
> WHERE parameter...

The cost estimate formulas bias the system against using a larger index
when a smaller one will do.  That seven-column index is probably at
least twice as large as the two-column index, so it's hardly
unreasonable to assume that scanning it will take more I/O and cache
space and CPU time than using a smaller index, if all else is equal.
Now of course all else is not equal if the smaller index is less
selective than the larger one, but the cost estimates indicate that the
planner thinks the two-column index condition is sufficient to narrow
things down to only one heap tuple anyway.

The fact that the smaller index is actually slower indicates that this
estimate is off, ie (ct07emp01 = 2) AND (ct07c_cust = 0) actually
selects more than one heap tuple.  It's hard to speculate about why that
estimate is wrong on the basis of the information you've shown us
though.  Perhaps there is a strong correlation between the values of
those two columns?

            regards, tom lane