Seq scan over 3.3 millions of rows instead of using date and pattern indexes

От: Andrus
Тема: Seq scan over 3.3 millions of rows instead of using date and pattern indexes
Дата: ,
Msg-id: ggusgk$1i7o$1@news.hub.org
(см: обсуждение, исходный текст)
Список: pgsql-performance

explain analyze SELECT sum(1)
   FROM dok JOIN rid USING (dokumnr)
   WHERE dok.kuupaev>='2008-05-01'
 and
 ( (
      dok.doktyyp IN
('V','G','Y','K','I','T','D','N','H','M','E','B','A','R','C','F','J','Q')
          AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus
ELSE rid.kuluobjekt END LIKE 'AEGVIIDU%'
      )
     OR
     ( dok.doktyyp IN ('O','S','I','U','D','P')
          AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus
END LIKE 'AEGVIIDU%'
      )
   )

"Aggregate  (cost=369240.67..369240.68 rows=1 width=0) (actual
time=41135.557..41135.560 rows=1 loops=1)"
"  ->  Hash Join  (cost=96614.24..369229.39 rows=4508 width=0) (actual
time=5859.704..40912.979 rows=59390 loops=1)"
"        Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"        Join Filter: (((("inner".doktyyp = 'V'::bpchar) OR ("inner".doktyyp
= 'G'::bpchar) OR ("inner".doktyyp = 'Y'::bpchar) OR ("inner".doktyyp =
'K'::bpchar) OR ("inner".doktyyp = 'I'::bpchar) OR ("inner".doktyyp =
'T'::bpchar) OR ("inner".doktyyp = 'D'::bpchar) OR ("inner".doktyyp =
'N'::bpchar) OR ("inner".doktyyp = 'H'::bpchar) OR ("inner".doktyyp =
'M'::bpchar) OR ("inner".doktyyp = 'E'::bpchar) OR ("inner".doktyyp =
'B'::bpchar) OR ("inner".doktyyp = 'A'::bpchar) OR ("inner".doktyyp =
'R'::bpchar) OR ("inner".doktyyp = 'C'::bpchar) OR ("inner".doktyyp =
'F'::bpchar) OR ("inner".doktyyp = 'J'::bpchar) OR ("inner".doktyyp =
'Q'::bpchar)) AND (CASE WHEN ((NOT ("inner".objrealt)::boolean) OR
("inner".doktyyp = 'I'::bpchar)) THEN "inner".yksus ELSE "outer".kuluobjekt
END ~~ 'AEGVIIDU%'::text)) OR ((("inner".doktyyp = 'O'::bpchar) OR
("inner".doktyyp = 'S'::bpchar) OR ("inner".doktyyp = 'I'::bpchar) OR
("inner".doktyyp = 'U'::bpchar) OR ("inner".doktyyp = 'D'::bpchar) OR
("inner".doktyyp = 'P'::bpchar)) AND (CASE WHEN ("inner".objrealt)::boolean
THEN "outer".kuluobjekt ELSE "inner".sihtyksus END ~~ 'AEGVIIDU%'::text)))"
"        ->  Seq Scan on rid  (cost=0.00..129911.53 rows=3299853 width=18)
(actual time=0.039..17277.888 rows=3299777 loops=1)"
"        ->  Hash  (cost=92983.97..92983.97 rows=336110 width=38) (actual
time=3965.478..3965.478 rows=337455 loops=1)"
"              ->  Bitmap Heap Scan on dok  (cost=1993.66..92983.97
rows=336110 width=38) (actual time=135.810..2389.703 rows=337455 loops=1)"
"                    Recheck Cond: (kuupaev >= '2008-05-01'::date)"
"                    Filter: ((doktyyp = 'V'::bpchar) OR (doktyyp =
'G'::bpchar) OR (doktyyp = 'Y'::bpchar) OR (doktyyp = 'K'::bpchar) OR
(doktyyp = 'I'::bpchar) OR (doktyyp = 'T'::bpchar) OR (doktyyp =
'D'::bpchar) OR (doktyyp = 'N'::bpchar) OR (doktyyp = 'H'::bpchar) OR
(doktyyp = 'M'::bpchar) OR (doktyyp = 'E'::bpchar) OR (doktyyp =
'B'::bpchar) OR (doktyyp = 'A'::bpchar) OR (doktyyp = 'R'::bpchar) OR
(doktyyp = 'C'::bpchar) OR (doktyyp = 'F'::bpchar) OR (doktyyp =
'J'::bpchar) OR (doktyyp = 'Q'::bpchar) OR (doktyyp = 'O'::bpchar) OR
(doktyyp = 'S'::bpchar) OR (doktyyp = 'I'::bpchar) OR (doktyyp =
'U'::bpchar) OR (doktyyp = 'D'::bpchar) OR (doktyyp = 'P'::bpchar))"
"                    ->  Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1993.66 rows=347618 width=0) (actual time=97.881..97.881
rows=337770 loops=1)"
"                          Index Cond: (kuupaev >= '2008-05-01'::date)"
"Total runtime: 41136.348 ms"

8.1.4 Db is analyzed, default_statistics_target is 40.
PostgreSql still choices seq scan over rid.

This query can optimized as follows:

1. kuupaev >= '2008-05-01'  index can reduce number of scanned rows 10 times
(to 330000)
2. AEGVIIDU%  can reduce number of rows 6 times (to 60000)

How to force pg to use indexes for those conditions ?

This query can be executed against different shops groups (int this case
there is other value than  AEGVIIDU) and for different date.

There are 6 different shop groups containing roughly same number or records
each.
So using index on  AEGVIIDU% can decrease number of scanned rows 6 times.
Usually 90% of dok records contain 'Y' in dok.doktyyp column and
dok.objrealt is false for those records.

Is it possible to use come functional index or other method to speed it ?

rid.kuluobjekt, dok.yksus and dok.sihtyksus types are char(10).

There are indexes

dok(yksus bpchar_pattern_ops)
dok(sihtyksus bpchar_pattern_ops)

Is it possible to re-write query that it uses those indexes or create some
other indexes?
Using 8.1.4, us-en locale, utf-8 db encoding.
select column list is removed from sample.

Andrus.



В списке pgsql-performance по дате сообщения:

От: "Andrus"
Дата:
Сообщение: Seq scan over 3.3 millions of rows instead of using date and pattern indexes
От: PFC
Дата:
Сообщение: Re: Query optimization