Обсуждение: Seq scan over 3.3 millions of rows instead of using date and pattern indexes

Поиск
Список
Период
Сортировка

Seq scan over 3.3 millions of rows instead of using date and pattern indexes

От
"Andrus"
Дата:
explain analyze SELECT sum(1)
   FROM dok JOIN rid USING (dokumnr)
   WHERE dok.kuupaev>='2008-05-01'
 and
 ( (
      dok.doktyyp IN
('V','G','Y','K','I','T','D','N','H','M','E','B','A','R','C','F','J','Q')
          AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus
ELSE rid.kuluobjekt END LIKE 'AEGVIIDU%'
      )
     OR
     ( dok.doktyyp IN ('O','S','I','U','D','P')
          AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus
END LIKE 'AEGVIIDU%'
      )
   )

"Aggregate  (cost=369240.67..369240.68 rows=1 width=0) (actual
time=41135.557..41135.560 rows=1 loops=1)"
"  ->  Hash Join  (cost=96614.24..369229.39 rows=4508 width=0) (actual
time=5859.704..40912.979 rows=59390 loops=1)"
"        Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"        Join Filter: (((("inner".doktyyp = 'V'::bpchar) OR ("inner".doktyyp
= 'G'::bpchar) OR ("inner".doktyyp = 'Y'::bpchar) OR ("inner".doktyyp =
'K'::bpchar) OR ("inner".doktyyp = 'I'::bpchar) OR ("inner".doktyyp =
'T'::bpchar) OR ("inner".doktyyp = 'D'::bpchar) OR ("inner".doktyyp =
'N'::bpchar) OR ("inner".doktyyp = 'H'::bpchar) OR ("inner".doktyyp =
'M'::bpchar) OR ("inner".doktyyp = 'E'::bpchar) OR ("inner".doktyyp =
'B'::bpchar) OR ("inner".doktyyp = 'A'::bpchar) OR ("inner".doktyyp =
'R'::bpchar) OR ("inner".doktyyp = 'C'::bpchar) OR ("inner".doktyyp =
'F'::bpchar) OR ("inner".doktyyp = 'J'::bpchar) OR ("inner".doktyyp =
'Q'::bpchar)) AND (CASE WHEN ((NOT ("inner".objrealt)::boolean) OR
("inner".doktyyp = 'I'::bpchar)) THEN "inner".yksus ELSE "outer".kuluobjekt
END ~~ 'AEGVIIDU%'::text)) OR ((("inner".doktyyp = 'O'::bpchar) OR
("inner".doktyyp = 'S'::bpchar) OR ("inner".doktyyp = 'I'::bpchar) OR
("inner".doktyyp = 'U'::bpchar) OR ("inner".doktyyp = 'D'::bpchar) OR
("inner".doktyyp = 'P'::bpchar)) AND (CASE WHEN ("inner".objrealt)::boolean
THEN "outer".kuluobjekt ELSE "inner".sihtyksus END ~~ 'AEGVIIDU%'::text)))"
"        ->  Seq Scan on rid  (cost=0.00..129911.53 rows=3299853 width=18)
(actual time=0.039..17277.888 rows=3299777 loops=1)"
"        ->  Hash  (cost=92983.97..92983.97 rows=336110 width=38) (actual
time=3965.478..3965.478 rows=337455 loops=1)"
"              ->  Bitmap Heap Scan on dok  (cost=1993.66..92983.97
rows=336110 width=38) (actual time=135.810..2389.703 rows=337455 loops=1)"
"                    Recheck Cond: (kuupaev >= '2008-05-01'::date)"
"                    Filter: ((doktyyp = 'V'::bpchar) OR (doktyyp =
'G'::bpchar) OR (doktyyp = 'Y'::bpchar) OR (doktyyp = 'K'::bpchar) OR
(doktyyp = 'I'::bpchar) OR (doktyyp = 'T'::bpchar) OR (doktyyp =
'D'::bpchar) OR (doktyyp = 'N'::bpchar) OR (doktyyp = 'H'::bpchar) OR
(doktyyp = 'M'::bpchar) OR (doktyyp = 'E'::bpchar) OR (doktyyp =
'B'::bpchar) OR (doktyyp = 'A'::bpchar) OR (doktyyp = 'R'::bpchar) OR
(doktyyp = 'C'::bpchar) OR (doktyyp = 'F'::bpchar) OR (doktyyp =
'J'::bpchar) OR (doktyyp = 'Q'::bpchar) OR (doktyyp = 'O'::bpchar) OR
(doktyyp = 'S'::bpchar) OR (doktyyp = 'I'::bpchar) OR (doktyyp =
'U'::bpchar) OR (doktyyp = 'D'::bpchar) OR (doktyyp = 'P'::bpchar))"
"                    ->  Bitmap Index Scan on dok_kuupaev_idx
(cost=0.00..1993.66 rows=347618 width=0) (actual time=97.881..97.881
rows=337770 loops=1)"
"                          Index Cond: (kuupaev >= '2008-05-01'::date)"
"Total runtime: 41136.348 ms"

8.1.4 Db is analyzed, default_statistics_target is 40.
PostgreSql still choices seq scan over rid.

This query can optimized as follows:

1. kuupaev >= '2008-05-01'  index can reduce number of scanned rows 10 times
(to 330000)
2. AEGVIIDU%  can reduce number of rows 6 times (to 60000)

How to force pg to use indexes for those conditions ?

This query can be executed against different shops groups (int this case
there is other value than  AEGVIIDU) and for different date.

There are 6 different shop groups containing roughly same number or records
each.
So using index on  AEGVIIDU% can decrease number of scanned rows 6 times.
Usually 90% of dok records contain 'Y' in dok.doktyyp column and
dok.objrealt is false for those records.

Is it possible to use come functional index or other method to speed it ?

rid.kuluobjekt, dok.yksus and dok.sihtyksus types are char(10).

There are indexes

dok(yksus bpchar_pattern_ops)
dok(sihtyksus bpchar_pattern_ops)

Is it possible to re-write query that it uses those indexes or create some
other indexes?
Using 8.1.4, us-en locale, utf-8 db encoding.
select column list is removed from sample.

Andrus.