Обсуждение: Seqscan on big table, when an Index-Usage should be possible

Поиск
Список
Период
Сортировка

Seqscan on big table, when an Index-Usage should be possible

От
Weinzierl Stefan
Дата:
Hello,

I'm currently testing some queries on data which I had imported from an
other database-system into Postgres 9.4.

After the import I did create the indexes, run an analyze and vacuum. I
also played a little bit with seq_page_cost and random_page_cost. But
currently I have no clue, which parameter I have to adjust, to get an
query-time like the example width 'enable_seqscan=off'.

Stefan



> pd=> set enable_seqscan=off;
> pd=> explain analyze select t.name from product p left join measurements m on p.productid=m.productid inner join
measurementstypet on m.measurementstypeid=t.measurementstypeid where p.timestamp between '2013-02-01 15:00:00' and
'2013-02-0521:30:00' group by t.name; 
>                                                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> HashAggregate  (cost=200380892.01..200380936.43 rows=4442 width=16) (actual time=34428.335..34428.693 rows=656
loops=1)
>    Group Key: t.name
>    ->  Hash Join  (cost=8995.44..200361772.19 rows=7647926 width=16) (actual time=103.670..30153.958 rows=5404751
loops=1)
>          Hash Cond: (m.measurementstypeid = t.measurementstypeid)
>          ->  Nested Loop  (cost=8279.61..200188978.03 rows=7647926 width=4) (actual time=75.939..22488.725
rows=5404751loops=1) 
>                ->  Bitmap Heap Scan on product p  (cost=8279.03..662659.76 rows=526094 width=8) (actual
time=75.903..326.850rows=368494 loops=1) 
>                      Recheck Cond: (("timestamp" >= '2013-02-01 15:00:00'::timestamp without time zone) AND
("timestamp"<= '2013-02-05 21:30:00'::timestamp without time zo 
>                      Heap Blocks: exact=3192
>                      ->  Bitmap Index Scan on product_timestamp  (cost=0.00..8147.51 rows=526094 width=0) (actual
time=75.050..75.050rows=368494 loops=1) 
>                            Index Cond: (("timestamp" >= '2013-02-01 15:00:00'::timestamp without time zone) AND
("timestamp"<= '2013-02-05 21:30:00'::timestamp without tim 
>                ->  Index Scan using measurements_productid on measurements m  (cost=0.58..347.12 rows=3214 width=12)
(actualtime=0.018..0.045 rows=15 loops=368494) 
>                      Index Cond: (productid = p.productid)
>          ->  Hash  (cost=508.91..508.91 rows=16554 width=20) (actual time=27.704..27.704 rows=16554 loops=1)
>                Buckets: 2048  Batches: 1  Memory Usage: 686kB
>                ->  Index Scan using measurementstype_pkey on measurementstype t  (cost=0.29..508.91 rows=16554
width=20)(actual time=0.017..15.719 rows=16554 loops=1) 
> Planning time: 2.176 ms
> Execution time: 34429.080 ms
> (17 Zeilen)
>
>
> Zeit: 34432,187 ms
> pd=> set enable_seqscan=on;
> SET
> Zeit: 0,193 ms
> pd=> explain analyze select t.name from product p left join measurements m on p.productid=m.productid inner join
measurementstypet on m.measurementstypeid=t.measurementstypeid where p.timestamp between '2013-02-01 15:00:00' and
'2013-02-0521:30:00' group by t.name; 
>                                                                                         QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> HashAggregate  (cost=108645282.49..108645326.91 rows=4442 width=16) (actual time=5145182.269..5145182.656 rows=656
loops=1)
>    Group Key: t.name
>    ->  Hash Join  (cost=671835.40..108626162.68 rows=7647926 width=16) (actual time=2087822.232..5141351.539
rows=5404751loops=1) 
>          Hash Cond: (m.measurementstypeid = t.measurementstypeid)
>          ->  Hash Join  (cost=671291.94..108453540.88 rows=7647926 width=4) (actual time=2087800.816..5134312.822
rows=5404751loops=1) 
>                Hash Cond: (m.productid = p.productid)
>                ->  Seq Scan on measurements m  (cost=0.00..49325940.08 rows=2742148608 width=12) (actual
time=0.007..2704591.045rows=2742146806 loops=1) 
>                ->  Hash  (cost=662659.76..662659.76 rows=526094 width=8) (actual time=552.480..552.480 rows=368494
loops=1)
>                      Buckets: 16384  Batches: 4  Memory Usage: 2528kB
>                      ->  Bitmap Heap Scan on product p  (cost=8279.03..662659.76 rows=526094 width=8) (actual
time=73.353..302.482rows=368494 loops=1) 
>                            Recheck Cond: (("timestamp" >= '2013-02-01 15:00:00'::timestamp without time zone) AND
("timestamp"<= '2013-02-05 21:30:00'::timestamp without t 
>                            Heap Blocks: exact=3192
>                            ->  Bitmap Index Scan on product_timestamp  (cost=0.00..8147.51 rows=526094 width=0)
(actualtime=72.490..72.490 rows=368494 loops=1) 
>                                  Index Cond: (("timestamp" >= '2013-02-01 15:00:00'::timestamp without time zone) AND
("timestamp"<= '2013-02-05 21:30:00'::timestamp witho 
>          ->  Hash  (cost=336.54..336.54 rows=16554 width=20) (actual time=21.377..21.377 rows=16554 loops=1)
>                Buckets: 2048  Batches: 1  Memory Usage: 686kB
>                ->  Seq Scan on measurementstype t  (cost=0.00..336.54 rows=16554 width=20) (actual time=0.008..9.849
rows=16554loops=1) 
> Planning time: 2.236 ms
> Execution time: 5145183.471 ms
> (19 Zeilen)
>
> Zeit: 5145186,786 ms


> pd=> \d measurements
>                                                 Tabelle „public.measurements“
>        Spalte       |             Typ             |                                 Attribute
>
--------------------+-----------------------------+---------------------------------------------------------------------------
>  measurementsid     | bigint                      | not null Vorgabewert
nextval('measurements_measurementsid_seq'::regclass)
>  value              | text                        | not null
>  lowerlimit         | text                        |
>  upperlimit         | text                        |
>  measurementstypeid | integer                     | not null
>  productid          | bigint                      | not null
>  timestamp          | timestamp without time zone |
>  state              | character varying(20)       | not null Vorgabewert 'Unknown'::character varying
> Indexe:
>     "measurements_pkey" PRIMARY KEY, btree (measurementsid)
>     "measurements_measurementstypeid" btree (measurementstypeid)
>     "measurements_productid" btree (productid)
>
> pd=> \d product
>                                              Tabelle „public.product“
>       Spalte      |             Typ             |                            Attribute
> ------------------+-----------------------------+-----------------------------------------------------------------
>  productid        | bigint                      | not null Vorgabewert nextval('product_productid_seq'::regclass)
>  ordermaterialsid | integer                     | not null
>  testerid         | integer                     |
>  equipmentid      | integer                     | not null
>  timestamp        | timestamp without time zone | not null
>  state            | character varying(20)       | not null Vorgabewert 'Unknown'::character varying
>  exported         | character varying(1)        | not null Vorgabewert 'N'::character varying
>  mc_selectionid   | integer                     |
> Indexe:
>     "product_pkey" PRIMARY KEY, btree (productid)
>     "product_equipmentid" btree (equipmentid)
>     "product_exported" btree (exported)
>     "product_mc_selectionid" btree (mc_selectionid)
>     "product_ordermaterialsid" btree (ordermaterialsid)
>     "product_state" btree (state)
>     "product_testerid" btree (testerid)
>     "product_timestamp" btree ("timestamp")
> Fremdschlüssel-Constraints:
>     "fk_equipmentid" FOREIGN KEY (equipmentid) REFERENCES equipment(equipmentid) ON UPDATE CASCADE ON DELETE RESTRICT
>     "fk_mc_selectionid" FOREIGN KEY (mc_selectionid) REFERENCES mc_selection(mc_selectionid) ON UPDATE CASCADE ON
DELETESET NULL 
>     "fk_ordermaterialsid" FOREIGN KEY (ordermaterialsid) REFERENCES ordermaterials(ordermaterialsid) ON UPDATE
CASCADEON DELETE RESTRICT 
>     "fk_testerid" FOREIGN KEY (testerid) REFERENCES tester(testerid) ON UPDATE CASCADE ON DELETE RESTRICT
>
> pd=> \d measurementstype
>                                                 Tabelle „public.measurementstype“
>        Spalte       |          Typ           |                                     Attribute
>
--------------------+------------------------+-----------------------------------------------------------------------------------
>  measurementstypeid | integer                | not null Vorgabewert
nextval('measurementstype_measurementstypeid_seq'::regclass)
>  datatype           | character varying(20)  | not null Vorgabewert 'char'::character varying
>  name               | character varying(255) | not null
>  description        | character varying(255) | Vorgabewert NULL::character varying
>  unit               | character varying(20)  | Vorgabewert NULL::character varying
>  step               | integer                |
>  stepdescription    | character varying(255) | Vorgabewert NULL::character varying
>  permissionlevel    | integer                | not null Vorgabewert 0
> Indexe:
>     "measurementstype_pkey" PRIMARY KEY, btree (measurementstypeid)
>     "measurementstype_datatype" btree (datatype)
>     "measurementstype_name" btree (name)
>     "measurementstype_step" btree (step)
>     "measurementstype_stepdescription" btree (stepdescription)
>



Re: Seqscan on big table, when an Index-Usage should be possible

От
Igor Neyman
Дата:
Stefan,

See below

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-owner@postgresql.org] On Behalf Of Weinzierl Stefan
> Sent: Thursday, June 05, 2014 3:36 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Seqscan on big table, when an Index-Usage should be
> possible
>
> Hello,
>
> I'm currently testing some queries on data which I had imported from an
> other database-system into Postgres 9.4.
>
> After the import I did create the indexes, run an analyze and vacuum. I also
> played a little bit with seq_page_cost and random_page_cost. But currently I
> have no clue, which parameter I have to adjust, to get an query-time like the
> example width 'enable_seqscan=off'.
>
> Stefan
>
>
>
> > pd=> set enable_seqscan=off;
> > pd=> explain analyze select t.name from product p left join measurements
> m on p.productid=m.productid inner join measurementstype t on
> m.measurementstypeid=t.measurementstypeid where p.timestamp
> between '2013-02-01 15:00:00' and '2013-02-05 21:30:00' group by t.name;
> >
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > ----------------------------------------------------------------------
> > -------------------------------- HashAggregate
> > (cost=200380892.01..200380936.43 rows=4442 width=16) (actual
> time=34428.335..34428.693 rows=656 loops=1)
> >    Group Key: t.name
> >    ->  Hash Join  (cost=8995.44..200361772.19 rows=7647926 width=16)
> (actual time=103.670..30153.958 rows=5404751 loops=1)
> >          Hash Cond: (m.measurementstypeid = t.measurementstypeid)
> >          ->  Nested Loop  (cost=8279.61..200188978.03 rows=7647926 width=4)
> (actual time=75.939..22488.725 rows=5404751 loops=1)
> >                ->  Bitmap Heap Scan on product p  (cost=8279.03..662659.76
> rows=526094 width=8) (actual time=75.903..326.850 rows=368494 loops=1)
> >                      Recheck Cond: (("timestamp" >= '2013-02-01
> 15:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-02-05
> 21:30:00'::timestamp without time zo
> >                      Heap Blocks: exact=3192
> >                      ->  Bitmap Index Scan on product_timestamp
> (cost=0.00..8147.51 rows=526094 width=0) (actual time=75.050..75.050
> rows=368494 loops=1)
> >                            Index Cond: (("timestamp" >= '2013-02-01
> 15:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-02-05
> 21:30:00'::timestamp without tim
> >                ->  Index Scan using measurements_productid on measurements m
> (cost=0.58..347.12 rows=3214 width=12) (actual time=0.018..0.045 rows=15
> loops=368494)
> >                      Index Cond: (productid = p.productid)
> >          ->  Hash  (cost=508.91..508.91 rows=16554 width=20) (actual
> time=27.704..27.704 rows=16554 loops=1)
> >                Buckets: 2048  Batches: 1  Memory Usage: 686kB
> >                ->  Index Scan using measurementstype_pkey on
> > measurementstype t  (cost=0.29..508.91 rows=16554 width=20) (actual
> > time=0.017..15.719 rows=16554 loops=1) Planning time: 2.176 ms
> > Execution time: 34429.080 ms
> > (17 Zeilen)
> >
> >
> > Zeit: 34432,187 ms
> > pd=> set enable_seqscan=on;
> > SET
> > Zeit: 0,193 ms
> > pd=> explain analyze select t.name from product p left join measurements
> m on p.productid=m.productid inner join measurementstype t on
> m.measurementstypeid=t.measurementstypeid where p.timestamp
> between '2013-02-01 15:00:00' and '2013-02-05 21:30:00' group by t.name;
> >
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > ----------------------------------------------------------------------
> > -------------------------------- HashAggregate
> > (cost=108645282.49..108645326.91 rows=4442 width=16) (actual
> time=5145182.269..5145182.656 rows=656 loops=1)
> >    Group Key: t.name
> >    ->  Hash Join  (cost=671835.40..108626162.68 rows=7647926 width=16)
> (actual time=2087822.232..5141351.539 rows=5404751 loops=1)
> >          Hash Cond: (m.measurementstypeid = t.measurementstypeid)
> >          ->  Hash Join  (cost=671291.94..108453540.88 rows=7647926 width=4)
> (actual time=2087800.816..5134312.822 rows=5404751 loops=1)
> >                Hash Cond: (m.productid = p.productid)
> >                ->  Seq Scan on measurements m  (cost=0.00..49325940.08
> rows=2742148608 width=12) (actual time=0.007..2704591.045
> rows=2742146806 loops=1)
> >                ->  Hash  (cost=662659.76..662659.76 rows=526094 width=8) (actual
> time=552.480..552.480 rows=368494 loops=1)
> >                      Buckets: 16384  Batches: 4  Memory Usage: 2528kB
> >                      ->  Bitmap Heap Scan on product p  (cost=8279.03..662659.76
> rows=526094 width=8) (actual time=73.353..302.482 rows=368494 loops=1)
> >                            Recheck Cond: (("timestamp" >= '2013-02-01
> 15:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-02-05
> 21:30:00'::timestamp without t
> >                            Heap Blocks: exact=3192
> >                            ->  Bitmap Index Scan on product_timestamp
> (cost=0.00..8147.51 rows=526094 width=0) (actual time=72.490..72.490
> rows=368494 loops=1)
> >                                  Index Cond: (("timestamp" >= '2013-02-01
> 15:00:00'::timestamp without time zone) AND ("timestamp" <= '2013-02-05
> 21:30:00'::timestamp witho
> >          ->  Hash  (cost=336.54..336.54 rows=16554 width=20) (actual
> time=21.377..21.377 rows=16554 loops=1)
> >                Buckets: 2048  Batches: 1  Memory Usage: 686kB
> >                ->  Seq Scan on measurementstype t  (cost=0.00..336.54
> > rows=16554 width=20) (actual time=0.008..9.849 rows=16554 loops=1)
> > Planning time: 2.236 ms Execution time: 5145183.471 ms
> > (19 Zeilen)
> >
> > Zeit: 5145186,786 ms
>
>
> > pd=> \d measurements
> >                                                 Tabelle "public.measurements"
> >        Spalte       |             Typ             |                                 Attribute
> > --------------------+-----------------------------+-------------------
> > --------------------+-----------------------------+-------------------
> > --------------------+-----------------------------+-------------------
> > --------------------+-----------------------------+------------------
> >  measurementsid     | bigint                      | not null Vorgabewert
> nextval('measurements_measurementsid_seq'::regclass)
> >  value              | text                        | not null
> >  lowerlimit         | text                        |
> >  upperlimit         | text                        |
> >  measurementstypeid | integer                     | not null
> >  productid          | bigint                      | not null
> >  timestamp          | timestamp without time zone |
> >  state              | character varying(20)       | not null Vorgabewert
> 'Unknown'::character varying
> > Indexe:
> >     "measurements_pkey" PRIMARY KEY, btree (measurementsid)
> >     "measurements_measurementstypeid" btree (measurementstypeid)
> >     "measurements_productid" btree (productid)
> >
> > pd=> \d product
> >                                              Tabelle "public.product"
> >       Spalte      |             Typ             |                            Attribute
> > ------------------+-----------------------------+---------------------
> > ------------------+-----------------------------+---------------------
> > ------------------+-----------------------------+---------------------
> > ------------------+-----------------------------+--
> >  productid        | bigint                      | not null Vorgabewert
> nextval('product_productid_seq'::regclass)
> >  ordermaterialsid | integer                     | not null
> >  testerid         | integer                     |
> >  equipmentid      | integer                     | not null
> >  timestamp        | timestamp without time zone | not null
> >  state            | character varying(20)       | not null Vorgabewert
> 'Unknown'::character varying
> >  exported         | character varying(1)        | not null Vorgabewert
> 'N'::character varying
> >  mc_selectionid   | integer                     |
> > Indexe:
> >     "product_pkey" PRIMARY KEY, btree (productid)
> >     "product_equipmentid" btree (equipmentid)
> >     "product_exported" btree (exported)
> >     "product_mc_selectionid" btree (mc_selectionid)
> >     "product_ordermaterialsid" btree (ordermaterialsid)
> >     "product_state" btree (state)
> >     "product_testerid" btree (testerid)
> >     "product_timestamp" btree ("timestamp")
> > Fremdschlüssel-Constraints:
> >     "fk_equipmentid" FOREIGN KEY (equipmentid) REFERENCES
> equipment(equipmentid) ON UPDATE CASCADE ON DELETE RESTRICT
> >     "fk_mc_selectionid" FOREIGN KEY (mc_selectionid) REFERENCES
> mc_selection(mc_selectionid) ON UPDATE CASCADE ON DELETE SET NULL
> >     "fk_ordermaterialsid" FOREIGN KEY (ordermaterialsid) REFERENCES
> ordermaterials(ordermaterialsid) ON UPDATE CASCADE ON DELETE RESTRICT
> >     "fk_testerid" FOREIGN KEY (testerid) REFERENCES tester(testerid)
> > ON UPDATE CASCADE ON DELETE RESTRICT
> >
> > pd=> \d measurementstype
> >                                                 Tabelle "public.measurementstype"
> >        Spalte       |          Typ           |                                     Attribute
> > --------------------+------------------------+------------------------
> > --------------------+------------------------+------------------------
> > --------------------+------------------------+------------------------
> > --------------------+------------------------+-----------
> >  measurementstypeid | integer                | not null Vorgabewert
> nextval('measurementstype_measurementstypeid_seq'::regclass)
> >  datatype           | character varying(20)  | not null Vorgabewert
> 'char'::character varying
> >  name               | character varying(255) | not null
> >  description        | character varying(255) | Vorgabewert NULL::character
> varying
> >  unit               | character varying(20)  | Vorgabewert NULL::character varying
> >  step               | integer                |
> >  stepdescription    | character varying(255) | Vorgabewert NULL::character
> varying
> >  permissionlevel    | integer                | not null Vorgabewert 0
> > Indexe:
> >     "measurementstype_pkey" PRIMARY KEY, btree (measurementstypeid)
> >     "measurementstype_datatype" btree (datatype)
> >     "measurementstype_name" btree (name)
> >     "measurementstype_step" btree (step)
> >     "measurementstype_stepdescription" btree (stepdescription)
> >
>

You don't tell:
    - what kind of hardware (specifically, how much RAM) you are using
    - what are your config settings: shared_buffers, work_mem, effective_cache_size

All this affects planner decisions, when choosing one (or another) execution path/plan.

Regards,
Igor Neyman