Обсуждение: Instances where enable_seqscan = false is good

Поиск
Список
Период
Сортировка

Instances where enable_seqscan = false is good

От
Ow Mun Heng
Дата:
Same query, executed twice, once using seqscan enabled and the other
with it disabled. Difference is nearly night and day.


How can I persuade PG to use the index w/o resorting to setting seqscan
= false

(actually, I don't know what are the pro or cons - I read posts from the
archives far back as 2005 but that was dealing with inconsistencies in
the text string eg: warwa(s/z/etc..) which caused it to pick seq scans.)

PPl in IRC suggested setting default_statistics = 100 but I didn't find
that useful nor helpful. Also, tables has been vacuum and analysed.

Using Seq-scans
---------------


QUERY PLAN
--------------------------------------------------------------------------------------------------
 Limit  (cost=4430.53..50173.70 rows=1000 width=47) (actual
time=21832.092..43771.536 rows=228 loops=1)
   ->  Hash Join  (cost=4430.53..260866.77 rows=5606 width=47) (actual
time=21832.088..43770.927 rows=228 loops=1)
         Hash Cond: ((trz.number)::text = (drv.number)::text)
         ->  Seq Scan on zone trz  (cost=0.00..233254.27 rows=6148222
width=39) (actual time=22.807..31891.591 rows=6181910 loops=1)
               Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
         ->  Hash  (cost=4414.39..4414.39 rows=1291 width=24) (actual
time=171.911..171.911 rows=12591 loops=1)
               ->  Bitmap Heap Scan on drv  (cost=30.44..4414.39
rows=1291 width=24) (actual time=62.980..142.594 rows=12591 loops=1)
                     Recheck Cond: ((code)::text = 'NRN15'::text)
                     ->  Bitmap Index Scan on idx_drv  (cost=0.00..30.11
rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1)
                           Index Cond: ((code)::text = 'NRN15'::text)
 Total runtime: 43772.045 ms
(11 rows)


set enable_seqscan = false;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------
--
 Limit  (cost=0.00..69314.54 rows=1000 width=47) (actual
time=122.920..553.538 rows=228 loops=1)
   ->  Nested Loop  (cost=0.00..388646.63 rows=5607 width=47) (actual
time=122.915..552.956 rows=228 loops=1)
         ->  Index Scan using idx_drv on drv  (cost=0.00..5077.64
rows=1293 width=24) (actual time=38.164..110.933 rows=12591 loops=1)
               Index Cond: ((code)::text = 'NRN15'::text)
         ->  Index Scan using idx_trz_sn on zone trz  (cost=0.00..295.10
rows=120 width=39) (actual time=0.021..0.021 rows=0 loops=12591)
               Index Cond: ((drv.number)::text = (trz.number)::text)
               Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
 Total runtime: 553.964 ms
(8 rows)


Вложения

Re: Instances where enable_seqscan = false is good

От
Gregory Stark
Дата:
"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

> Same query, executed twice, once using seqscan enabled and the other
> with it disabled. Difference is nearly night and day.
>
>
> How can I persuade PG to use the index w/o resorting to setting seqscan
> = false

The usual knob to fiddle with is random_page_cost. If your database fits
mostly in memory you may want to turn it down from the default of 4 to
something closer to 1. Perhaps 2 or even 1.5 or so. But don't do it based on a
single query under testing conditions, use a wide variety of queries under
production conditions.

> QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Limit  (cost=4430.53..50173.70 rows=1000 width=47) (actual time=21832.092..43771.536 rows=228 loops=1)
>    ->  Hash Join  (cost=4430.53..260866.77 rows=5606 width=47) (actual time=21832.088..43770.927 rows=228 loops=1)

The difference between the predicted and actual rows is suspicious. let's look
lower down to see where it comes from.

>          Hash Cond: ((trz.number)::text = (drv.number)::text)
>          ->  Seq Scan on zone trz  (cost=0.00..233254.27 rows=6148222 width=39) (actual time=22.807..31891.591
rows=6181910loops=1) 
>                Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))

This part looks ok 615k versus 618k is pretty good.

>          ->  Hash  (cost=4414.39..4414.39 rows=1291 width=24) (actual time=171.911..171.911 rows=12591 loops=1)

Ah, this is off by an order of magnitude, that's bad.

>                ->  Bitmap Heap Scan on drv  (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594
rows=12591loops=1) 
>                      Recheck Cond: ((code)::text = 'NRN15'::text)
>                      ->  Bitmap Index Scan on idx_drv  (cost=0.00..30.11 rows=1291 width=0) (actual
time=62.199..62.199rows=12649 loops=1) 
>                            Index Cond: ((code)::text = 'NRN15'::text)

So you might want to increase the statistics target for the "code" column.

Incidentally the way this is written makes me wonder what data type "code" is
defined as.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Instances where enable_seqscan = false is good

От
Ow Mun Heng
Дата:
On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:
> "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:
> >
> > How can I persuade PG to use the index w/o resorting to setting seqscan
> > = false
>
> The usual knob to fiddle with is random_page_cost. If your database fits
> mostly in memory you may want to turn it down from the default of 4 to
> something closer to 1.

I tried down to 0.4 before it resorted to using the index. The DB
shouldn't fit into memory (I think) that table alone has ~8million rows
at ~1.5G size

> >          ->  Hash  (cost=4414.39..4414.39 rows=1291 width=24) (actual time=171.911..171.911 rows=12591 loops=1)
>
> Ah, this is off by an order of magnitude, that's bad.
>

having said so, still don't understand why..

> >                ->  Bitmap Heap Scan on drv  (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594
rows=12591loops=1) 
> >                      Recheck Cond: ((code)::text = 'NRN15'::text)
> >                      ->  Bitmap Index Scan on idx_drv  (cost=0.00..30.11 rows=1291 width=0) (actual
time=62.199..62.199rows=12649 loops=1) 
> >                            Index Cond: ((code)::text = 'NRN15'::text)
>
> So you might want to increase the statistics target for the "code" column.
Err.. how come? (newbie) it's scanning the index there. What's bad is
that it's using Seq_scans on the "zone" table.

> Incidentally the way this is written makes me wonder what data type "code" is
> defined as.

code is defined as varchar(5) data type. I'm changing all of the normal
char(n) to varchar(n) columns..

BTW, thanks for helping. Not using seq scans does really make a huge
difference as you can clearly see from the timing.

 Total runtime: 43772.045 ms
 Total runtime: 553.964 ms


Re: Instances where enable_seqscan = false is good

От
Ow Mun Heng
Дата:
On Tue, 2007-09-04 at 10:06 +0800, Ow Mun Heng wrote:
> On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:
> > "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

> > >                ->  Bitmap Heap Scan on drv  (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594
rows=12591loops=1) 
> > >                      Recheck Cond: ((code)::text = 'NRN15'::text)
> > >                      ->  Bitmap Index Scan on idx_drv  (cost=0.00..30.11 rows=1291 width=0) (actual
time=62.199..62.199rows=12649 loops=1) 
> > >                            Index Cond: ((code)::text = 'NRN15'::text)
> >
> > So you might want to increase the statistics target for the "code" column.
> Err.. how come? (newbie) it's scanning the index there. What's bad is
> that it's using Seq_scans on the "zone" table.

Code is now increased to Stat level of 100



----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=25113.04..30733.57 rows=1000 width=47) (actual time=19666.832..39961.032 rows=228 loops=1)
   ->  Hash Join  (cost=25113.04..324620.08 rows=53288 width=47) (actual time=19666.826..39960.437 rows=228 loops=1)
         Hash Cond: ((trz.number)::text = (drv.number)::text)
         ->  Seq Scan on zone trz  (cost=0.00..234363.75 rows=6394431 width=39) (actual time=17.635..29164.929
rows=6222984loops=1) 
               Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
         ->  Hash  (cost=24965.72..24965.72 rows=11785 width=24) (actual time=215.851..215.851 rows=12591 loops=1)
               ->  Bitmap Heap Scan on drv (cost=243.76..24965.72 rows=11785 width=24) (actual time=50.910..188.894
rows=12591loops=1) 
                     Recheck Cond: ((code)::text = 'NRN15'::text)
                     ->  Bitmap Index Scan on idx_drv  (cost=0.00..240.82 rows=11785 width=0) (actual
time=49.180..49.180rows=12591 loops=1) 
                           Index Cond: ((code)::text = 'NRN15'::text)
 Total runtime: 39961.703 ms


Does seem to be slightly better (from 43772ms)

trz.number stat level increased to 100 & code to 100

                                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=25113.04..30733.57 rows=1000 width=47) (actual time=22152.398..42430.820 rows=228 loops=1)
   ->  Hash Join  (cost=25113.04..324620.08 rows=53288 width=47) (actual time=22152.392..42430.212 rows=228 loops=1)
         Hash Cond: ((trz.number)::text = (drv.number)::text)
         ->  Seq Scan on zone trz  (cost=0.00..234363.75 rows=6394431 width=39) (actual time=11.840..28808.222
rows=6222984loops=1) 
               Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL))
         ->  Hash  (cost=24965.72..24965.72 rows=11785 width=24) (actual time=2646.652..2646.652 rows=12591 loops=1)
               ->  Bitmap Heap Scan on drv  (cost=243.76..24965.72 rows=11785 width=24) (actual time=50.628..2600.132
rows=12591loops=1) 
                     Recheck Cond: ((code)::text = 'NRN15'::text)
                     ->  Bitmap Index Scan on idx_drvl  (cost=0.00..240.82 rows=11785 width=0) (actual
time=38.436..38.436rows=12591 loops=1) 
                           Index Cond: ((code)::text = 'NRN15'::text)
 Total runtime: 42431.358 ms

hmm..not much difference..

What else can be done?? Many Thanks..

Re: Instances where enable_seqscan = false is good

От
Gregory Stark
Дата:
"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

> On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:
>> "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:
>> >
>> > How can I persuade PG to use the index w/o resorting to setting seqscan
>> > = false
>>
>> The usual knob to fiddle with is random_page_cost. If your database fits
>> mostly in memory you may want to turn it down from the default of 4 to
>> something closer to 1.
>
> I tried down to 0.4 before it resorted to using the index. The DB
> shouldn't fit into memory (I think) that table alone has ~8million rows
> at ~1.5G size

Values under 1 are nonsensical. Basically being as low as 1 means you're
telling the database that a random access i/o takes the same amount of time as
a sequential i/o. (Actually we have sequential_page_cost now so I guess
instead of "1" I should say "the same as sequential_page_cost" but I'm
assuming you haven't modified sequential_page_cost from the default of 1 have
you?)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Instances where enable_seqscan = false is good

От
Ow Mun Heng
Дата:
On Tue, 2007-09-04 at 05:15 +0100, Gregory Stark wrote:
> "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:
>
> > On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote:
> >> "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:
> >> >
> >> > How can I persuade PG to use the index w/o resorting to setting seqscan
> >> > = false
> >>
> >> The usual knob to fiddle with is random_page_cost. If your database fits
> >> mostly in memory you may want to turn it down from the default of 4 to
> >> something closer to 1.
> >
> > I tried down to 0.4 before it resorted to using the index. The DB
> > shouldn't fit into memory (I think) that table alone has ~8million rows
> > at ~1.5G size
>
> Values under 1 are nonsensical.
exactly, might as well use enable_seqscan=false. So it's still default
at 4

> Basically being as low as 1 means you're
> telling the database that a random access i/o takes the same amount of time as
> a sequential i/o. (Actually we have sequential_page_cost now so I guess
> instead of "1" I should say "the same as sequential_page_cost" but I'm
> assuming you haven't modified sequential_page_cost from the default of 1 have
> you?)

Have not changed anything in that area. Question is.. Do I need to? or
should I try out something just to see how it is?
(any) Recommendations would be good.


Re: Instances where enable_seqscan = false is good

От
Gregory Stark
Дата:
"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes:

> Have not changed anything in that area. Question is.. Do I need to? or
> should I try out something just to see how it is?
> (any) Recommendations would be good.

Sorry, I don't have all the original plans. Can you post the explain analyze
with and without enable_seqscan now that the stats are giving good
predictions?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com