Обсуждение: Optimizer Parameters

Поиск
Список
Период
Сортировка

Optimizer Parameters

От
Martin Foster
Дата:
Upon testing queries with EXPLAIN ANALYSE, I started to notice that the
planner would avoid using indexes when available.   Instead it would
jump to sequence scans, ignoring the index and increasing overall time
it took to get results.

I have been looking up documentation and noticed that you can somewhat
force Postgres into using Indexes when available.    So I changed the
following two lines in the .conf file:

    enable_seqscan = false
    enable_nestloop = false

This was recommended in the documentation, and to say the least things
have really changed in performance.   Queries have halved the time
needed to execute even if the estimates are insanely high compared.

I also increased this value, which apparently helps when running ANALYSE
on tables:
    default_statistics_target = 1000

Now how sane is it to keep those options turned off?   And what side
effects can I expect from changing default_statistics_target?   And any
way to have the planner quiet guessing tens of thousands of rows will be
return when there are at most hundred?

I included the EXPLAIN ALALYSE results in an attachment to maintain
formatting of the output. Thanks in advance!

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org

EXPLAIN ANALYSE SELECT
    Po.PuppetName     AS PuppetName,
    Po.PuppeteerLogin AS PuppeteerLogin,
    Po.RealmName      AS RealmName,
    Re.RealmPublic    AS RealmPublic,
    Re.RealmVerified  AS RealmVerified
  FROM ethereal.Post Po, ethereal.Puppet Ch, ethereal.Realm Re
  WHERE Po.PuppeteerLogin = Ch.PuppeteerLogin
    AND Po.RealmName = Re.RealmName
    AND Po.PostTimestamp > (LOCALTIMESTAMP - INTERVAL '10 minutes')
    AND Po.PuppetName IS NOT NULL
  GROUP BY Po.PuppeteerLogin, Po.PuppetName, Po.RealmName, Re.RealmPublic, Re.RealmVerified;

 -- Pre changes
 Group  (cost=298025.66..322310.42 rows=161898 width=77) (actual time=4241.28..4329.68 rows=74 loops=1)
   ->  Sort  (cost=298025.66..302073.12 rows=1618985 width=77) (actual time=4241.23..4253.68 rows=14420 loops=1)
         Sort Key: po.puppeteerlogin, po.puppetname, po.realmname, re.realmpublic, re.realmverified
         ->  Merge Join  (cost=24174.23..44794.94 rows=1618985 width=77) (actual time=3199.66..3891.09 rows=14420
loops=1)
               Merge Cond: ("outer".puppeteerlogin = "inner".puppeteerlogin)
               ->  Sort  (cost=5964.67..6056.93 rows=36906 width=10) (actual time=366.18..427.94 rows=36318 loops=1)
                     Sort Key: ch.puppeteerlogin
                     ->  Seq Scan on puppet ch  (cost=0.00..3165.06 rows=36906 width=10) (actual time=0.02..133.52
rows=36456loops=1) 
               ->  Sort  (cost=18209.57..18310.85 rows=40513 width=67) (actual time=2832.70..2945.77 rows=14411
loops=1)
                     Sort Key: po.puppeteerlogin
                     ->  Hash Join  (cost=30.30..15109.11 rows=40513 width=67) (actual time=2822.23..2830.93 rows=392
loops=1)
                           Hash Cond: ("outer".realmname = "inner".realmname)
                           ->  Seq Scan on post po  (cost=0.00..14369.84 rows=40513 width=41) (actual
time=2820.88..2826.30rows=392 loops=1) 
                                 Filter: ((posttimestamp > (('now'::text)::timestamp(6) without time zone -
'00:10'::interval))AND (puppetname IS NOT NULL)) 
                           ->  Hash  (cost=29.84..29.84 rows=184 width=26) (actual time=1.25..1.25 rows=0 loops=1)
                                 ->  Seq Scan on realm re  (cost=0.00..29.84 rows=184 width=26) (actual time=0.02..0.90
rows=179loops=1) 
 Total runtime: 4419.10 msec
(17 rows)

 -- Post changes
 Group  (cost=100293106.79..100316406.96 rows=155334 width=77) (actual time=1029.10..1159.25 rows=99 loops=1)
   ->  Sort  (cost=100293106.79..100296990.15 rows=1553344 width=77) (actual time=1025.98..1047.32 rows=24730 loops=1)
         Sort Key: po.puppeteerlogin, po.puppetname, po.realmname, re.realmpublic, re.realmverified
         ->  Merge Join  (cost=100031106.45..100050913.48 rows=1553344 width=77) (actual time=453.60..839.30 rows=24730
loops=1)
               Merge Cond: ("outer".puppeteerlogin = "inner".puppeteerlogin)
               ->  Sort  (cost=25124.25..25229.58 rows=42133 width=67) (actual time=17.07..17.67 rows=631 loops=1)
                     Sort Key: po.puppeteerlogin
                     ->  Hash Join  (cost=120.06..21887.85 rows=42133 width=67) (actual time=2.32..14.25 rows=631
loops=1)
                           Hash Cond: ("outer".realmname = "inner".realmname)
                           ->  Index Scan using idxpost_timestamp on post po  (cost=0.00..21030.46 rows=42133 width=41)
(actualtime=0.15..7.05 rows=631 loops=1) 
                                 Index Cond: (posttimestamp > (('now'::text)::timestamp(6) without time zone -
'00:10'::interval))
                                 Filter: (puppetname IS NOT NULL)
                           ->  Hash  (cost=119.61..119.61 rows=181 width=26) (actual time=2.01..2.01 rows=0 loops=1)
                                 ->  Index Scan using pkrealm on realm re  (cost=0.00..119.61 rows=181 width=26)
(actualtime=0.05..1.55 rows=181 loops=1) 
               ->  Sort  (cost=100005982.20..100006073.27 rows=36428 width=10) (actual time=435.39..506.81 rows=58859
loops=1)
                     Sort Key: ch.puppeteerlogin
                     ->  Seq Scan on puppet ch  (cost=100000000.00..100003222.28 rows=36428 width=10) (actual
time=0.03..184.55rows=36428 loops=1) 
 Total runtime: 1175.33 msec
(21 rows)


EXPLAIN ANALYSE SELECT
    Po.PuppetName     AS PuppetName,
    Po.PuppeteerLogin AS PuppeteerLogin,
    Po.RealmName      AS RealmName,
    Re.RealmPublic    AS RealmPublic,
    Re.RealmVerified  AS RealmVerified
  FROM ethereal.Post Po, ethereal.Puppet Ch, ethereal.Realm Re
  WHERE Po.PostIDNumber > (SELECT MIN(PostIDNumber)
       FROM ethereal.Post
       WHERE PostTimeStamp > (LOCALTIMESTAMP - INTERVAL '10 minutes'))
    AND Po.PuppeteerLogin = Ch.PuppeteerLogin
    AND Po.RealmName = Re.RealmName
    AND Po.PuppetName IS NOT NULL
  GROUP BY Po.PuppeteerLogin, Po.PuppetName, Po.RealmName, Re.RealmPublic, Re.RealmVerified;

 -- Pre changes
 Group  (cost=297096.47..321381.23 rows=161898 width=77) (actual time=2108.70..2179.12 rows=76 loops=1)
   InitPlan
     ->  Aggregate  (cost=14473.08..14473.08 rows=1 width=4) (actual time=1141.58..1141.58 rows=1 loops=1)
           ->  Seq Scan on post  (cost=0.00..14369.84 rows=41297 width=4) (actual time=1136.50..1140.95 rows=413
loops=1)
                 Filter: (posttimestamp > (('now'::text)::timestamp(6) without time zone - '00:10'::interval))
   ->  Sort  (cost=297096.47..301143.93 rows=1618985 width=77) (actual time=2108.55..2121.85 rows=15302 loops=1)
         Sort Key: po.puppeteerlogin, po.puppetname, po.realmname, re.realmpublic, re.realmverified
         ->  Merge Join  (cost=23245.04..43865.75 rows=1618985 width=77) (actual time=1782.43..2003.58 rows=15302
loops=1)
               Merge Cond: ("outer".puppeteerlogin = "inner".puppeteerlogin)
               ->  Sort  (cost=5964.67..6056.93 rows=36906 width=10) (actual time=415.56..449.62 rows=36318 loops=1)
                     Sort Key: ch.puppeteerlogin
                     ->  Seq Scan on puppet ch  (cost=0.00..3165.06 rows=36906 width=10) (actual time=0.02..135.58
rows=36456loops=1) 
               ->  Sort  (cost=17280.38..17381.66 rows=40513 width=67) (actual time=1362.15..1376.53 rows=15293
loops=1)
                     Sort Key: po.puppeteerlogin
                     ->  Hash Join  (cost=30.30..14179.92 rows=40513 width=67) (actual time=1354.41..1360.42 rows=411
loops=1)
                           Hash Cond: ("outer".realmname = "inner".realmname)
                           ->  Seq Scan on post po  (cost=0.00..13440.65 rows=40513 width=41) (actual
time=1353.11..1355.96rows=411 loops=1) 
                                 Filter: ((postidnumber > $0) AND (puppetname IS NOT NULL))
                           ->  Hash  (cost=29.84..29.84 rows=184 width=26) (actual time=1.20..1.20 rows=0 loops=1)
                                 ->  Seq Scan on realm re  (cost=0.00..29.84 rows=184 width=26) (actual time=0.02..0.88
rows=179loops=1) 
 Total runtime: 2192.72 msec
(21 rows)

 -- Post changes
 Group  (cost=100292759.45..100316059.61 rows=155334 width=77) (actual time=876.30..969.77 rows=84 loops=1)
   InitPlan
     ->  Aggregate  (cost=21137.57..21137.57 rows=1 width=4) (actual time=3.98..3.98 rows=1 loops=1)
           ->  Index Scan using idxpost_timestamp on post  (cost=0.00..21030.46 rows=42846 width=4) (actual
time=0.14..3.26rows=538 loops=1) 
                 Index Cond: (posttimestamp > (('now'::text)::timestamp(6) without time zone - '00:10'::interval))
   ->  Sort  (cost=100292759.45..100296642.81 rows=1553344 width=77) (actual time=875.32..893.11 rows=20378 loops=1)
         Sort Key: po.puppeteerlogin, po.puppetname, po.realmname, re.realmpublic, re.realmverified
         ->  Merge Join  (cost=100030759.11..100050566.13 rows=1553344 width=77) (actual time=436.22..708.88 rows=20378
loops=1)
               Merge Cond: ("outer".puppeteerlogin = "inner".puppeteerlogin)
               ->  Sort  (cost=24776.90..24882.24 rows=42133 width=67) (actual time=26.54..27.01 rows=524 loops=1)
                     Sort Key: po.puppeteerlogin
                     ->  Hash Join  (cost=120.06..21540.50 rows=42133 width=67) (actual time=5.93..24.30 rows=524
loops=1)
                           Hash Cond: ("outer".realmname = "inner".realmname)
                           ->  Index Scan using pkpost on post po  (cost=0.00..20683.11 rows=42133 width=41) (actual
time=4.08..17.10rows=524 loops=1) 
                                 Index Cond: (postidnumber > $0)
                                 Filter: (puppetname IS NOT NULL)
                           ->  Hash  (cost=119.61..119.61 rows=181 width=26) (actual time=1.68..1.68 rows=0 loops=1)
                                 ->  Index Scan using pkrealm on realm re  (cost=0.00..119.61 rows=181 width=26)
(actualtime=0.04..1.28 rows=181 loops=1) 
               ->  Sort  (cost=100005982.20..100006073.27 rows=36428 width=10) (actual time=408.56..456.91 rows=54202
loops=1)
                     Sort Key: ch.puppeteerlogin
                     ->  Seq Scan on puppet ch  (cost=100000000.00..100003222.28 rows=36428 width=10) (actual
time=0.01..162.22rows=36428 loops=1) 
 Total runtime: 984.75 msec
(22 rows)


Re: Optimizer Parameters

От
Tom Lane
Дата:
Martin Foster <martin@ethereal-realms.org> writes:
> force Postgres into using Indexes when available.    So I changed the
> following two lines in the .conf file:

>     enable_seqscan = false
>     enable_nestloop = false

> This was recommended in the documentation,

Where would you say that setting those off in the config file is
"recommended"?

> Now how sane is it to keep those options turned off?

It isn't.  If you have to force them off for a particular query, do
so right before you issue that query, and turn them on again after.
Turning them off globally is sure to cause you pain later.

> And any
> way to have the planner quiet guessing tens of thousands of rows will be
> return when there are at most hundred?

>     AND Po.PostTimestamp > (LOCALTIMESTAMP - INTERVAL '10 minutes')
>     AND Po.PuppetName IS NOT NULL

>                            ->  Seq Scan on post po  (cost=0.00..14369.84 rows=40513 width=41) (actual
time=2820.88..2826.30rows=392 loops=1) 
>                                  Filter: ((posttimestamp > (('now'::text)::timestamp(6) without time zone -
'00:10'::interval))AND (puppetname IS NOT NULL)) 

Not with that coding technique; "LOCALTIMESTAMP - INTERVAL '10 minutes'"
isn't a constant and so the planner can't look at its statistics to
see that only a small part of the table will be selected.

There are two standard workarounds for this:

1. Do the timestamp arithmetic on the client side, so that the query
you send the backend has a simple constant:

    ... AND Po.PostTimestamp > '2003-07-12 16:27'

2. Create a function that is falsely marked immutable, viz:

create function ago(interval) returns timestamp without time zone as
'select localtimestamp - $1' language sql immutable strict;

    ... AND Po.PostTimestamp > ago('10 minutes')

Because the function is marked immutable, the planner will reduce
"ago('10 minutes')" to a constant on sight, and then use that value
for planning purposes.  This technique can cause problems, since
in some contexts the reduction will occur prematurely, but as long
as you only use ago() in interactively-issued queries it works okay.

            regards, tom lane

Re: Optimizer Parameters

От
Martin Foster
Дата:
Tom Lane wrote:
>
>>force Postgres into using Indexes when available.    So I changed the
>>following two lines in the .conf file:
>
>
>>    enable_seqscan = false
>>    enable_nestloop = false
> >This was recommended in the documentation,
>
>
> Where would you say that setting those off in the config file is
> "recommended"?
>
>
>>Now how sane is it to keep those options turned off?
>
>
> It isn't.  If you have to force them off for a particular query, do
> so right before you issue that query, and turn them on again after.
> Turning them off globally is sure to cause you pain later.
>
>
>>And any
>>way to have the planner quiet guessing tens of thousands of rows will be
>>return when there are at most hundred?
>
>
>>    AND Po.PostTimestamp > (LOCALTIMESTAMP - INTERVAL '10 minutes')
>>    AND Po.PuppetName IS NOT NULL
>
>
>>                           ->  Seq Scan on post po  (cost=0.00..14369.84 rows=40513 width=41) (actual
time=2820.88..2826.30rows=392 loops=1) 
>>                                 Filter: ((posttimestamp > (('now'::text)::timestamp(6) without time zone -
'00:10'::interval))AND (puppetname IS NOT NULL)) 
>
>
> Not with that coding technique; "LOCALTIMESTAMP - INTERVAL '10 minutes'"
> isn't a constant and so the planner can't look at its statistics to
> see that only a small part of the table will be selected.
>
> There are two standard workarounds for this:
>
> 1. Do the timestamp arithmetic on the client side, so that the query
> you send the backend has a simple constant:
>
>     ... AND Po.PostTimestamp > '2003-07-12 16:27'
>
> 2. Create a function that is falsely marked immutable, viz:
>
> create function ago(interval) returns timestamp without time zone as
> 'select localtimestamp - $1' language sql immutable strict;
>
>     ... AND Po.PostTimestamp > ago('10 minutes')
>
> Because the function is marked immutable, the planner will reduce
> "ago('10 minutes')" to a constant on sight, and then use that value
> for planning purposes.  This technique can cause problems, since
> in some contexts the reduction will occur prematurely, but as long
> as you only use ago() in interactively-issued queries it works okay.
>
>             regards, tom lane

http://www.postgresql.org/docs/7.3/static/indexes-examine.html

The conf file does not make a mention of it, other then perhaps being
used to debug.  The above link points to disabling it, but tells you
nothing about potential consequences and what to do if it works better
then it did before.

However, when I tried out your functions things started to work much
better then previously.   This to say the least is a great sign as it
will increase overall performance.

So thanks for that!   As a side note, would you recommend disabling
fsync for added performance?   This would be joined with a healthy dose
of a kernel file system buffer.

Simply curious, as I have been increasing certain options for the WAL to
mean it writes less often (transactions are numerous so that's not an
issue) to the hard drives.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org