Discussion: Forcing index scan on query produces 16x faster

Forcing index scan on query produces 16x faster

From: Christian Brink
Date:
I am running into a problem with a particular query. The execution plan
cost shows that the Seq Scan is a better bet (cost=54020.49..54020.55)
than the plan forced with 'enable_seqscan = false'
(cost=1589703.87..1589703.93), but when I run the query both ways I get
vastly different results (below). The planner appears unwilling to bracket the
salesitems rows off of the 'id' foreign key unless I force it.

Is there a way to rewrite or hint the planner to get me the better plan
without resorting to 'enable_seqscan' manipulation (or am I missing
something)?
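One mitigation, if nothing better turns up: the override can at least be confined to a single transaction with SET LOCAL, so the rest of the session keeps the planner's defaults. A sketch against the query below (a workaround, not a fix for the estimates):

```sql
BEGIN;
SET LOCAL enable_seqscan = false;  -- reverts automatically at COMMIT/ROLLBACK
SELECT si.group1_id AS name, sum(si.qty) AS count, sum(si.amt) AS amt
  FROM salesitems si, sales s, sysstrings
 WHERE si.id = s.id
   -- ... same predicates as in the full query below ...
 GROUP BY name;
COMMIT;
```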

postream=> select version();
                                                          version

-------------------------------------------------------------------------------------------------------------------------
  PostgreSQL 8.0.3 on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gcc (GCC) 4.0.0 20050505 (Red Hat 4.0.0-4)


postream=> SET enable_seqscan = false;
SET
postream=> EXPLAIN ANALYZE
postream-> SELECT si.group1_id as name, sum(si.qty) as count,
sum(si.amt) as amt
postream->   FROM salesitems si, sales s, sysstrings
postream->  WHERE si.id = s.id
postream->    AND si.group1_id != ''
postream->    AND si.group1_id IS NOT NULL
postream->    AND NOT si.void
postream->    AND NOT s.void
postream->    AND NOT s.suspended
postream->    AND s.tranzdate >= (cast('2010-02-15' as date) +
cast(sysstrings.data as time))
postream->    AND s.tranzdate < ((cast('2010-02-15' as date) + 1) +
cast(sysstrings.data as time))
postream->    AND sysstrings.id='net/Console/Employee/Day End Time'
postream->  GROUP BY name;
                                                                                              QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=1589703.87..1589703.93 rows=13 width=35) (actual
time=33.414..33.442 rows=12 loops=1)
    ->  Nested Loop  (cost=0.01..1588978.22 rows=96753 width=35) (actual
time=0.284..22.115 rows=894 loops=1)
          ->  Nested Loop  (cost=0.01..2394.31 rows=22530 width=4)
(actual time=0.207..4.671 rows=225 loops=1)
                ->  Index Scan using sysstrings_pkey on sysstrings
(cost=0.00..5.78 rows=1 width=175) (actual time=0.073..0.078 rows=1 loops=1)
                      Index Cond: (id = 'net/Console/Employee/Day End
Time'::text)
                ->  Index Scan using sales_tranzdate_index on sales s
(cost=0.01..1825.27 rows=22530 width=12) (actual time=0.072..3.464
rows=225 loops=1)
                      Index Cond: ((s.tranzdate >= ('2010-02-15'::date +
("outer".data)::time without time zone)) AND (s.tranzdate <
('2010-02-16'::date + ("outer".data)::time without time zone)))
                      Filter: ((NOT void) AND (NOT suspended))
          ->  Index Scan using salesitems_pkey on salesitems si
(cost=0.00..70.05 rows=30 width=39) (actual time=0.026..0.052 rows=4
loops=225)
                Index Cond: (si.id = "outer".id)
                Filter: ((group1_id <> ''::text) AND (group1_id IS NOT
NULL) AND (NOT void))
  Total runtime: 33.734 ms
(12 rows)

postream=> SET enable_seqscan = true;
SET
postream=> EXPLAIN ANALYZE
postream-> SELECT si.group1_id as name, sum(si.qty) as count,
sum(si.amt) as amt
postream->   FROM salesitems si, sales s, sysstrings
postream->  WHERE si.id = s.id
postream->    AND si.group1_id != ''
postream->    AND si.group1_id IS NOT NULL
postream->    AND NOT si.void
postream->    AND NOT s.void
postream->    AND NOT s.suspended
postream->    AND s.tranzdate >= (cast('2010-02-15' as date) +
cast(sysstrings.data as time))
postream->    AND s.tranzdate < ((cast('2010-02-15' as date) + 1) +
cast(sysstrings.data as time))
postream->    AND sysstrings.id='net/Console/Employee/Day End Time'
postream->  GROUP BY name;
                                                                                                 QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=54020.49..54020.55 rows=13 width=35) (actual
time=5564.929..5564.957 rows=12 loops=1)
    ->  Hash Join  (cost=2539.63..53294.84 rows=96753 width=35) (actual
time=5502.324..5556.262 rows=894 loops=1)
          Hash Cond: ("outer".id = "inner".id)
          ->  Seq Scan on salesitems si  (cost=0.00..30576.60
rows=885215 width=39) (actual time=0.089..3099.453 rows=901249 loops=1)
                Filter: ((group1_id <> ''::text) AND (group1_id IS NOT
NULL) AND (NOT void))
          ->  Hash  (cost=2394.31..2394.31 rows=22530 width=4) (actual
time=3.329..3.329 rows=0 loops=1)
                ->  Nested Loop  (cost=0.01..2394.31 rows=22530 width=4)
(actual time=0.217..2.749 rows=225 loops=1)
                      ->  Index Scan using sysstrings_pkey on
sysstrings  (cost=0.00..5.78 rows=1 width=175) (actual time=0.077..0.085
rows=1 loops=1)
                            Index Cond: (id = 'net/Console/Employee/Day
End Time'::text)
                      ->  Index Scan using sales_tranzdate_index on
sales s  (cost=0.01..1825.27 rows=22530 width=12) (actual
time=0.074..1.945 rows=225 loops=1)
                            Index Cond: ((s.tranzdate >=
('2010-02-15'::date + ("outer".data)::time without time zone)) AND
(s.tranzdate < ('2010-02-16'::date + ("outer".data)::time without time
zone)))
                            Filter: ((NOT void) AND (NOT suspended))
  Total runtime: 5565.262 ms
(13 rows)


--
Christian Brink



Re: Forcing index scan on query produces 16x faster

From: David Wilson
Date:


On Wed, Mar 17, 2010 at 5:25 PM, Christian Brink <cbrink@r-stream.com> wrote:

              ->  Index Scan using sales_tranzdate_index on sales s  (cost=0.01..1825.27 rows=22530 width=12) (actual time=0.072..3.464 rows=225 loops=1)

Have you tried increasing the statistics on that table (and then analyzing)? The estimates for that index scan are off by a factor of 100, which may indicate why the planner is trying so hard to avoid a nestloop there.
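On 8.0 the statistics target is set per column; concretely, something like the following (500 is just a starting point, and tranzdate is the column behind the bad estimate in the plan above):

```sql
-- Collect a larger sample for the column driving the misestimate,
-- then re-analyze so the planner sees the new statistics
ALTER TABLE sales ALTER COLUMN tranzdate SET STATISTICS 500;
ANALYZE sales;
```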

--
- David T. Wilson
david.t.wilson@gmail.com

Re: Forcing index scan on query produces 16x faster

From: Tom Lane
Date:
Christian Brink <cbrink@r-stream.com> writes:
> Is there a way to rewrite or hint the planner to get me the better plan
> without resorting to 'enable_seqscan' manipulation (or am I missing
> something)?

I think your problem is here:

>   PostgreSQL 8.0.3 on i386-redhat-linux-gnu, compiled by GCC
> i386-redhat-linux-gcc (GCC) 4.0.0 20050505 (Red Hat 4.0.0-4)

Recent versions are significantly smarter about grouping operations
than that dinosaur.

(Even if you must stay on 8.0.x, you should at least be running
8.0.something-recent; what you have is full of security and data-loss
risks.  8.0.24 was released this week.)

            regards, tom lane

Re: Forcing index scan on query produces 16x faster

From: "Eger, Patrick"
Date:
I'm running 8.4.2 and have noticed a similar heavy preference for
sequential scans and hash joins over index scans and nested loops.  Our
database can basically fit in cache 100%, so this may not be
applicable to your situation, but the following params seemed to help
us:

seq_page_cost = 1.0
random_page_cost = 1.01
cpu_tuple_cost = 0.0001
cpu_index_tuple_cost = 0.00005
cpu_operator_cost = 0.000025
effective_cache_size = 1000MB
shared_buffers = 1000MB


Might I suggest the Postgres developers reconsider these defaults for
9.0 release, or perhaps provide a few sets of tuning params for
different workloads in the default install/docs? The cpu_*_cost in
particular seem to be way off afaict. I may be dead wrong though, fwiw
=)

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Christian
Brink
Sent: Wednesday, March 17, 2010 2:26 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Forcing index scan on query produces 16x faster


Re: Forcing index scan on query produces 16x faster

From: Dave Crooke
Date:
I've also observed the same behaviour on a very large table (200GB data, 170GB for 2 indexes) ....

I have a table which has 6 small columns, let's call them (a, b, c, d, e, f) and about 1 billion rows. There is an index on (a, b, c, d) - not my idea, Hibernate requires primary keys for every table.

If I do the following query:

select max(c) from tbl where a=[constant literal] and b=[other constant literal];

.... then with maxed-out analysis histograms, and no changes to any of the page_cost type stuff, it still desperately wants to do a full table scan, which is ... kinda slow.
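A workaround that sometimes helps on these releases (a sketch only, not verified against this exact table) is to spell the max() as a backward walk of the existing (a, b, c, d) index:

```sql
-- Equivalent to max(c) for fixed a and b, assuming c is never NULL;
-- the ORDER BY matches the (a, b, c, d) index read in reverse,
-- so LIMIT 1 can stop after the first matching index entry
SELECT c FROM tbl
 WHERE a = 'constant literal' AND b = 'other constant literal'
 ORDER BY a DESC, b DESC, c DESC
 LIMIT 1;
```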

Of course, a billion-row table is also rather suboptimal (our app collects a lot more data than it used to), so I'm bypassing Hibernate and sharding it all by time, so that the tables and indexes stay a manageable size and are also vacuum-free, since my aging-out process is now DROP TABLE :-)

Cheers
Dave

On Wed, Mar 17, 2010 at 8:01 PM, Eger, Patrick <peger@automotive.com> wrote:
I'm running 8.4.2 and have noticed a similar heavy preference for
sequential scans and hash joins over index scans and nested loops.  Our
database can basically fit in cache 100%, so this may not be
applicable to your situation, but the following params seemed to help
us:

seq_page_cost = 1.0
random_page_cost = 1.01
cpu_tuple_cost = 0.0001
cpu_index_tuple_cost = 0.00005
cpu_operator_cost = 0.000025
effective_cache_size = 1000MB
shared_buffers = 1000MB


Might I suggest the Postgres developers reconsider these defaults for
9.0 release, or perhaps provide a few sets of tuning params for
different workloads in the default install/docs? The cpu_*_cost in
particular seem to be way off afaict. I may be dead wrong though, fwiw
=)


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Forcing index scan on query produces 16x faster

From: Robert Haas
Date:
On Wed, Mar 17, 2010 at 9:01 PM, Eger, Patrick <peger@automotive.com> wrote:
> I'm running 8.4.2 and have noticed a similar heavy preference for
> sequential scans and hash joins over index scans and nested loops.  Our
> database can basically fit in cache 100%, so this may not be
> applicable to your situation, but the following params seemed to help
> us:
>
> seq_page_cost = 1.0
> random_page_cost = 1.01
> cpu_tuple_cost = 0.0001
> cpu_index_tuple_cost = 0.00005
> cpu_operator_cost = 0.000025
> effective_cache_size = 1000MB
> shared_buffers = 1000MB
>
>
> Might I suggest the Postgres developers reconsider these defaults for
> 9.0 release, or perhaps provide a few sets of tuning params for
> different workloads in the default install/docs? The cpu_*_cost in
> particular seem to be way off afaict. I may be dead wrong though, fwiw
> =)

The defaults assume that the database is not cached in RAM.  If it is,
you want to lower seq_page_cost and random_page_cost to something much
smaller, and typically make them equal.  I often recommend 0.005, but
I know others have had success with higher values.
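In settings form, that advice looks like this (0.005 is a suggested starting point, not a universal value):

```sql
-- Fully cached database: page fetches are cheap, and random access
-- costs the same as sequential access
SET seq_page_cost = 0.005;
SET random_page_cost = 0.005;
```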

Ultimately it would be nice to have a better model of how data gets
cached in shared_buffers and the OS buffer cache, but that is not so
easy.

...Robert

Re: Forcing index scan on query produces 16x faster

From: "Eger, Patrick"
Date:
Ok, the wording is a bit unclear in the documentation as to whether it is the cost for an entire *page* of tuples, or
actual tuples. So something like the following might give better results for a fully-cached DB?

seq_page_cost = 1.0
random_page_cost = 1.1  # even memory has random-access costs, lack of readahead, TLB misses, etc.
cpu_tuple_cost = 1.0
cpu_index_tuple_cost = 0.5
cpu_operator_cost = 0.25
effective_cache_size = 1000MB
shared_buffers = 1000MB


-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Wednesday, March 24, 2010 5:47 PM
To: Eger, Patrick
Cc: Christian Brink; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Forcing index scan on query produces 16x faster


Re: Forcing index scan on query produces 16x faster

From: Robert Haas
Date:
On Wed, Mar 24, 2010 at 8:59 PM, Eger, Patrick <peger@automotive.com> wrote:
> Ok, the wording is a bit unclear in the documentation as to whether it is the cost for an entire *page* of tuples, or
> actual tuples. So something like the following might give better results for a fully-cached DB?
>
> seq_page_cost = 1.0
> random_page_cost = 1.1 #even memory has random access costs, lack of readahead, TLB misses, etc
> cpu_tuple_cost = 1.0
> cpu_index_tuple_cost = 0.5
> cpu_operator_cost = 0.25
> effective_cache_size = 1000MB
> shared_buffers = 1000MB

Yeah, you can do it that way, by jacking up the cpu_tuple costs.  I
prefer to lower the {random/seq}_page_cost values because it keeps the
cost values in the range I'm used to seeing, but it works out to the
same thing.

I am not sure that there is any benefit from making random_page_cost >
seq_page_cost on a fully cached database.  What does readahead mean in
the context of cached data?  The data isn't likely physically
contiguous in RAM, and I'm not sure it would matter much if it were.
Basically, what random_page_cost > seq_page_cost tends to do is
discourage the use of index scans in borderline cases, so you want to
benchmark it and figure out which way is faster and then tune
accordingly.
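A minimal version of that benchmark is to time the same query under both assumptions in one session (a sketch; substitute the real query for the placeholders):

```sql
SET random_page_cost = 1.1;   -- candidate A: slightly above seq_page_cost
EXPLAIN ANALYZE SELECT ...;   -- note the "Total runtime"
SET random_page_cost = 1.0;   -- candidate B: equal page costs
EXPLAIN ANALYZE SELECT ...;   -- compare, then keep the faster setting
```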

...Robert

Re: Forcing index scan on query produces 16x faster

From: Cédric Villemain
Date:
2010/3/25 Robert Haas <robertmhaas@gmail.com>:
>
> The defaults assume that the database is not cached in RAM.  If it is,
> you want to lower seq_page_cost and random_page_cost to something much
> smaller, and typically make them equal.  I often recommend 0.005, but
> I know others have had success with higher values.
>
> Ultimately it would be nice to have a better model of how data gets
> cached in shared_buffers and the OS buffer cache, but that is not so
> easy.

I have done some work on this point with the pgfincore project. Getting the
information from the OS is actually a bit slow, but possible. I am trying to
find time to finish my patch in order to get the info into the
pg_statio_* views :)




--
Cédric Villemain