Обсуждение: Postgresql 7.4.8 inconsistent index usage

Поиск
Список
Период
Сортировка

Postgresql 7.4.8 inconsistent index usage

От
Stephen Bowman
Дата:
Hello,

I'm experiencing inconsistent usage of an index that I cannot explain.
 This is in postgresql 7.4.8.  Details are as follows:

I have a relatively large table (~3.5 million rows):

SCANS=# \d nessus_results;
                                        Table "public.nessus_results"
   Column    |         Type          |
Modifiers
-------------+-----------------------+-----------------------------------------------------------------------
 result_id   | integer               | not null default
nextval('public.nessus_results_result_id_seq'::text)
 scan_id     | integer               | not null
 ip          | inet                  | not null
 port        | integer               | not null
 service     | character varying(32) | not null
 plugin_id   | integer               |
 criticality | character varying(16) |
 description | character varying     |
Indexes:
    "nessus_results_pkey" primary key, btree (result_id)
    "nessus_results_scan_id" btree (scan_id)
    "nessus_results_scan_id_criticality" btree (scan_id, ip, criticality)
    "nessus_results_scan_id_result_id" btree (result_id, scan_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (scan_id) REFERENCES nessus_scans(scan_id)
    "$2" FOREIGN KEY (ip) REFERENCES hosts(ip)

There are approximately 100 unique scan_ids in this table.  The
following should not happen as far as I can tell:

SCANS=# explain select * from nessus_results where scan_id = 55;
                               QUERY PLAN
-------------------------------------------------------------------------
 Seq Scan on nessus_results  (cost=0.00..127170.34 rows=42640 width=169)
   Filter: (scan_id = 55)
(2 rows)

SCANS=# explain select * from nessus_results where scan_id = 56;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Index Scan using nessus_results_scan_id on nessus_results
(cost=0.00..126632.83 rows=41813 width=169)
   Index Cond: (scan_id = 56)
(2 rows)

SCANS=#

Both scan_ids (55, 56) exist.  Yes, I've analyzed the table.  I've
also tried upping the number of statistics to 100, with no apparent
change.

Thanks,
--Stephen

Re: Postgresql 7.4.8 inconsistent index usage

От
Michael Fuhr
Дата:
On Fri, Jul 08, 2005 at 11:11:46AM -0400, Stephen Bowman wrote:
>
> SCANS=# explain select * from nessus_results where scan_id = 55;
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  Seq Scan on nessus_results  (cost=0.00..127170.34 rows=42640 width=169)
>    Filter: (scan_id = 55)
> (2 rows)
>
> SCANS=# explain select * from nessus_results where scan_id = 56;
>                                                QUERY PLAN
> --------------------------------------------------------------------------------------------------------
>  Index Scan using nessus_results_scan_id on nessus_results (cost=0.00..126632.83 rows=41813 width=169)
>    Index Cond: (scan_id = 56)
> (2 rows)

It looks like you're right at the edge of where the planner thinks
a sequential scan would be faster than an index scan.  The planner
estimates that scan_id = 55 will produce more rows than scan_id = 56
(42640 vs. 41813), which is probably just enough to make the estimated
cost for an index scan higher than for a sequential scan.  Could
you post the EXPLAIN ANALYZE output for these queries so we can see
how realistic the estimates are?  It might also be useful to see
them both with (enable_seqscan = on, enable_indexscan = off) and
then with (enable_seqscan = off, enable_indexscan = on).

Some people lower random_page_cost from the default of 4 to reduce
the estimated cost of an index scan.  Beware of tweaking cost
estimate settings based on one particular query, though.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Postgresql 7.4.8 inconsistent index usage

От
Stephen Bowman
Дата:
On 7/8/05, Michael Fuhr <mike@fuhr.org> wrote:
> On Fri, Jul 08, 2005 at 11:11:46AM -0400, Stephen Bowman wrote:
> >
> > SCANS=# explain select * from nessus_results where scan_id = 55;
> >                                QUERY PLAN
> > -------------------------------------------------------------------------
> >  Seq Scan on nessus_results  (cost=0.00..127170.34 rows=42640 width=169)
> >    Filter: (scan_id = 55)
> > (2 rows)
> >
> > SCANS=# explain select * from nessus_results where scan_id = 56;
> >                                                QUERY PLAN
> > --------------------------------------------------------------------------------------------------------
> >  Index Scan using nessus_results_scan_id on nessus_results (cost=0.00..126632.83 rows=41813 width=169)
> >    Index Cond: (scan_id = 56)
> > (2 rows)
>
> It looks like you're right at the edge of where the planner thinks
> a sequential scan would be faster than an index scan.  The planner
> estimates that scan_id = 55 will produce more rows than scan_id = 56
> (42640 vs. 41813), which is probably just enough to make the estimated
> cost for an index scan higher than for a sequential scan.  Could
> you post the EXPLAIN ANALYZE output for these queries so we can see
> how realistic the estimates are?  It might also be useful to see
> them both with (enable_seqscan = on, enable_indexscan = off) and
> then with (enable_seqscan = off, enable_indexscan = on).
>
> Some people lower random_page_cost from the default of 4 to reduce
> the estimated cost of an index scan.  Beware of tweaking cost
> estimate settings based on one particular query, though.
>
> --
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
>

Sure:

=== Defaults: ===
SCANS=# explain analyze select * from nessus_results where scan_id = 56;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using nessus_results_scan_id on nessus_results
(cost=0.00..126632.83 rows=41813 width=169) (actual
time=0.090..137.883 rows=41199 loops=1)
  Index Cond: (scan_id = 56)
 Total runtime: 180.431 ms
(3 rows)

SCANS=# explain analyze select * from nessus_results where scan_id = 55;
                                                        QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on nessus_results  (cost=0.00..127170.34 rows=42640
width=169) (actual time=1612.537..2425.909 rows=41507 loops=1)
  Filter: (scan_id = 55)
 Total runtime: 2469.605 ms
(3 rows)

=== enable_seqscan off, enable_indexscan on ===

SCANS=# SET enable_seqscan = off;
SET
SCANS=# explain analyze select * from nessus_results where scan_id = 56;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using nessus_results_scan_id on nessus_results
(cost=0.00..126632.83 rows=41813 width=169) (actual
time=0.086..138.420 rows=41199 loops=1)
  Index Cond: (scan_id = 56)
 Total runtime: 181.712 ms
(3 rows)

SCANS=# explain analyze select * from nessus_results where scan_id = 55;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using nessus_results_scan_id on nessus_results
(cost=0.00..129136.46 rows=42640 width=169) (actual
time=0.066..139.351 rows=41507 loops=1)
  Index Cond: (scan_id = 55)
 Total runtime: 182.934 ms
(3 rows)

SCANS=#

=== enable_seqscan on, enable_indexscan off ===

SCANS=# set enable_seqscan =on;
SET
SCANS=# set enable_indexscan =off;
SET
SCANS=# explain analyze select * from nessus_results where scan_id = 56;
                                                        QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on nessus_results  (cost=0.00..127170.34 rows=41813
width=169) (actual time=1640.184..2422.106 rows=41199 loops=1)
  Filter: (scan_id = 56)
 Total runtime: 2464.834 ms
(3 rows)

SCANS=# explain analyze select * from nessus_results where scan_id = 55;
                                                        QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on nessus_results  (cost=0.00..127170.34 rows=42640
width=169) (actual time=1612.734..2425.494 rows=41507 loops=1)
  Filter: (scan_id = 55)
 Total runtime: 2469.415 ms
(3 rows)

Clearly it needs to use the index =)

Re: Postgresql 7.4.8 inconsistent index usage

От
Michael Fuhr
Дата:
On Fri, Jul 08, 2005 at 12:04:07PM -0400, Stephen Bowman wrote:
>
> Clearly it needs to use the index =)

Indeed -- now to figure out why the estimates for index scans are
so high.  The row count estimates are almost spot-on, so that's not
it.  What are your settings for the following configuration variables?

shared_buffers
random_page_cost
effective_cache_size
cpu_tuple_cost
cpu_index_tuple_cost

How much RAM do you have?  Have you set shared_buffers and
effective_cache_size accordingly?  The default values are pretty
low for most modern equipment; see the following for tips on
choosing appropriate values:

http://www.powerpostgresql.com/PerfList/

You mentioned that you've analyzed the table, but have you also
vacuumed it recently?

BTW, I should have mentioned earlier that this thread might be
on-topic in pgsql-performance.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Postgresql 7.4.8 inconsistent index usage

От
Stephen Bowman
Дата:
On 7/8/05, Michael Fuhr <mike@fuhr.org> wrote:
> On Fri, Jul 08, 2005 at 12:04:07PM -0400, Stephen Bowman wrote:
> >
> > Clearly it needs to use the index =)
>
> Indeed -- now to figure out why the estimates for index scans are
> so high.  The row count estimates are almost spot-on, so that's not
> it.  What are your settings for the following configuration variables?
>
> shared_buffers
> random_page_cost
> effective_cache_size
> cpu_tuple_cost
> cpu_index_tuple_cost
>
> How much RAM do you have?  Have you set shared_buffers and
> effective_cache_size accordingly?  The default values are pretty
> low for most modern equipment; see the following for tips on
> choosing appropriate values:
>
> http://www.powerpostgresql.com/PerfList/
>
> You mentioned that you've analyzed the table, but have you also
> vacuumed it recently?
>
> BTW, I should have mentioned earlier that this thread might be
> on-topic in pgsql-performance.
>
> --
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
>

This is on a Xeon 3ghz with 2gb of RAM.  There are 2 SCSI U/320 disks.

For the variables, I have everything defaulted except for:

shared_buffers = 20000
effective_cache_size = 68916

I just added random_page_cost=3 and it seems to have fixed it.  Do the
values I have set for these variables make sense with my hardware?
The database is large (~15g).

My apologies on the wrong mailing list - next time I'll use -perf.

Thanks,
--Stephen

Re: Postgresql 7.4.8 inconsistent index usage

От
Michael Fuhr
Дата:
On Fri, Jul 08, 2005 at 03:06:29PM -0400, Stephen Bowman wrote:
>
> This is on a Xeon 3ghz with 2gb of RAM.  There are 2 SCSI U/320 disks.
>
> For the variables, I have everything defaulted except for:
>
> shared_buffers = 20000
> effective_cache_size = 68916
>
> I just added random_page_cost=3 and it seems to have fixed it.  Do the
> values I have set for these variables make sense with my hardware?

shared_buffers (156M) looks in line with various tuning guides'
recommendations given 2G of RAM; effective_cache_size (538M) might
(or might not) be low, depending on how much memory other processes
are using up.  Tuning guides like the one I mentioned and the folks
in pgsql-performance should be able to help determine good values
to use.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/