Обсуждение: Seq Scan but I think it should be Index Scan

Поиск
Список
Период
Сортировка

Seq Scan but I think it should be Index Scan

От
Edoceo Lists
Дата:
So the details of it:
I'm using PostgreSQL 8.0.3 on a Gentoo kernel 2.6.12 on a P4/2.8G+HT proc, with kernel RAID0 on some SATA drives and 1G

RAM.  Don't know the bus speed.  I'm thinking that my queries are not using indexs correctly and therefore taking
longer 
to complete than they should.  I've put the details below, but changed some names.  If anyone could shed some light?

pg_config  --configure
'--prefix=/usr' '--mandir=/usr/share/man' '--host=i686-pc-linux-gnu' '--with-docdir=/usr/share/doc/postgresql-8.0.3'
'--libdir=/usr/lib' '--enable-depend' '--with-gnu-ld' '--with-perl' '--with-openssl' '--enable-nls' '--with-pam'
'CFLAGS=-O2 -march=pentium4' 'host_alias=i686-pc-linux-gnu'


data=# explain analyze select count(id) from x_base where x_type < 100 and x_date<='2005-10-26' and x_time<'06:00:00';
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=539453.36..539453.36 rows=1 width=4) (actual time=66200.763..66200.764 rows=1 loops=1)
    ->  Seq Scan on x_base  (cost=0.00..539136.18 rows=126871 width=4) (actual time=31618.624..66174.710 rows=37451
loops=1)
          Filter: ((x_type < 100) AND (x_date <= '2005-10-26'::date) AND (x_time < '06:00:00'::time without time zone))
  Total runtime: 66200.811 ms

data=# \d x_base
                                          Table "public.x_base"
        Column        |          Type          |                        Modifiers
---------------------+------------------------+----------------------------------------------------------
  id                  | integer                | not null default nextval('public.x_base_id_seq'::text)
  x_code              | character(8)           |
  x_date              | date                   | not null
  x_time              | time without time zone | not null
  a                   | character(1)           |
  b                   | integer                |
  c                   | character(5)           |
  d                   | character(16)          |
  e                   | character(1)           |
  f                   | character(1)           |
  g                   | character(10)          |
  h                   | character(1)           |
  i                   | character(1)           |
  j                   | character varying(32)  |
  k                   | integer                |
  l                   | integer                |
  m                   | integer                |
  n                   | character varying(32)  |
  o                   | integer                |
  p                   | character varying(14)  |
  q                   | integer                |
Indexes:
     "x_base_pkey" PRIMARY KEY, btree (id)
     "ix_d_cd" btree (x_date)
     "ix_t_cb" btree (x_type)
Foreign-key constraints:
     "fk_k_id" FOREIGN KEY (k) REFERENCES x_file(id)

Now, see that x_type index?  Why didn't this thing Index Scan "ix_t_cb" on that column?  Me thinks if it had my query
would be much faster.  Or perhaps if I only where x_type?  I tried that but it still took a minute.  I took out count()

and it still took a minute.  Always using Seq Scan, am I doing something dumb here?  There are more than six million
records in that table, maybe thats just how long it takes?  Perhaps I should change architecture or schema to improve
performance?  Tweak the log? Thanks.

/djb


Re: Seq Scan but I think it should be Index Scan

От
Michael Fuhr
Дата:
On Wed, Oct 26, 2005 at 06:55:54PM -0700, Edoceo Lists wrote:
> I'm thinking that my queries are not using indexs correctly and
> therefore taking longer to complete than they should.

Index scans aren't necessarily faster than sequential scans: if the
query reads a significant amount of the table then a sequential
scan can be faster.

> data=# explain analyze select count(id) from x_base where x_type < 100 and x_date<='2005-10-26' and
x_time<'06:00:00';
>                                                            QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
>   Aggregate  (cost=539453.36..539453.36 rows=1 width=4) (actual time=66200.763..66200.764 rows=1 loops=1)
>     ->  Seq Scan on x_base  (cost=0.00..539136.18 rows=126871 width=4) (actual time=31618.624..66174.710 rows=37451
loops=1)
>           Filter: ((x_type < 100) AND (x_date <= '2005-10-26'::date) AND (x_time < '06:00:00'::time without time
zone))
>   Total runtime: 66200.811 ms

Notice that the estimated row count (126871) is much higher than
the actual row count (37451).  It's possible that the planner would
prefer an index scan if the row count estimate was more accurate.
Has this table been vacuumed and analyzed recently?  If so then you
might get more accurate estimates by increasing columns' statistics
targets with ALTER TABLE ... SET STATISTICS.  If you do that then
be sure to analyze the table afterwards to update the planner's
statistics.

> Now, see that x_type index?  Why didn't this thing Index Scan "ix_t_cb" on
> that column?  Me thinks if it had my query would be much faster.

No need to guess: disable sequential scans and see if an index scan
is faster.

SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT ...

Run the query several times with and without sequential scans to
make sure that timing differences aren't due more to disk caching
than to the query plan.

Have you adjusted any settings in postgresql.conf?  With 1G RAM the
defaults are probably too conservative.  In particular, you could
probably use a much higher effective_cache_size than the default,
and that's one of the settings that the planner uses when considering
whether to do an index scan.  Some people also see performance
improvements by lowering random_page_cost, although doing so isn't
really correct.

BTW, pgsql-performance might be a better list to post performance
questions.

--
Michael Fuhr