Обсуждение: Index not being used

Поиск
Список
Период
Сортировка

Index not being used

От
Ralph Smith
Дата:
I'm confused.  Shouldn't this index be used?
(It's running on v7.4.7)

airburst=> \d stats2
             Table "public.stats2"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
lab       | character varying(30) |
name      | character varying(50) |
status    | character varying(40) |
eventtime | integer               |
username  | character varying(30) |
pkey      | character varying(60) |
Indexes:
    "stats2_etime_index" btree (eventtime)

airburst=> \d stats2_etime_index
Index "public.stats2_etime_index"
  Column   |  Type  
-----------+---------
eventtime | integer
btree, for table "public.stats2"

airburst=> explain select count(*) from stats2 where eventtime > 1167638400 ;
                              QUERY PLAN                              
-----------------------------------------------------------------------
Aggregate  (cost=185247.97..185247.97 rows=1 width=0)
   ->  Seq Scan on stats2  (cost=0.00..179622.45 rows=2250205 width=0)
         Filter: (eventtime > 1167638400)
(3 rows)

Thanks,
Ralph

Re: Index not being used

От
"Scott Marlowe"
Дата:
On 8/13/07, Ralph Smith <smithrn@u.washington.edu> wrote:
> I'm confused.  Shouldn't this index be used?
> (It's running on v7.4.7)
>
> airburst=> \d stats2
>              Table "public.stats2"
>   Column   |         Type          | Modifiers
> -----------+-----------------------+-----------
>  lab       | character varying(30) |
>  name      | character varying(50) |
>  status    | character varying(40) |
>  eventtime | integer               |
>  username  | character varying(30) |
>  pkey      | character varying(60) |
> Indexes:
>     "stats2_etime_index" btree (eventtime)
>
> airburst=> \d stats2_etime_index
> Index "public.stats2_etime_index"
>   Column   |  Type
> -----------+---------
>  eventtime | integer
> btree, for table "public.stats2"
>
> airburst=> explain select count(*) from stats2 where eventtime > 1167638400
> ;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Aggregate  (cost=185247.97..185247.97 rows=1 width=0)
>    ->  Seq Scan on stats2  (cost=0.00..179622.45 rows=2250205 width=0)
>          Filter: (eventtime > 1167638400)
> (3 rows)

That really depends.  how many rows are actually returned?  If it's
2250205 like the query planner thinks, and that's a fair chunk of the
table, then no, it shouldn't use an index, a seq scan will be faster.
What does explain analyze select ... say?

Re: Index not being used

От
Ralph Smith
Дата:

On Aug 13, 2007, at 4:39 PM, Scott Marlowe wrote:

On 8/13/07, Ralph Smith <smithrn@u.washington.edu> wrote:
I'm confused.  Shouldn't this index be used?
(It's running on v7.4.7)

airburst=> \d stats2
             Table "public.stats2"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 lab       | character varying(30) |
 name      | character varying(50) |
 status    | character varying(40) |
 eventtime | integer               |
 username  | character varying(30) |
 pkey      | character varying(60) |
Indexes:
    "stats2_etime_index" btree (eventtime)

airburst=> \d stats2_etime_index
Index "public.stats2_etime_index"
  Column   |  Type
-----------+---------
 eventtime | integer
btree, for table "public.stats2"

airburst=> explain select count(*) from stats2 where eventtime > 1167638400
;
                              QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=185247.97..185247.97 rows=1 width=0)
   ->  Seq Scan on stats2  (cost=0.00..179622.45 rows=2250205 width=0)
         Filter: (eventtime > 1167638400)
(3 rows)


======================================================================

=====================================================================
That really depends.  how many rows are actually returned?  If it's
2250205 like the query planner thinks, and that's a fair chunk of the
table, then no, it shouldn't use an index, a seq scan will be faster.
What does explain analyze select ... say?

======================================================================
Somewhere between 40,000 and 48,000 rows returned the index kicks in.
Out of a table of 7 million rows, that's a fairly common count I have to work with.
It's the amount of activity since August 2nd, this year; NOT that long ago.

Any suggestions on speeding up these queries, other than using more and more tables, thus ruling out the reasonable use of command-line queries?

Ultimately we'll move to some datawarehousing solution, but that's not a 'tomorrow' kind of thing...

Thanks again all,

Ralph
======================================================================

Re: Index not being used

От
"Scott Marlowe"
Дата:
On 8/13/07, Ralph Smith <smithrn@u.washington.edu> wrote:
>
>
>
> On Aug 13, 2007, at 4:39 PM, Scott Marlowe wrote:
>
> On 8/13/07, Ralph Smith <smithrn@u.washington.edu> wrote:
> I'm confused.  Shouldn't this index be used?
> (It's running on v7.4.7)
>
> airburst=> \d stats2
>              Table "public.stats2"
>   Column   |         Type          | Modifiers
> -----------+-----------------------+-----------
>  lab       | character varying(30) |
>  name      | character varying(50) |
>  status    | character varying(40) |
>  eventtime | integer               |
>  username  | character varying(30) |
>  pkey      | character varying(60) |
> Indexes:
>     "stats2_etime_index" btree (eventtime)
>
> airburst=> \d stats2_etime_index
> Index "public.stats2_etime_index"
>   Column   |  Type
> -----------+---------
>  eventtime | integer
> btree, for table "public.stats2"
>
> airburst=> explain select count(*) from stats2 where eventtime > 1167638400
> ;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Aggregate  (cost=185247.97..185247.97 rows=1 width=0)
>    ->  Seq Scan on stats2  (cost=0.00..179622.45 rows=2250205 width=0)
>          Filter: (eventtime > 1167638400)
> (3 rows)
>
>
> ======================================================================
>
>
> =====================================================================
> That really depends.  how many rows are actually returned?  If it's
> 2250205 like the query planner thinks, and that's a fair chunk of the
> table, then no, it shouldn't use an index, a seq scan will be faster.
> What does explain analyze select ... say?
> ======================================================================
> Somewhere between 40,000 and 48,000 rows returned the index kicks in.
> Out of a table of 7 million rows, that's a fairly common count I have to
> work with.
> It's the amount of activity since August 2nd, this year; NOT that long ago.
>
> Any suggestions on speeding up these queries, other than using more and more
> tables, thus ruling out the reasonable use of command-line queries?
>
> Ultimately we'll move to some datawarehousing solution, but that's not a
> 'tomorrow' kind of thing...

Hmmmm.  Not sure you answered my question about the explain analyze output.

Note that select count(*) from table is NEVER going to fast in
PostgreSQL compared to MySQL with MyIsam tables, just because of the
difference in design.  The same kind of problem exists for MySQL with
innodb tables.  It's a FAQ, you can find lots of info on it by
searching this forum or reading the FAQ.  Basically it's a question of
visibility.  No matter whether or not the data are in the index, the
db has to look at the table to see if the row is visible.

Oh good night, I just noticed you're running 7.4.7.  Two things.  1:
IMMEDIATELY UPDATE to 7.4.17 or whatever the latest 7.4 series is.
That's easy, you don't have to dump and restore the db for that.  2:
Start planning to upgrade to 8.2.4 now.  7.4 is getting old fast, and
I've found a lot of queries that run much faster (factors faster) in
8.2.4 than they did in 7.4.

That said, that probably won't help on this query a lot.  select
count(*) is expensive.

We can look at how big your shared_buffers are, your work_mem, and a
few others in postgresql.conf.  Make sure your stats are up to date,
and you might have to increase stats target for that field in your
database (see alter table alter column...)

Let us know what explain analyze says.

Re: Index not being used

От
"Scott Marlowe"
Дата:
Oh, and you can use the sledge hammer of tuning by using the

set enable_xxx = off

settings for the planner.  It's not a normal way to tune most queries,
but it certainly can let you know if the problem is using the index or
not.

psql mydb
\timing
select count(*) from table where field > 12345;
set enable_seqscan=off;
select count(*) from table where field > 12345;

and compare them.  run each a few times, since the cache will affect
the performance.

Re: Index not being used

От
"Scott Marlowe"
Дата:
Oh yeah, go read this:

http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm

Note that you shouldn't set your shared buffers quite as high as in
that guide, since you're running 7.4 which isn't quite as good at
using shared_buffers

Re: Index not being used

От
Greg Smith
Дата:
On Mon, 13 Aug 2007, Scott Marlowe wrote:

> We can look at how big your shared_buffers are, your work_mem, and a
> few others in postgresql.conf.

That's going to be sort_mem, not work_mem, with 7.4

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD