Обсуждение: planner question re index vs seqscan

Поиск
Список
Период
Сортировка

planner question re index vs seqscan

От
Ed Loehr
Дата:
I have 2 tables with indices as follows:
Table "activity" (~4000 rows) id           serial start_time   timestamp not null stop_time    timestamp not null ...
CREATE INDEX activity_start_time ON activity (start_time)CREATE INDEX activity_stop_time ON activity (stop_time)
Table "activity_hr_need" (~2000 rows) id            serial activity_id   integer not null hr_type_id    integer not
nullhr_count      integer not null ...
 
CREATE UNIQUE INDEX activity_hr_need_pkey     ON activity_hr_need (activity_id, hr_type_id)CREATE INDEX
activity_hr_need_hrtid    ON activity_hr_need (hr_type_id)CREATE INDEX activity_hr_need_aid     ON activity_hr_need
(activity_idint4_ops)
 

QUESTION:  Why doesn't the planner, just after 'vacuum analyze', use the
provided indices for this query?  How can I tweak it to use the indices?

sdb=# EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type", 
sdb-#        SUM(ahrn.hr_count) AS "Planned Consulting Days"
sdb-# FROM activity a, activity_hr_need ahrn
sdb-# WHERE a.start_time::date >= '1-Jun-2000'::date
sdb-#   AND a.stop_time::date <= '1-Jul-2000'::date
sdb-#   AND ahrn.activity_id = a.id
sdb-# GROUP BY "Resource Type";
NOTICE:  QUERY PLAN:

Aggregate  (cost=243.74..244.58 rows=17 width=16) ->  Group  (cost=243.74..244.16 rows=169 width=16)       ->  Sort
(cost=243.74..243.74rows=169 width=16)             ->  Hash Join  (cost=142.65..237.50 rows=169 width=16)
   ->  Seq Scan on activity_hr_need ahrn 
 
(cost=0.00..53.58 rows=2358 width=12)                   ->  Hash  (cost=141.60..141.60 rows=420 width=4)
        ->  Seq Scan on activity a  (cost=0.00..141.60
 
rows=420 width=4)


Regards,
Ed Loehr


Re: planner question re index vs seqscan

От
Jeff Hoffmann
Дата:
Ed Loehr wrote:

> QUESTION:  Why doesn't the planner, just after 'vacuum analyze', use the
> provided indices for this query?  How can I tweak it to use the indices?
> 
> sdb=# EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type",
> sdb-#        SUM(ahrn.hr_count) AS "Planned Consulting Days"
> sdb-# FROM activity a, activity_hr_need ahrn
> sdb-# WHERE a.start_time::date >= '1-Jun-2000'::date
> sdb-#   AND a.stop_time::date <= '1-Jul-2000'::date
> sdb-#   AND ahrn.activity_id = a.id
> sdb-# GROUP BY "Resource Type";
> NOTICE:  QUERY PLAN:

dump the typecasting in the query and try again.  not sure if it'll
work, but it's worth a try.  typecasting has an annoying effect of
disabling index scans in some cases even when you'd swear logically that
they should be used.  if that doesn't help, it's possible that it just
shouldn't be using the indexes based on cost estimates.  try shutting
off the sequential scan with "set enable_seqscan=off" before the query
to check if that's the case.  

-- 

Jeff Hoffmann
PropertyKey.com


Re: planner question re index vs seqscan

От
Tom Lane
Дата:
Ed Loehr <eloehr@austin.rr.com> writes:
> QUESTION:  Why doesn't the planner, just after 'vacuum analyze', use the
> provided indices for this query?  How can I tweak it to use the indices?

> sdb=# EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type", 
> sdb-#        SUM(ahrn.hr_count) AS "Planned Consulting Days"
> sdb-# FROM activity a, activity_hr_need ahrn
> sdb-# WHERE a.start_time::date >= '1-Jun-2000'::date
> sdb-#   AND a.stop_time::date <= '1-Jul-2000'::date
> sdb-#   AND ahrn.activity_id = a.id
> sdb-# GROUP BY "Resource Type";

At least part of the problem is that you have two separate one-sided
inequalities, neither one of which is very selective by itself ---
and of course the planner has no idea that there might be any semantic
connection between "start_time" and "stop_time".  You could help it out
by providing something it can recognize as a range restriction on one
index or the other.  For example:
WHERE a.start_time::date >= '1-Jun-2000'::date  AND a.start_time::date <= '1-Jul-2000'::date  AND a.stop_time::date <=
'1-Jul-2000'::date AND ahrn.activity_id = a.id
 
        regards, tom lane


Re: planner question re index vs seqscan

От
Tom Lane
Дата:
Jeff Hoffmann <jeff@propertykey.com> writes:
>> QUESTION:  Why doesn't the planner, just after 'vacuum analyze', use the
>> provided indices for this query?  How can I tweak it to use the indices?
>> 
>> sdb=# EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type",
>> sdb-#        SUM(ahrn.hr_count) AS "Planned Consulting Days"
>> sdb-# FROM activity a, activity_hr_need ahrn
>> sdb-# WHERE a.start_time::date >= '1-Jun-2000'::date
>> sdb-#   AND a.stop_time::date <= '1-Jul-2000'::date
>> sdb-#   AND ahrn.activity_id = a.id
>> sdb-# GROUP BY "Resource Type";

> dump the typecasting in the query and try again.  not sure if it'll
> work, but it's worth a try.  typecasting has an annoying effect of
> disabling index scans in some cases even when you'd swear logically that
> they should be used.

Oh, that's a good point --- if the start_time and stop_time columns are
not of type date then the above is guaranteed not to be indexscanable,
because what you've really written is
WHERE date(a.start_time) >= '1-Jun-2000'::date  AND date(a.stop_time) <= '1-Jul-2000'::date

It might be able to use a functional index on date(start_time) or
date(stop_time), but not a straight index on the timestamp columns.

A good rule of thumb is not to use casts unless you have no choice...
        regards, tom lane


Re: planner question re index vs seqscan

От
Ed Loehr
Дата:
Tom Lane wrote:
> 
> Jeff Hoffmann <jeff@propertykey.com> writes:
> >> QUESTION:  Why doesn't the planner, just after 'vacuum analyze', use the
> >> provided indices for this query?  How can I tweak it to use the indices?
> 
> > dump the typecasting in the query and try again.  not sure if it'll
> > work, but it's worth a try.  typecasting has an annoying effect of
> > disabling index scans in some cases even when you'd swear logically that
> > they should be used.

I dropped the typecasting, but that had no visible effect.  Adding the
additional predicate to the where clause as Tom suggested had the desired
effect of replacing one seqscan with an index scan.  But I'm still
wondering why it is still doing a seq scan on the "ahrn.activity_id =
a.id" part when both of those integer columns are indexed??

EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type",       SUM(ahrn.hr_count) AS "Planned Consulting Days"
FROM activity a, activity_hr_need ahrn
WHERE a.start_time >= '1-Jun-2000' AND a.stop_time <= '1-Jul-2000' AND a.start_time <= '1-Jul-2000' AND
ahrn.activity_id= a.id
 
GROUP BY "Resource Type";
QUERY PLAN:

Aggregate  (cost=137.12..137.16 rows=1 width=16) ->  Group  (cost=137.12..137.14 rows=7 width=16)       ->  Sort
(cost=137.12..137.12rows=7 width=16)             ->  Hash Join  (cost=47.86..137.04 rows=7 width=16)
-> Seq Scan on activity_hr_need ahrn 
 
(cost=0.00..53.58 rows=2358 width=12)                   ->  Hash  (cost=47.82..47.82 rows=16 width=4)
     ->  Index Scan using activity_start_time on
 
activity a  (cost=0.00..47.82 rows=16 width=4)

Regards,
Ed Loehr


Re: planner question re index vs seqscan

От
Tom Lane
Дата:
Ed Loehr <eloehr@austin.rr.com> writes:
> But I'm still wondering why it is still doing a seq scan on the
> "ahrn.activity_id = a.id" part when both of those integer columns are
> indexed??

Presumably because it thinks the hash join is cheaper than a nestloop
or merge join would be ... although that seems kinda surprising.  What
plans do you get if you try various combinations ofset enable_hashjoin = off;set enable_mergejoin = off;set
enable_nestloop= off;
 
How do the cost estimates compare against the actual runtimes for
doing the query each way?
        regards, tom lane