Обсуждение: timestamp query doesn't use an index ...
I'm trying to figure out some way to speed up the following query:
select ps2.page_id, ps2.template_component_id, max(ps2.start_time) from page_schedule ps2 where ps2.start_time
attime zone 'MST7MDT' <= '2006-5-17 8:9:18'
group by ps2.page_id, ps2.template_component_id
When run through EXPLAIN ANALYZE, it shows:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2613.28..2614.17 rows=72 width=16) (actual time=976.629..976.938 rows=128 loops=1) -> Seq Scan
onpage_schedule ps2 (cost=0.00..2364.95 rows=33110 width=16) (actual time=0.021..623.363 rows=94798 loops=1)
Filter:(timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) Total runtime:
977.224ms
(4 rows)
I've tried doing a function index, like:
create index start_time_page_schedule on page_schedule using btree ( timezone('MST7MDT'::text,
start_time) );
But, same result ... whereas, if I change the <= to just =, the index is
used, but that is expected ...
Is there some other way I can either write above query *or* do an index,
such that it will use the index?
thanks ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org MSN . scrappy@hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
On May 21, 2006, at 10:42 , Marc G. Fournier wrote: > -> Seq Scan on page_schedule ps2 (cost=0.00..2364.95 > rows=33110 width=16) (actual time=0.021..623.363 rows=94798 loops=1) I don't know about rewriting the query, but it appears your statistics are a little out of date (e.g., rows expected/actual 33110/94798). Does running ANALYZE help? Michael Glaesemann grzm seespotcode net
On Sun, 21 May 2006, Michael Glaesemann wrote:
>
> On May 21, 2006, at 10:42 , Marc G. Fournier wrote:
>
>> -> Seq Scan on page_schedule ps2 (cost=0.00..2364.95 rows=33110
>> width=16) (actual time=0.021..623.363 rows=94798 loops=1)
>
> I don't know about rewriting the query, but it appears your statistics are a
> little out of date (e.g., rows expected/actual 33110/94798). Does running
> ANALYZE help?
the data is idle, just loaded it on my desktop for testing purposes ...
being paranoid, I have been doing a vacuum analyze on the table as I
change the index's *just in case*, but, doing a full analyze on the whole
database doesn't change the results any:
Actually, the above results are odd anyway, since a second run of the
exact same query, shows more normal numbers:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3051.91..3054.19 rows=183 width=16) (actual time=1030.970..1031.257 rows=128 loops=1) -> Seq
Scanon page_schedule ps2 (cost=0.00..2364.95 rows=91594 width=16) (actual time=0.019..636.599 rows=94798 loops=1)
Filter: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) Total runtime:
1031.681ms
(4 rows)
So not 100% certain where the 33110/94798 gap came from ;)
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org MSN . scrappy@hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> I'm trying to figure out some way to speed up the following query:
> select ps2.page_id, ps2.template_component_id, max(ps2.start_time)
> from page_schedule ps2
> where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18'
> group by ps2.page_id, ps2.template_component_id
> Is there some other way I can either write above query *or* do an index,
> such that it will use the index?
One-sided inequalities frequently *shouldn't* use an index, because
they're retrieving too much of the table. Are you sure this is fetching
only a small fraction of the table? Are you using PG 8.1 (8.1 would be
likely to try to use a bitmap indexscan for this)?
You could experiment with enable_seqscan = off to see if the planner is
actually wrong about its choice. If so, reducing random_page_cost might
be the best permanent solution.
regards, tom lane
On Sun, 21 May 2006, Tom Lane wrote:
> "Marc G. Fournier" <scrappy@postgresql.org> writes:
>> I'm trying to figure out some way to speed up the following query:
>
>> select ps2.page_id, ps2.template_component_id, max(ps2.start_time)
>> from page_schedule ps2
>> where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18'
>> group by ps2.page_id, ps2.template_component_id
>
>> Is there some other way I can either write above query *or* do an index,
>> such that it will use the index?
>
> One-sided inequalities frequently *shouldn't* use an index, because
> they're retrieving too much of the table. Are you sure this is fetching
> only a small fraction of the table? Are you using PG 8.1 (8.1 would be
> likely to try to use a bitmap indexscan for this)?
>
> You could experiment with enable_seqscan = off to see if the planner is
> actually wrong about its choice. If so, reducing random_page_cost might
> be the best permanent solution.
vrnprd=# select version(); version
------------------------------------------------------------------------------------------------ PostgreSQL 8.1.3 on
i386-portbld-freebsd6.1,compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518
(1 row)
vrnprd=# set enable_seqscan = off;
SET
vrnprd=# explain analyze select ps2.page_id, ps2.template_component_id, max(ps2.start_time) from page_schedule ps2
where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18'
group by ps2.page_id, ps2.template_component_id
; QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3509.96..3513.50 rows=283 width=16) (actual time=839.460..839.769 rows=128 loops=1) -> Bitmap
HeapScan on page_schedule ps2 (cost=573.65..2822.86 rows=91614 width=16) (actual time=149.788..505.438 rows=94798
loops=1) Recheck Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time
zone) -> Bitmap Index Scan on start_time_page_schedule (cost=0.00..573.65 rows=91614 width=0) (actual
time=127.761..127.761rows=94798 loops=1) Index Cond: (timezone('MST7MDT'::text, start_time) <=
'2006-05-1708:09:18'::timestamp without time zone) Total runtime: 846.604 ms
(6 rows)
vrnprd=#
And yup, it is definitely returning just 128 rows out of the 93k or so:
110 | 419 | 2005-10-26 13:15:00-03 130 | 215 | 2006-04-26 10:15:00-03
(128 rows)
And, not sure how far to reduce random_page_cost, but it was 4 ... tried 2
and 1, and both come up with the same results ... with seqscan enabled, it
does a seqscan :(
I suspected with the <= there wasn't going to be much I could do with
this, but figured I'd make sure there wasn't something that I was
overlooking :(
Thx ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org MSN . scrappy@hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> -> Bitmap Heap Scan on page_schedule ps2 (cost=573.65..2822.86 rows=91614 width=16) (actual
time=149.788..505.438rows=94798 loops=1)
> Recheck Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time
zone)
> -> Bitmap Index Scan on start_time_page_schedule (cost=0.00..573.65 rows=91614 width=0) (actual
time=127.761..127.761rows=94798 loops=1)
> Index Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time
zone)
> And yup, it is definitely returning just 128 rows out of the 93k or so:
No, the scan is pulling 94798 rows from the table, according to the
EXPLAIN ANALYZE --- the number of resulting groups isn't much of a
factor here.
We don't currently have any index optimization for MIN/MAX in a GROUP BY
context, and even if we did, it wouldn't apply here: the planner
couldn't assume that the sort order of an index on "start_time at time
zone 'MST7MDT'" would have anything to do with the ordering of just
"start_time". Is there a reason you're writingwhere ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18'
and notwhere ps2.start_time <= '2006-5-17 8:9:18' at time zone 'MST7MDT'
The latter seems less likely to have strange behaviors near DST
transitions. I don't think it'll be any faster at the moment, but you
could at least save maintaining a specialized index.
regards, tom lane