Discussion: Increase Query Speed
I'm trying to increase my query speed for this table; it currently takes 2-3 seconds, and I would like to make it faster if possible.
My table reads as:
Table "public.campaign_impressions"
Column | Type | Modifiers
-----------+-----------------------------+---------------
campaign | integer | not null
page | integer | not null
timestamp | timestamp without time zone | default now()
Indexes:
"campaign_impressions_campaign_idx" btree (campaign)
"campaign_impressions_page_idx" btree (page)
"campaign_impressions_timestamp_idx" btree ("timestamp")
This is the type of query I do on the table (get the page and # of times for each page this campaign was viewed between date x & y)
SELECT page, COUNT(page) AS impressions
FROM campaign_impressions
WHERE campaign = 1
AND timestamp BETWEEN '2010-05-21 00:00:00' AND '2010-07-27 00:00:00'
GROUP BY page
ORDER BY impressions
Right now I have ~13 million rows in the table, and the query seems (to me) easy enough, but it feels like 2-3 seconds is a long time. Is there any way I can squeeze more speed out of this? Is there a better way to store the data for the type of info I'm trying to extract? I'm open to all suggestions.
Thanks,
Jamie K.
Show us EXPLAIN ANALYZE on this.
EXPLAIN ANALYZE SELECT page, count(page) as impressions FROM campaign_impressions WHERE campaign = 42 and "timestamp" BETWEEN '2010-05-21 00:00:00' AND '2010-07-27 00:00:00' group by page order by impressions;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=106059.36..106059.40 rows=16 width=4) (actual time=2209.808..2209.816 rows=109 loops=1)
Sort Key: (count(page))
Sort Method: quicksort Memory: 30kB
-> HashAggregate (cost=106058.84..106059.04 rows=16 width=4) (actual time=2209.749..2209.765 rows=109 loops=1)
-> Bitmap Heap Scan on campaign_impressions (cost=19372.78..102534.06 rows=704956 width=4) (actual time=424.023..1980.987 rows=1010896 loops=1)
Recheck Cond: (campaign = 42)
Filter: (("timestamp" >= '2010-05-21 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2010-07-27 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on campaign_impressions_campaign_idx (cost=0.00..19196.54 rows=1039330 width=0) (actual time=421.587..421.587 rows=1044475 loops=1)
Index Cond: (campaign = 42)
Total runtime: 2209.869 ms
(10 rows)
Jamie K.
On 27 Jul 2010, at 21:48, Jamie Kahgee wrote:

> EXPLAIN ANALYZE SELECT page, count(page) as impressions FROM campaign_impressions WHERE campaign = 42 and "timestamp" BETWEEN '2010-05-21 00:00:00' AND '2010-07-27 00:00:00' group by page order by impressions;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=106059.36..106059.40 rows=16 width=4) (actual time=2209.808..2209.816 rows=109 loops=1)
>   Sort Key: (count(page))
>   Sort Method: quicksort Memory: 30kB
>   -> HashAggregate (cost=106058.84..106059.04 rows=16 width=4) (actual time=2209.749..2209.765 rows=109 loops=1)

Looks fine up to here.

>     -> Bitmap Heap Scan on campaign_impressions (cost=19372.78..102534.06 rows=704956 width=4) (actual time=424.023..1980.987 rows=1010896 loops=1)
>        Recheck Cond: (campaign = 42)
>        Filter: (("timestamp" >= '2010-05-21 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2010-07-27 00:00:00'::timestamp without time zone))

Here's your problem. There are about a million rows matching these criteria. Now, a million rows of width 4 in 2 seconds is (if I interpret row width correctly) about 2 MB/s, so that's possibly not topping your I/O subsystem. That probably means that those rows are scattered all over the table file, which means Postgres needs to fetch them through random disk I/O.

It would probably help to cluster that table on the campaign_impressions_timestamp_idx index. At least most of the rows will then be in chronological order, so disk I/O would be much more efficient (if it isn't already in that order, of course!).

Another possibility would be to create a summary table that sums up the count of pages by day, so that you would only need to query for the sum of relatively few records. The summary table can then be kept up to date by triggers or something; that's up to you.
>        -> Bitmap Index Scan on campaign_impressions_campaign_idx (cost=0.00..19196.54 rows=1039330 width=0) (actual time=421.587..421.587 rows=1044475 loops=1)
>           Index Cond: (campaign = 42)

Nothing wrong here either.

> Total runtime: 2209.869 ms
> (10 rows)

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
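The two suggestions above (clustering the table on the timestamp index, and a trigger-maintained daily summary table) could be sketched roughly as below. Note this is only a sketch: the summary table, function, and trigger names are illustrative and not from the thread, and the naive update-then-insert trigger is not safe under concurrent inserts of the same (campaign, page, day) without extra locking or an upsert.

```sql
-- 1. One-time rewrite so rows for a date range are physically adjacent.
--    (Locks the table while it runs; rows inserted later are NOT kept clustered.)
CLUSTER campaign_impressions USING campaign_impressions_timestamp_idx;
ANALYZE campaign_impressions;

-- 2. A per-day summary table kept current by a trigger (names illustrative).
CREATE TABLE campaign_impressions_daily (
    campaign    integer NOT NULL,
    page        integer NOT NULL,
    day         date    NOT NULL,
    impressions bigint  NOT NULL DEFAULT 0,
    PRIMARY KEY (campaign, page, day)
);

CREATE OR REPLACE FUNCTION bump_daily_impressions() RETURNS trigger AS $$
BEGIN
    UPDATE campaign_impressions_daily
       SET impressions = impressions + 1
     WHERE campaign = NEW.campaign
       AND page = NEW.page
       AND day = NEW."timestamp"::date;
    IF NOT FOUND THEN
        -- Race-prone under concurrency; shown for illustration only.
        INSERT INTO campaign_impressions_daily (campaign, page, day, impressions)
        VALUES (NEW.campaign, NEW.page, NEW."timestamp"::date, 1);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER campaign_impressions_daily_trg
AFTER INSERT ON campaign_impressions
FOR EACH ROW EXECUTE PROCEDURE bump_daily_impressions();

-- The report then reads at most one row per (page, day) instead of
-- one row per impression. Day granularity only approximates the original
-- timestamp bounds at the edges of the range.
SELECT page, sum(impressions) AS impressions
  FROM campaign_impressions_daily
 WHERE campaign = 42
   AND day BETWEEN '2010-05-21' AND '2010-07-27'
 GROUP BY page
 ORDER BY impressions;
```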