Обсуждение: Query running slow

Поиск
Список
Период
Сортировка

Query running slow

От
"samantha mahindrakar"
Дата:
Hi....
Iam finding the following query is working a bit slow:
EXECUTE '(SELECT ARRAY(SELECT DISTINCT date_part(''day'', measurement_start)
FROM ' || gettablestring(dates)|| '
WHERE lane_id IN (' || lanesidarr || ')))'
INTO temparr;

This function is trying to find all the days in a prticular month
whihc has data for the particular lane and put it in an array...which
can be used later.
gettablestring(dates) returns the partition name from which the data
needs to be extracted. These partitions have index on the
measurement_start field.
lanesidarr is a lane number. The partition has an index on this field to.
Could anyone give me some hints???/

Thanks
Sam

Re: Query running slow

От
PFC
Дата:
On Wed, 16 Apr 2008 23:14:11 +0200, samantha mahindrakar
<sam.mahindrakar@gmail.com> wrote:

> Hi....
> Iam finding the following query is working a bit slow:
> EXECUTE '(SELECT ARRAY(SELECT DISTINCT date_part(''day'',
> measurement_start)
> FROM ' || gettablestring(dates)|| '
> WHERE lane_id IN (' || lanesidarr || ')))'
> INTO temparr;
>
> This function is trying to find all the days in a prticular month
> whihc has data for the particular lane and put it in an array...which
> can be used later.
> gettablestring(dates) returns the partition name from which the data
> needs to be extracted. These partitions have index on the
> measurement_start field.
> lanesidarr is a lane number. The partition has an index on this field to.
> Could anyone give me some hints???/

    OK so I guess you have one partition per month since there is no month in
your WHERE.
    If this is a table which hasn't got much write activity (probably the
case for last month's partition, for instance), CLUSTER it on something
appropriate that you use often in queries, like lane_id here.
    And you can use SELECT foo GROUP BY foo, this will use a hash, it is
faster than a sort.
    Example :

CREATE TABLE blop AS SELECT '2008-01-01'::TIMESTAMP + ((n%30)*'1
DAY'::INTERVAL) AS t FROM generate_series(1,100000) AS n;
ALTER TABLE blop ADD d DATE NULL;
UPDATE blop SET d=t;
VACUUM FULL ANALYZE blop;

-- Now blop contains 100K timestamps and 100K dates from the month 2008-01

EXPLAIN ANALYZE SELECT DISTINCT EXTRACT( DAY from t )  FROM blop;
                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
  Unique  (cost=10051.82..10551.82 rows=30 width=8) (actual
time=221.740..289.801 rows=30 loops=1)
    ->  Sort  (cost=10051.82..10301.82 rows=100000 width=8) (actual
time=221.737..250.911 rows=100000 loops=1)
          Sort Key: (date_part('day'::text, t))
          Sort Method:  quicksort  Memory: 5955kB
          ->  Seq Scan on blop  (cost=0.00..1747.00 rows=100000 width=8)
(actual time=0.021..115.254 rows=100000 loops=1)
  Total runtime: 290.237 ms
(6 lignes)

Temps : 290,768 ms

EXPLAIN ANALYZE SELECT EXTRACT( DAY from t ) AS day FROM blop GROUP BY day;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=1997.00..1997.38 rows=30 width=8) (actual
time=198.375..198.390 rows=30 loops=1)
    ->  Seq Scan on blop  (cost=0.00..1747.00 rows=100000 width=8) (actual
time=0.021..129.779 rows=100000 loops=1)
  Total runtime: 198.437 ms
(3 lignes)

Temps : 198,894 ms

==> Hash is faster than Sort

EXPLAIN ANALYZE SELECT d FROM blop GROUP BY d;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=1747.00..1747.30 rows=30 width=4) (actual
time=101.829..101.842 rows=30 loops=1)
    ->  Seq Scan on blop  (cost=0.00..1497.00 rows=100000 width=4) (actual
time=0.012..33.428 rows=100000 loops=1)
  Total runtime: 101.905 ms
(3 lignes)

Temps : 102,516 ms

==> Not computing the EXTRACT is faster obviously

(actually EXPLAIN ANALYZE adds some overhead, the query really takes 60 ms)


    If you have an index lane_id, measurement_date, you can always do :

for day in 1..31:
    find 1 row with which has this day
reutrn the days you found