Discussion: Select running slow on Postgres


Select running slow on Postgres

From:
"samantha mahindrakar"
Date:
Hi
I have a select statement that runs on a partition having, say, a couple
million rows.
The table has indexes on two columns. However, the query uses the
non-indexed columns too in its where clause.
For example:
SELECT lane_id,measurement_start,
measurement_end,speed,volume,occupancy,quality,effective_date
 FROM tss.lane_data_06_08
 WHERE lane_id in(select lane_id from lane_info where inactive is  null )
 AND date_part('hour', measurement_start) between 5 and 23
 AND date_part('day',measurement_start)=30
GROUP BY lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
ORDER BY lane_id, measurement_start

Out of this, only lane_id and measurement_start are indexed. This query
will return around 10,000 rows, but it seems to be taking a long time
to execute, which doesn't make sense for a select statement. It doesn't
make any sense to create an index for every field we are going to use in the
where clause.
Isn't there any way we can improve the performance?


Samantha

Re: Select running slow on Postgres

From:
"Scott Marlowe"
Date:
On Tue, Jul 1, 2008 at 1:29 PM, samantha mahindrakar
<sam.mahindrakar@gmail.com> wrote:
> Hi
> I have a select statement that runs on a partition having, say, a couple
> million rows.
> The table has indexes on two columns. However, the query uses the
> non-indexed columns too in its where clause.
> For example:
> SELECT lane_id,measurement_start,
> measurement_end,speed,volume,occupancy,quality,effective_date
>  FROM tss.lane_data_06_08
>  WHERE lane_id in(select lane_id from lane_info where inactive is  null )
>  AND date_part('hour', measurement_start) between 5 and 23
>  AND date_part('day',measurement_start)=30
> GROUP BY lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
> ORDER BY lane_id, measurement_start
>
> Out of this, only lane_id and measurement_start are indexed. This query
> will return around 10,000 rows, but it seems to be taking a long time
> to execute, which doesn't make sense for a select statement. It doesn't
> make any sense to create an index for every field we are going to use in the
> where clause.
> Isn't there any way we can improve the performance?

I'm guessing that adding an index for either
date_part('hour',measurement_start) or
date_part('day',measurement_start) or both would help.

What does explain analyze select ... (rest of query here) say?
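Expression indexes matching those predicates might look like this (a sketch; table and column names are taken from the query above, so adjust for your schema, and note these only build if measurement_start is a plain timestamp, since date_part() on timestamptz is not immutable):

```sql
-- Sketch: expression indexes matching the date_part() predicates above.
CREATE INDEX lane_data_06_08_hour_idx
    ON tss.lane_data_06_08 (date_part('hour', measurement_start));
CREATE INDEX lane_data_06_08_day_idx
    ON tss.lane_data_06_08 (date_part('day', measurement_start));
```

As a side effect, ANALYZE will then also gather statistics on those expressions, which can improve the planner's row estimates for these predicates.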

Re: Select running slow on Postgres

From:
"samantha mahindrakar"
Date:
I ran the explain analyze. Here is what I got:


"Group  (cost=112266.37..112266.40 rows=1 width=56) (actual time=5583.399..5615.476 rows=13373 loops=1)"
"  ->  Sort  (cost=112266.37..112266.38 rows=1 width=56) (actual time=5583.382..5590.890 rows=13373 loops=1)"
"        Sort Key: lane_data_07_08.lane_id, lane_data_07_08.measurement_start, lane_data_07_08.measurement_end, lane_data_07_08.speed, lane_data_07_08.volume, lane_data_07_08.occupancy, lane_data_07_08.quality, lane_data_07_08.effective_date"
"        ->  Nested Loop IN Join  (cost=0.00..112266.36 rows=1 width=56) (actual time=1100.307..5547.768 rows=13373 loops=1)"
"              ->  Seq Scan on lane_data_07_08  (cost=0.00..112241.52 rows=3 width=56) (actual time=1087.666..5341.662 rows=20581 loops=1)"
"                    Filter: (((volume = 255::double precision) OR (speed = 255::double precision) OR (occupancy = 255::double precision) OR (occupancy >= 100::double precision) OR (volume > 52::double precision) OR (volume < 0::double precision) OR (speed > 120::double precision) OR (speed < 0::double precision)) AND (date_part('hour'::text, measurement_start) >= 5::double precision) AND (date_part('hour'::text, measurement_start) <= 23::double precision) AND (date_part('day'::text, measurement_start) = 1::double precision))"
"              ->  Index Scan using lane_info_pk on lane_info  (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=20581)"
"                    Index Cond: (lane_data_07_08.lane_id = lane_info.lane_id)"
"                    Filter: (inactive IS NULL)"
"Total runtime: 5621.409 ms"


Well, instead of creating extra indexes (since they eat up a lot of space) I made use of the whole measurement_start field, so that it uses the index and makes the search faster.
So I changed the query to compare against measurement_start directly, as follows:

SELECT lane_id,measurement_start,
measurement_end,speed,volume,occupancy,quality,effective_date
FROM tss.lane_data_06_08
WHERE lane_id in(select lane_id from lane_info where inactive is  null )
AND measurement_start between '2008-06-30 05:00:00-04' AND  '2008-06-30 23:00:00-04'
GROUP BY lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
ORDER BY lane_id, measurement_start


Samantha

On 7/1/08, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Jul 1, 2008 at 1:29 PM, samantha mahindrakar
> <sam.mahindrakar@gmail.com> wrote:
> > Hi
> > I have a select statement that runs on a partition having, say, a couple
> > million rows.
> > The table has indexes on two columns. However, the query uses the
> > non-indexed columns too in its where clause.
> > For example:
> > SELECT lane_id,measurement_start,
> > measurement_end,speed,volume,occupancy,quality,effective_date
> >  FROM tss.lane_data_06_08
> >  WHERE lane_id in(select lane_id from lane_info where inactive is  null )
> >  AND date_part('hour', measurement_start) between 5 and 23
> >  AND date_part('day',measurement_start)=30
> > GROUP BY lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
> > ORDER BY lane_id, measurement_start
> >
> > Out of this, only lane_id and measurement_start are indexed. This query
> > will return around 10,000 rows, but it seems to be taking a long time
> > to execute, which doesn't make sense for a select statement. It doesn't
> > make any sense to create an index for every field we are going to use in the
> > where clause.
> > Isn't there any way we can improve the performance?
>
> I'm guessing that adding an index for either
> date_part('hour',measurement_start) or
> date_part('day',measurement_start) or both would help.
>
> What does explain analyze select ... (rest of query here) say?
>

Re: Select running slow on Postgres

From:
"Scott Marlowe"
Date:
On Wed, Jul 2, 2008 at 1:01 PM, samantha mahindrakar
<sam.mahindrakar@gmail.com> wrote:
> I ran the explain analyze. Here is what I got:
>
>
> "Group  (cost=112266.37..112266.40 rows=1 width=56) (actual
> time=5583.399..5615.476 rows=13373 loops=1)"
> "  ->  Sort  (cost=112266.37..112266.38 rows=1 width=56) (actual
> time=5583.382..5590.890 rows=13373 loops=1)"
> "        Sort Key: lane_data_07_08.lane_id,
> lane_data_07_08.measurement_start, lane_data_07_08.measurement_end,
> lane_data_07_08.speed, lane_data_07_08.volume, lane_data_07_08.occupancy,
> lane_data_07_08.quality, lane_data_07_08.effective_date"
> "        ->  Nested Loop IN Join  (cost=0.00..112266.36 rows=1 width=56)
> (actual time=1100.307..5547.768 rows=13373 loops=1)"
> "              ->  Seq Scan on lane_data_07_08  (cost=0.00..112241.52 rows=3
> width=56) (actual time=1087.666..5341.662 rows=20581 loops=1)"

You can see here that the seq scan on lane_data is what's eating up
all your time.  Also, since the row estimate is WAY off, it then chose
a nested loop thinking it would be joining up only 1 row and actually
running across 20k rows.
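One thing worth trying for the bad estimate (a sketch, assuming the schema-qualified partition name from the plan) is refreshing the planner's statistics; note that estimates on date_part() expressions may still stay poor until an expression index exists to collect statistics for them:

```sql
-- Refresh planner statistics for the partition so row estimates improve.
ANALYZE tss.lane_data_07_08;

-- If estimates stay bad, raise the per-column statistics target first,
-- then re-analyze:
ALTER TABLE tss.lane_data_07_08
    ALTER COLUMN measurement_start SET STATISTICS 500;
ANALYZE tss.lane_data_07_08;
```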

> "                    Filter: (((volume = 255::double precision) OR (speed =
> 255::double precision) OR (occupancy = 255::double precision) OR (occupancy
>>= 100::double precision) OR (volume > 52::double precision) OR (volume <
> 0::double precision) OR (speed > 120::double precision) OR (speed <
> 0::double precision)) AND (date_part('hour'::text, measurement_start) >=
> 5::double precision) AND (date_part('hour'::text, measurement_start) <=
> 23::double precision) AND (date_part('day'::text, measurement_start) =
> 1::double precision))"
> "              ->  Index Scan using lane_info_pk on
> lane_info  (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.007 rows=1
> loops=20581)"
> "                    Index Cond: (lane_data_07_08.lane_id =
> lane_info.lane_id)"
> "                    Filter: (inactive IS NULL)"
> "Total runtime: 5621.409 ms"
>
>
> Well, instead of creating extra indexes (since they eat up a lot of space) I
> made use of the whole measurement_start field, so that it uses the index
> and makes the search faster.
> So I changed the query to compare against measurement_start directly, as follows:
>
> SELECT lane_id,measurement_start,
> measurement_end,speed,volume,occupancy,quality,effective_date
> FROM tss.lane_data_06_08
> WHERE lane_id in(select lane_id from lane_info where inactive is  null )
> AND measurement_start between '2008-06-30 05:00:00-04' AND  '2008-06-30
> 23:00:00-04'
> GROUP BY
> lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
> ORDER BY lane_id, measurement_start

Yeah, anytime you can just compare a date / timestamp against an indexed
field you'll do better.  If you find yourself needing the other
syntax, so you can, for instance, grab the data for 5 days in a row
from 5am to 11am or something, then the method I mentioned of making
indexes on date_part() is a good choice.  Note that you need a regular
timestamp, not timestamptz, to create such indexes, since date_part()
on timestamptz depends on the session timezone and isn't immutable.
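The timestamp vs. timestamptz point can be sketched like this (hypothetical table names; the commented-out statement is the one Postgres rejects):

```sql
-- Works: date_part() on a plain timestamp is IMMUTABLE.
CREATE TABLE demo_plain (ts timestamp);
CREATE INDEX demo_plain_hour_idx
    ON demo_plain (date_part('hour', ts));

-- Fails with "functions in index expression must be marked IMMUTABLE":
-- date_part() on timestamptz depends on the TimeZone setting.
CREATE TABLE demo_tz (ts timestamptz);
-- CREATE INDEX demo_tz_hour_idx ON demo_tz (date_part('hour', ts));

-- One workaround: index the hour in a fixed zone, since converting a
-- timestamptz with AT TIME ZONE and a constant zone is immutable.
CREATE INDEX demo_tz_hour_idx
    ON demo_tz (date_part('hour', ts AT TIME ZONE 'UTC'));
```

Queries then have to use the same expression (date_part('hour', ts AT TIME ZONE 'UTC')) for the index to be considered.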