Re: slow plan on join when adding where clause

Поиск
Список
Период
Сортировка
От Mladen Gogala
Тема Re: slow plan on join when adding where clause
Дата
Msg-id 4BBE06EE.8000305@vmsinfo.com
обсуждение исходный текст
Ответ на Re: slow plan on join when adding where clause  ("L. Loewe" <lloewe@hotmail.com>)
Список pgsql-novice
How about creating index on both columns and drop the separate index?
The composite index on ti and sensor_id could still be used for the
original query but would probably give you a killer performance when the
columns are used together.

L. Loewe wrote:
> Thanks.
>
> That did change the plan a bit but it's still not doing it the way
> it does it when searching all sensors - and it's still a lot slower.
>
>
>  Nested Loop  (cost=2.36..1482129.06 rows=2629241 width=68)
>              (actual time=30983.301..31827.299 rows=772 loops=1)
>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>    ->  Seq Scan on main  (cost=0.00..890547.50 rows=190832 width=40)
>              (actual time=54.095..18136.153 rows=492150 loops=1)
>          Filter: ((sensor_id + 0) = 1)
>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>              (actual time=0.000..0.009 rows=124 loops=492150)
>          ->  Seq Scan on events  (cost=0.00..2.24 rows=124 width=28)
>              (actual time=0.010..0.021 rows=124 loops=1)
>  Total runtime: 31827.435 ms
>
> I'm not sure the index is the problem - seems more like the way it
> assembles the data.
> If I read this right the planner takes the rows matching sensor_id=1
> first and then joins the result with the time slices of events.
>
> This'd be probably a good idea if events had a lot of rows or the time
> slices were large.
> But with the data at hand it is a lot faster to take the rows
> of events first and then match each one to main.ti (which is what
> the planner does without the where clause).
>
> It's understandable that the planner cannot properly figure out
> how many rows these start-stop slices will select, however since
> it appears to make an assumption I was looking for a way to
> tell it which value to assume...
>
> Regards
>
>
>
>
> On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala
> <mgogala@vmsinfo.com> wrote:
>
>> There is an old trick which can help you here. Try doing this:
>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>                                     main.ti < events.stop and
>> sensor_id+0=1;
>>
>> That will stop planner from merging two indexes and using bitmap.
>>
>> L. Loewe wrote:
>>> Hi all
>>>
>>>
>>> I have a fairly large table ("main" ~50M rows) containing a
>>> timestamp (indexed),
>>>  a sensor_id (indexed) and some sensor data ranging over a few years.
>>>
>>> The second table ("events" <500 rows) contains a set of interesting
>>> events with
>>> an events.id and two timestamps: events.start and events.stop plus some
>>> additional data.
>>>
>>> Now I want to join these tables to get the sensor data for those
>>> events.
>>>
>>> The interval between start and stop is quite short for each event
>>> (usually a
>>> couple of minutes) so that there aren't too many rows from table "main"
>>> matching this criteria (~1K) for each event:
>>>
>>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti
>>> < events.stop;
>>>
>>> EXPLAIN ANALYZE gives
>>>
>>> Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
>>>                  (actual time=0.038..42.314 rows=69209 loops=1)
>>>    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>>                 (actual time=0.006..0.025 rows=124 loops=1)
>>>    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16
>>> rows=4229880 width=40)
>>>                 (actual time=0.016..0.178 rows=558 loops=124)
>>>          Index Cond: ((main.ti > events.start) AND (main.ti <
>>> events.stop))
>>> Total runtime: 47.682 ms
>>>
>>> So far so good, however if I add a sensor_id constraint the planner
>>> chooses a
>>> different approach:
>>>
>>>
>>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>>                                     main.ti < events.stop and
>>> sensor_id=1;
>>>
>>> Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
>>>                 (actual time=23427.599..23886.276 rows=772 loops=1)
>>>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>>>    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
>>> width=40)
>>>                 (actual time=3771.719..9508.728 rows=490984 loops=1)
>>>          Recheck Cond: (sensor_id= 1)
>>>          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
>>> rows=348087 width=0)
>>>                 (actual time=3769.075..3769.075 rows=491102 loops=1)
>>>                Index Cond: (sensor_id= 1)
>>>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>>>                 (actual time=0.000..0.010 rows=124 loops=490984)
>>>          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>>                (actual time=0.005..0.021 rows=124 loops=1)
>>> Total runtime: 23886.494 ms
>>>
>>>
>>> Apparently the planner assumes that sensor_id=1 will return fewer
>>> rows than
>>> the time constraint while the opposite is true:
>>> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
>>>
>>> Is there a way to hint the planner to use plan 1 even with the
>>> sensor_id=1
>>> clause or am I doing something fundamentally wrong here?
>>>
>>> Thanks
>>>
>>
>


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




В списке pgsql-novice по дате отправления:

Предыдущее
От: "Machiel Richards"
Дата:
Сообщение: FW: Postgres cpu & memory usage
Следующее
От: Mladen Gogala
Дата:
Сообщение: Re: FW: Postgres cpu & memory usage