Re: Major performance problem after upgrade from 8.3 to 8.4

From: Merlin Moncure
Subject: Re: Major performance problem after upgrade from 8.3 to 8.4
Date:
Msg-id: AANLkTim2srP5bSy68iDgrS5HvH9p3Obf-qexmxB2tBUc@mail.gmail.com
In reply to: Re: Major performance problem after upgrade from 8.3 to 8.4  (Gerhard Wiesinger <lists@wiesinger.com>)
Responses: Re: Major performance problem after upgrade from 8.3 to 8.4  (Gerhard Wiesinger <lists@wiesinger.com>)
List: pgsql-performance
On Tue, Sep 14, 2010 at 3:59 PM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> On Tue, 14 Sep 2010, Merlin Moncure wrote:
>
>> On Tue, Sep 14, 2010 at 2:07 AM, Gerhard Wiesinger <lists@wiesinger.com>
>> wrote:
>>>
>>> Hello Merlin,
>>>
>>> Seems to be a feasible approach. One problem might be that when
>>> multiple rows are returned, they are not ordered consistently in each
>>> subselect. Any idea how to solve that?
>>>
>>> e.g.
>>> Raumsolltemperatur | Raumisttemperatur
>>> Value from time 1  | Value from time 2
>>> Value from time 2  | Value from time 1
>>>
>>> but should be
>>> Raumsolltemperatur | Raumisttemperatur
>>> Value from time 1  | Value from time 1
>>> Value from time 2  | Value from time 2
>>>
>>> But that might be solvable by first selecting keys from the log_details
>>> table and then joining again.
>>>
>>> I will try it in the evening, and I have to think about it in detail.
>>>
>>> But thank you for the new approach and opening the mind :-)
>>
>> Using a subquery in that style, select (<subquery>), ..., is limited to
>> results that return 1 row, 1 column.  I assumed that was the case...if
>> it isn't in your view, you can always attempt arrays:
>>
>> CREATE OR REPLACE VIEW log_entries AS
>> SELECT
>> l.id AS id,
>> l.datetime AS datetime,
>> l.tdate AS tdate,
>> l.ttime AS ttime,
>> array(select value from log_details ld join key_description kd on
>> ld.fk_keyid = kd.keyid where ld.fk_id = l.id and  description =
>> 'Raumsolltemperatur' order by XYZ) AS Raumsolltemperatur,
>> [...]
>>
>> arrays might raise the bar somewhat in terms of dealing with the
>> returned data, or they might work great.  some experimentation is in
>> order.
>>
>> XYZ being the ordering condition you want.  If that isn't available
>> inside the join then we need to think about this some more.  We could
>> probably help more if you could describe the schema in a little more
>> detail.  This is solvable.
>
> Of course, the subquery is limited to a result set returning 1 row and 1 column.
> Also, the order is of course preserved because of the join.
>
> Further, I think I found a perfect query plan for the EAV pattern.
>
> First I tried your suggestion, but there were some limitations with O(n^2)
> effort (e.g. nested loops=12586 and also index scans with loops=12586):
>
> CREATE OR REPLACE VIEW log_entries_test AS
> SELECT
>  l.id AS id,
>  l.datetime AS datetime,
>  l.tdate AS tdate,
>  l.ttime AS ttime,
>  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid =
> kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Raumsolltemperatur') AS
> Raumsolltemperatur,
>  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid =
> kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Raumtemperatur') AS
> Raumtemperatur,
>  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid =
> kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Kesselsolltemperatur')
> AS Kesselsolltemperatur,
>  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid =
> kd.keyid WHERE l.id = d.fk_id AND kd.description = 'Kesseltemperatur') AS
> Kesseltemperatur,
> ....
> FROM
>  log l
> ;
>
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> EXPLAIN ANALYZE SELECT * FROM log_entries_test WHERE datetime > now() -
> INTERVAL '10 days' ORDER BY datetime DESC;
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Index Scan Backward using i_log_unique on log l  (cost=0.00..140820.12
> rows=69 width=32) (actual time=2.848..22812.331 rows=12586 loops=1)
>  Index Cond: (datetime > (now() - '10 days'::interval))
>  SubPlan 1
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.007..0.018 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.003..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Raumsolltemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.004..0.004 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 2
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.006..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.003..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Raumtemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 3
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.005..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.002..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Kesselsolltemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.003..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 4
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.006..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.002..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Kesseltemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 5
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.005..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.002..0.014 rows=1 loops=12586)
>                Filter: ((description)::text =
> 'Speichersolltemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>  SubPlan 6
>    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual
> time=0.006..0.017 rows=1 loops=12586)
>          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1
> width=8) (actual time=0.003..0.013 rows=1 loops=12586)
>                Filter: ((description)::text = 'Speichertemperatur'::text)
>          ->  Index Scan using unique_key_and_id on log_details d
>  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1
> loops=12586)
>                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Therefore I optimized the query further, which can be done in the following
> way with another subquery and, IMHO, a perfect query plan. The subselects
> also avoid multiple iterations for each of the result rows:
>
> CREATE OR REPLACE VIEW log_entries_test AS
> SELECT
>  l.id AS id,
>  l.datetime AS datetime,
>  l.tdate AS tdate,
>  l.ttime AS ttime,
>  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
> (SELECT keyid FROM key_description WHERE description =
> 'Raumsolltemperatur')) AS Raumsolltemperatur,
>  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
> (SELECT keyid FROM key_description WHERE description = 'Raumtemperatur')) AS
> Raumtemperatur,
>  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
> (SELECT keyid FROM key_description WHERE description =
> 'Kesselsolltemperatur')) AS Kesselsolltemperatur,
>  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
> (SELECT keyid FROM key_description WHERE description = 'Kesseltemperatur'))
> AS Kesseltemperatur,
> ...
> FROM
>  log l
> ;
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> EXPLAIN ANALYZE SELECT * FROM log_entries_test WHERE datetime > now() -
> INTERVAL '10 days' ORDER BY datetime DESC;
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Index Scan Backward using i_log_unique on log l  (cost=0.00..140603.99
> rows=69 width=32) (actual time=2.588..5602.899 rows=12586 loops=1)
>  Index Cond: (datetime > (now() - '10 days'::interval))
>  SubPlan 2
>    ->  Index Scan using unique_key_and_id on log_details d
>  (cost=2.38..19.97 rows=1 width=8) (actual time=0.010..0.011 rows=1
> loops=12586)
>          Index Cond: (($1 = fk_id) AND (fk_keyid = $0))
>          InitPlan 1 (returns $0)
>            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8)
> (actual time=0.015..0.066 rows=1 loops=1)
>                  Filter: ((description)::text = 'Raumsolltemperatur'::text)
>  SubPlan 4
>    ->  Index Scan using unique_key_and_id on log_details d
>  (cost=2.38..19.97 rows=1 width=8) (actual time=0.003..0.003 rows=1
> loops=12586)
>          Index Cond: (($1 = fk_id) AND (fk_keyid = $2))
>          InitPlan 3 (returns $2)
>            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8)
> (actual time=0.009..0.020 rows=1 loops=1)
>                  Filter: ((description)::text = 'Raumtemperatur'::text)
>  SubPlan 6
>    ->  Index Scan using unique_key_and_id on log_details d
>  (cost=2.38..19.97 rows=1 width=8) (actual time=0.002..0.003 rows=1
> loops=12586)
>          Index Cond: (($1 = fk_id) AND (fk_keyid = $3))
>          InitPlan 5 (returns $3)
>            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8)
> (actual time=0.005..0.017 rows=1 loops=1)
>                  Filter: ((description)::text =
> 'Kesselsolltemperatur'::text)
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> BTW: The schema data is in the links discussed in the thread.
>
> Thanks to all for helping me.

np -- this felt particularly satisfying for some reason. btw, I think
you have some more low-hanging optimization fruit.  I think (although
it would certainly have to be tested) that hiding your attribute
description under keyid is buying you nothing but headaches.  If you
used a natural key style, making description the primary key of
key_description (or unique), and gave log_details a description
column that directly referenced that column, your subquery:

(
  SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
  (
    SELECT keyid FROM key_description WHERE description = 'Kesselsolltemperatur'
  )
) AS Kesselsolltemperatur,

would look like this:
(
  SELECT value FROM log_details d WHERE l.id = d.fk_id AND
    d.description = 'Kesselsolltemperatur'
) AS Kesselsolltemperatur,

your index on log_details(fk_id, description) is of course fatter, but
quite precise... it does require rebuilding your entire dataset, however.
food for thought.
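
Roughly, that schema change could look something like the sketch below
(the column types and constraints are guesses based on this thread, so
treat it as illustrative rather than a drop-in migration):

CREATE TABLE key_description (
  description varchar NOT NULL PRIMARY KEY  -- natural key instead of keyid
);

CREATE TABLE log_details (
  fk_id       bigint  NOT NULL REFERENCES log(id),  -- assumed id type
  description varchar NOT NULL REFERENCES key_description(description),
  value       double precision,                     -- assumed value type
  PRIMARY KEY (fk_id, description)  -- takes over the role of unique_key_and_id
);

The composite primary key then doubles as the (fk_id, description) index
that the rewritten subquery probes directly, with no lookup into
key_description at query time.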

merlin
