Re: query performance with hstore vs. non-hstore

Поиск
Список
Период
Сортировка
От Huang, Suya
Тема Re: query performance with hstore vs. non-hstore
Дата
Msg-id D83E55F5F4D99B4A9B4C4E259E6227CD014FA4CD@AUX1EXC02.apac.experian.local
обсуждение исходный текст
Ответ на Re: query performance with hstore vs. non-hstore  (David G Johnston <david.g.johnston@gmail.com>)
Список pgsql-performance
Hi David,

Thanks for the reply.

>Calling 43s "close to" 70s doesn't sound right...

Oops, I'm not saying 43s close to 70s... I mean that the plan generated by disable timing for explain plan doesn't make
obviousdifference comparing to the earlier plan I sent out which enabled timing. 

>What version of PostgreSQL are you using?
>
>Two calls to each() and cast to numeric are not free.
>
>Your sequential scan savings is nearly 9 seconds but you lose all of that, and more, when PostgreSQL evaluates the
resultof the scan and has to process the each() and >the cast before it performs the join against the expanded result.
Thereis no planner node for this activity but it does cost time - in this case more time than it >would take to simply
storethe native data types in separate rows. 
>
>You really should expand the hstore after the join (i.e., in the top-most
>select-list) but in this case since the join removed hardly any rows the gain from doing so would be minimal.  The
ideabeing you should not expand the hstore of any row >that fails the join condition since it will not end up in the
finalresult anyway. 
>
>Also, in this specific case, the call to each(...).key is pointless - you never use the data.
>
>If you did need to use both columns, and are using 9.3, you should re-write this to use LATERAL.
>
>In 9.2- you, possibly using a CTE, could do something like this:
>
>SELECT (each).* FROM (
>SELECT each(hs) FROM ( VALUES('k=>1'::hstore) ) h (hs)
>) src
>
>This is a single call to each(), in a subquery, which result is then expanded using (col).* notation in the parent
query. This avoids calling each twice - and note that >(each(...).*) does not work to avoid the double-call - you have
touse a subquery / a CTE one to ensure that it is not collapsed (offset 0 should work too but I find the >CTE one a
littlecleaner personally). 
>

I'm using Postgresql 9.3.4.
I changed the query as you suggested. The execution time are still similar to the original one.

dev=# explain analyze select (each).key as cha_type, sum((each).value::numeric) as visits from (select each(visits)
fromweekly_hstore a  join seg1 b on a.ref_id=b.ref_id )foo  group by cha_type  order by visits desc; 
                                                                          QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9455046.69..9455047.19 rows=200 width=32) (actual time=70928.881..71425.833 rows=3639539 loops=1)
   Sort Key: (sum(((foo.each).value)::numeric))
   Sort Method: quicksort  Memory: 394779kB
   ->  HashAggregate  (cost=9455037.05..9455039.05 rows=200 width=32) (actual time=60077.937..61425.469 rows=3639539
loops=1)
         ->  Subquery Scan on foo  (cost=12029.58..5737447.05 rows=371759000 width=32) (actual time=281.658..23912.400
rows=36962761loops=1) 
               ->  Hash Join  (cost=12029.58..2019857.05 rows=371759000 width=186) (actual time=281.655..18759.265
rows=36962761loops=1) 
                     Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
                     ->  Seq Scan on weekly_hstore a  (cost=0.00..133321.14 rows=1292314 width=232) (actual
time=11.141..857.959rows=1292314 loops=1) 
                     ->  Hash  (cost=7382.59..7382.59 rows=371759 width=47) (actual time=262.722..262.722 rows=371759
loops=1)
                           Buckets: 65536  Batches: 1  Memory Usage: 28951kB
                           ->  Seq Scan on seg1 b  (cost=0.00..7382.59 rows=371759 width=47) (actual
time=11.701..113.859rows=371759 loops=1) 
 Total runtime: 71626.871 ms
(12 rows)


-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of David G
Johnston
Sent: Tuesday, September 02, 2014 1:38 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query performance with hstore vs. non-hstore

Huang, Suya wrote
> See output of explain (analyze,timing off), the total runtime is close
> to the one enable timing.

Calling 43s "close to" 70s doesn't sound right...


> dev=# explain (analyze, timing off) select cha_type, sum(visits) from
> (select (each(visits)).key as cha_type,(each(visits)).value::numeric
> as visits from weekly_hstore a  join seg1 b on a.ref_id=b.ref_id )foo
> group by cha_type  order by sum(visits) desc;

What version of PostgreSQL are you using?

Two calls to each() and cast to numeric are not free.

Your sequential scan savings is nearly 9 seconds but you lose all of that, and more, when PostgreSQL evaluates the
resultof the scan and has to process the each() and the cast before it performs the join against the expanded result.
Thereis no planner node for this activity but it does cost time - in this case more time than it would take to simply
storethe native data types in separate rows. 

You really should expand the hstore after the join (i.e., in the top-most
select-list) but in this case since the join removed hardly any rows the gain from doing so would be minimal.  The idea
beingyou should not expand the hstore of any row that fails the join condition since it will not end up in the final
resultanyway. 

Also, in this specific case, the call to each(...).key is pointless - you never use the data.

If you did need to use both columns, and are using 9.3, you should re-write this to use LATERAL.

In 9.2- you, possibly using a CTE, could do something like this:

SELECT (each).* FROM (
SELECT each(hs) FROM ( VALUES('k=>1'::hstore) ) h (hs)
) src

This is a single call to each(), in a subquery, which result is then expanded using (col).* notation in the parent
query. This avoids calling each twice - and note that (each(...).*) does not work to avoid the double-call - you have
touse a subquery / a CTE one to ensure that it is not collapsed (offset 0 should work too but I find the CTE one a
littlecleaner personally). 

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/query-performance-with-hstore-vs-non-hstore-tp5817109p5817281.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


В списке pgsql-performance по дате отправления:

Предыдущее
От: gmb
Дата:
Сообщение: Re: Performance issue: index not used on GROUP BY...
Следующее
От: "Huang, Suya"
Дата:
Сообщение: how to change the provoke table in hash join