Re: Calculating statistic via function rather than with query is slowing my query

Поиск
Список
Период
Сортировка
От Anish Kejariwal
Тема Re: Calculating statistic via function rather than with query is slowing my query
Дата
Msg-id CAOpcnr-1=YGsp=OAV58cdEfNw4EEuuzEqo53+9C7kURZLmX4Ew@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Calculating statistic via function rather than with query is slowing my query  (Pavel Stehule <pavel.stehule@gmail.com>)
Ответы Re: Calculating statistic via function rather than with query is slowing my query  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Calculating statistic via function rather than with query is slowing my query  (Craig Ringer <ringerc@ringerc.id.au>)
Список pgsql-performance
Thanks Pavel! that definitely solved it.  

Unfortunately, the function I gave you was a simple/short version of what the actual function is going to be.  The actual function is going to get parameters passed to it, and based on the parameters will go through some if...else conditions, and maybe even call another function.  Based on that, I was definitely hoping to use plpgsql, and the overhead is unfortunate.  

Is there any way to get around this overhead?  Will I still have the same overhead if I use plperl, plpython, pljava, or write the function in C?

Anish


On Wed, Aug 17, 2011 at 11:27 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello

2011/8/17 Anish Kejariwal <anishkej@gmail.com>:
> Hi everyone,
> I'm using postgres 9.0.3, and here's the OS I'm running this on:
> Linux 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 14:02:29 EDT 2011 x86_64
> x86_64 x86_64 GNU/Linux
> I have a fairly straight forward query.  I'm doing a group by on an ID, and
> then calculating some a statistic on the resulting data.  The problem I'm
> running into is that when I'm calculating the statistics via a function,
> it's twice as slow as when I'm calculating the statistics directly in my
> query.  I want to be able to use a function, since I'll be using this
> particular calculation in many places.
> Any idea of what's going on?  Below, I've included my function, and both
> queries (I removed the type_ids, and just wrote …ids…
> Here's my function (I also tried stable):
> CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
> integer)
> RETURNS double precision AS $$
> BEGIN
>         return a/b/c* 1000000000::double precision;
> END;
> $$ LANGUAGE plpgsql immutable;
>

this is overhead of plpgsql call. For this simple functions use a SQL
functions instead

CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
 integer)
 RETURNS double precision AS $$
>         SELECT $1/$2/$3* 1000000000::double precision;
> $$ LANGUAGE sql;

Regards

Pavel Stehule

> The query that takes 7.6 seconds, when I calculate the statistic from within
> the query:
> explain analyze
> select
>    agg.primary_id,
>    avg(agg.a / agg.b / agg.c * 1000000000::double precision) foo,
>    stddev(agg.a / agg.b / agg.c * 1000000000::double precision) bar
> from mytable agg
> where agg.type_id in (....ids....)
> group by agg.primary_id;
> The execution plan:
>  HashAggregate  (cost=350380.58..350776.10 rows=9888 width=20) (actual
> time=7300.414..7331.659 rows=20993 loops=1)
>    ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63
> rows=1716127 width=20) (actual time=200.064..2861.600 rows=2309230 loops=1)
>          Recheck Cond: (type_id = ANY ('{....ids....}'::integer[]))
>          ->  Bitmap Index Scan on mytable_type_id_idx  (cost=0.00..28238.87
> rows=1716127 width=0) (actual time=192.725..192.725 rows=2309230 loops=1)
>                Index Cond: (type_id = ANY ('{....ids....}'::integer[]))
>  Total runtime: 7358.337 ms
> (6 rows)
>
>
>
> The same query, but now I'm calling the function.  When I call the function
> it's taking 15.5 seconds.
> explain analyze select
>    agg.primary_id,
>    avg(calc_test(agg.a,agg.b,agg.c)) foo,
>    stddev(calc_test(agg.a,agg.b,agg.c)) bar
> from mytable agg
> where agg.type_id in (....ids....)
> group by agg.primary_id;
> and, here's the execution plan:
>  HashAggregate  (cost=350380.58..355472.90 rows=9888 width=20) (actual
> time=13660.838..13686.618 rows=20993 loops=1)
>    ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63
> rows=1716127 width=20) (actual time=170.385..2881.122 rows=2309230 loops=1)
>          Recheck Cond: (type_id = ANY ('{....ids....}'::integer[]))
>          ->  Bitmap Index Scan on mytable_type_id_idx  (cost=0.00..28238.87
> rows=1716127 width=0) (actual time=162.834..162.834 rows=2309230 loops=1)
>                Index Cond: (type_id = ANY ('{....ids....}'::integer[]))
>  Total runtime: 13707.560 ms
>
> Thanks!
> Anish

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Tomas Vondra"
Дата:
Сообщение: Re: Tuning Tips for a new Server
Следующее
От: Gary Doades
Дата:
Сообщение: Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++