Re: Calculating statistic via function rather than with query is slowing my query

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: Calculating statistic via function rather than with query is slowing my query
Дата
Msg-id CAFj8pRBsU1WVHtcttDhCrS_tqRwnFEyaxG+=yyv=knXA9yNPcg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Calculating statistic via function rather than with query is slowing my query  (Anish Kejariwal <anishkej@gmail.com>)
Список pgsql-performance
2011/8/17 Anish Kejariwal <anishkej@gmail.com>:
> Thanks Pavel! that definitely solved it.
> Unfortunately, the function I gave you was a simple/short version of what
> the actual function is going to be.  The actual function is going to get
> parameters passed to it, and based on the parameters will go through some
> if...else conditions, and maybe even call another function.  Based on that,
> I was definitely hoping to use plpgsql, and the overhead is unfortunate.
> Is there any way to get around this overhead?  Will I still have the same
> overhead if I use plperl, plpython, pljava, or write the function in C?

only SQL and C has zero overhead - SQL because uses inlining and C is
just readable assambler.

I am thinking,  overhead of PL/pgSQL is minimal from languages from your list.

Regards

Pavel

>
> Anish
>
> On Wed, Aug 17, 2011 at 11:27 AM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>>
>> Hello
>>
>> 2011/8/17 Anish Kejariwal <anishkej@gmail.com>:
>> > Hi everyone,
>> > I'm using postgres 9.0.3, and here's the OS I'm running this on:
>> > Linux 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 14:02:29 EDT 2011 x86_64
>> > x86_64 x86_64 GNU/Linux
>> > I have a fairly straight forward query.  I'm doing a group by on an ID,
>> > and
>> > then calculating some a statistic on the resulting data.  The problem
>> > I'm
>> > running into is that when I'm calculating the statistics via a function,
>> > it's twice as slow as when I'm calculating the statistics directly in my
>> > query.  I want to be able to use a function, since I'll be using this
>> > particular calculation in many places.
>> > Any idea of what's going on?  Below, I've included my function, and both
>> > queries (I removed the type_ids, and just wrote …ids…
>> > Here's my function (I also tried stable):
>> > CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
>> > integer)
>> > RETURNS double precision AS $$
>> > BEGIN
>> >         return a/b/c* 1000000000::double precision;
>> > END;
>> > $$ LANGUAGE plpgsql immutable;
>> >
>>
>> this is overhead of plpgsql call. For this simple functions use a SQL
>> functions instead
>>
>> CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
>>  integer)
>>  RETURNS double precision AS $$
>> >         SELECT $1/$2/$3* 1000000000::double precision;
>> > $$ LANGUAGE sql;
>>
>> Regards
>>
>> Pavel Stehule
>>
>> > The query that takes 7.6 seconds, when I calculate the statistic from
>> > within
>> > the query:
>> > explain analyze
>> > select
>> >    agg.primary_id,
>> >    avg(agg.a / agg.b / agg.c * 1000000000::double precision) foo,
>> >    stddev(agg.a / agg.b / agg.c * 1000000000::double precision) bar
>> > from mytable agg
>> > where agg.type_id in (....ids....)
>> > group by agg.primary_id;
>> > The execution plan:
>> >  HashAggregate  (cost=350380.58..350776.10 rows=9888 width=20) (actual
>> > time=7300.414..7331.659 rows=20993 loops=1)
>> >    ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63
>> > rows=1716127 width=20) (actual time=200.064..2861.600 rows=2309230
>> > loops=1)
>> >          Recheck Cond: (type_id = ANY ('{....ids....}'::integer[]))
>> >          ->  Bitmap Index Scan on mytable_type_id_idx
>> >  (cost=0.00..28238.87
>> > rows=1716127 width=0) (actual time=192.725..192.725 rows=2309230
>> > loops=1)
>> >                Index Cond: (type_id = ANY ('{....ids....}'::integer[]))
>> >  Total runtime: 7358.337 ms
>> > (6 rows)
>> >
>> >
>> >
>> > The same query, but now I'm calling the function.  When I call the
>> > function
>> > it's taking 15.5 seconds.
>> > explain analyze select
>> >    agg.primary_id,
>> >    avg(calc_test(agg.a,agg.b,agg.c)) foo,
>> >    stddev(calc_test(agg.a,agg.b,agg.c)) bar
>> > from mytable agg
>> > where agg.type_id in (....ids....)
>> > group by agg.primary_id;
>> > and, here's the execution plan:
>> >  HashAggregate  (cost=350380.58..355472.90 rows=9888 width=20) (actual
>> > time=13660.838..13686.618 rows=20993 loops=1)
>> >    ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63
>> > rows=1716127 width=20) (actual time=170.385..2881.122 rows=2309230
>> > loops=1)
>> >          Recheck Cond: (type_id = ANY ('{....ids....}'::integer[]))
>> >          ->  Bitmap Index Scan on mytable_type_id_idx
>> >  (cost=0.00..28238.87
>> > rows=1716127 width=0) (actual time=162.834..162.834 rows=2309230
>> > loops=1)
>> >                Index Cond: (type_id = ANY ('{....ids....}'::integer[]))
>> >  Total runtime: 13707.560 ms
>> >
>> > Thanks!
>> > Anish
>
>

В списке pgsql-performance по дате отправления:

Предыдущее
От: Gary Doades
Дата:
Сообщение: Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++
Следующее
От: Ogden
Дата:
Сообщение: Re: Tuning Tips for a new Server