Discussion: Function's execute overhead reducing


Function's execute overhead reducing

From:
Игорь Выскорко
Date:
Hi all!

The best way to describe my question is to show the code first:

create table tst(
  id int primary key,
  j1 jsonb,
  j2 jsonb
);

insert into tst
select
  ser,
  jsonb_build_object(
    floor(random() * 10 + 1), floor(random() * 1000 + 1),
    floor(random() * 10 + 1), floor(random() * 1000 + 1),
    floor(random() * 10 + 1), floor(random() * 1000 + 1)
  ),
  jsonb_build_object(
    floor(random() * 10 + 1), floor(random() * 1000 + 1),
    floor(random() * 10 + 1), floor(random() * 1000 + 1),
    floor(random() * 10 + 1), floor(random() * 1000 + 1)
  )
from generate_series(1, 500000) ser;

analyze tst;


-- original func is a bit complicated. But it doesn't matter here
create or replace function tst_func(a jsonb, b jsonb) returns bigint
  stable
  language sql
as $$
  select
    sum(
      ((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
    )
  from
    jsonb_each(a) _a
$$;

-- get plain data
explain analyze select
  id,
  j1,
  j2
from
  tst;

-- use subquery (the same code as in function)
explain analyze select
  id,
  j1,
  j2,
  (
    select
      sum(
        ((_a.value::text)::int - (coalesce(j2->>_a.key, '0'))::int)::int
      )
    from
      jsonb_each(j1) _a
  )
from
  tst;

-- use function
explain analyze select
  id,
  j1,
  j2,
  tst_func(j1, j2)
from
  tst;

select version();


After running it I got the following results:

CREATE TABLE
INSERT 0 500000
ANALYZE
CREATE FUNCTION
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Seq Scan on tst  (cost=0.00..13558.00 rows=500000 width=108) (actual time=0.009..40.348 rows=500000 loops=1)
 Planning time: 0.189 ms
 Execution time: 56.356 ms
(3 rows)

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tst  (cost=0.00..1644808.00 rows=500000 width=116) (actual time=0.021..1966.190 rows=500000 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=3.25..3.26 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=500000)
            ->  Function Scan on jsonb_each _a  (cost=0.00..1.00 rows=100 width=64) (actual time=0.002..0.002 rows=3 loops=500000)
 Planning time: 0.072 ms
 Execution time: 1982.192 ms
(6 rows)

                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on tst  (cost=0.00..138558.00 rows=500000 width=116) (actual time=0.072..5308.897 rows=500000 loops=1)
 Planning time: 0.067 ms
 Execution time: 5328.196 ms
(3 rows)

                                                                     version
-------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.13 on x86_64-pc-linux-gnu (Ubuntu 9.6.13-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
(1 row)


As you can see, the subquery version needs 2 seconds while the function version needs more than 5, and that's sad to see. I love functions, and this function (its production version) is widely used across our project.

I tried to alter the function as IMMUTABLE and even PARALLEL SAFE, but it doesn't help.
I understand that function execution is not free, but why so much?
So, my question is: is there any way to make this function work faster? Something similar to a prepared statement, or maybe rules (CREATE RULE), or whatever else is available.
 

PS: The current situation could be solved by denormalization, precalculating the function and storing the results along with the data, but it's not the way I would like to go, because it leads to more issues to solve (invalidation, for instance).
 
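Since the subquery form is fast, one way to keep the calculation reusable without paying the per-row function-call overhead is to express it as a LATERAL subquery (available since PostgreSQL 9.3), which the planner can see and plan together with the outer query. A sketch equivalent to the subquery version above; the alias names `f` and `diff` are illustrative, not from the original:

```sql
-- Same calculation as tst_func, but written as a LATERAL subquery
-- so the planner plans it as part of the scan of tst.
select
  tst.id,
  tst.j1,
  tst.j2,
  f.diff
from
  tst
  cross join lateral (
    select
      sum(((_a.value::text)::int
           - (coalesce(tst.j2 ->> _a.key, '0'))::int)::int) as diff
    from
      jsonb_each(tst.j1) _a
  ) f;
```

Wrapping this statement in a view would give roughly the reuse a function provides, while keeping the subquery-level performance.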





Re: Function's execute overhead reducing

From:
Adrian Klaver
Date:
On 11/16/19 11:23 PM, Игорь Выскорко wrote:
> Hi all!
> 
> The best way to describe my question is to show the code as first:
> 
> create table tst(
>    id int primary key,
>    j1 jsonb,
>    j2 jsonb
> );
> 
> insert into tst
> select
>    ser,
>    jsonb_build_object(
>      floor(random() * 10 + 1), floor(random() * 1000 + 1),
>      floor(random() * 10 + 1), floor(random() * 1000 + 1),
>      floor(random() * 10 + 1), floor(random() * 1000 + 1)
>    ),
>    jsonb_build_object(
>      floor(random() * 10 + 1), floor(random() * 1000 + 1),
>      floor(random() * 10 + 1), floor(random() * 1000 + 1),
>      floor(random() * 10 + 1), floor(random() * 1000 + 1)
>    )
> from generate_series(1, 500000) ser;
> 
> analyze tst;
> 
> 
> -- original func is a bit complicated. But it doesn't matter here
> create or replace function tst_func(a jsonb, b jsonb) returns bigint
>    stable
>    language sql
> as $$
>    select
>      sum(
>        ((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
>      )
>    from
>      jsonb_each(a) _a
> $$;
> 
> -- get plain data
> explain analyze select
>    id,
>    j1,
>    j2
> from
>    tst;
> 
> -- use subquery (the same code as in function)
> explain analyze select
>    id,
>    j1,
>    j2,
>    (
>      select
>        sum(
>          ((_a.value::text)::int - (coalesce(j2->>_a.key, '0'))::int)::int
>        )
>      from
>        jsonb_each(j1) _a
>    )
> from
>    tst;
> 
> -- use function
> explain analyze select
>    id,
>    j1,
>    j2,
>    tst_func(j1, j2)
> from
>    tst;
> 
> select version();
> 
> 
> And after run it I got following results:
> 
> CREATE TABLE
> INSERT 0 500000
> ANALYZE
> CREATE FUNCTION
>                                                    QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>   Seq Scan on tst  (cost=0.00..13558.00 rows=500000 width=108) (actual time=0.009..40.348 rows=500000 loops=1)
>   Planning time: 0.189 ms
>   Execution time: 56.356 ms
> (3 rows)
> 
>                                                              QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
>   Seq Scan on tst  (cost=0.00..1644808.00 rows=500000 width=116) (actual time=0.021..1966.190 rows=500000 loops=1)
>     SubPlan 1
>       ->  Aggregate  (cost=3.25..3.26 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=500000)
>             ->  Function Scan on jsonb_each _a  (cost=0.00..1.00 rows=100 width=64) (actual time=0.002..0.002 rows=3 loops=500000)
>   Planning time: 0.072 ms
>   Execution time: 1982.192 ms
> (6 rows)
> 
>                                                     QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>   Seq Scan on tst  (cost=0.00..138558.00 rows=500000 width=116) (actual time=0.072..5308.897 rows=500000 loops=1)
>   Planning time: 0.067 ms
>   Execution time: 5328.196 ms
> (3 rows)
> 
>                                                                       version
> -------------------------------------------------------------------------------------------------------------------------------------------------
>   PostgreSQL 9.6.13 on x86_64-pc-linux-gnu (Ubuntu 9.6.13-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
> (1 row)
> 
> 
> As you see, subquery version needs 2 seconds when func version needs more than 5. And it's sad to see. I love functions and actually this func (its production version) is widely used across our project.
> I tried to alter function as immutable and even parallel safe but it doesn't help.
> I understand that functions execution is not free, but why so much?
> So, my question is: is there any way to make this function works faster? Something similar to prepared statement or maybe rules (create rule) or whatever else is available.

My suspicion is that the query in the function is hidden from the 
planner and so it ends up running two separate SELECT queries without 
reference to each other. A quick test and possible solution:

My results for the above on an old machine:

Straight query:
                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on tst  (cost=0.00..1644807.00 rows=500000 width=116) (actual time=0.033..2808.596 rows=500000 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=3.25..3.26 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=500000)
           ->  Function Scan on jsonb_each _a  (cost=0.00..1.00 rows=100 width=64) (actual time=0.003..0.003 rows=3 loops=500000)
 Planning Time: 16.162 ms
 Execution Time: 2846.815 ms


Function in query:
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on tst  (cost=0.00..138557.00 rows=500000 width=116) (actual time=0.119..7048.285 rows=500000 loops=1)
 Planning Time: 0.105 ms
 Execution Time: 7098.057 ms

I changed the function to:

CREATE OR REPLACE FUNCTION public.tst_func(i integer, a jsonb, b jsonb)
  RETURNS TABLE(id integer, val bigint)
  LANGUAGE sql
  STABLE
AS $function$
   select
     i,
     sum(
       ((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
     )
   from
     jsonb_each(a) _a
$function$

Using 'table' function:

test=# explain analyze select
   tst.id,
   j1,
   j2
from
   tst
join
   tst_func(id, j1, j2) as f
on
  tst.id = f.id;
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.00..531057.00 rows=1 width=108) (actual time=0.042..2002.258 rows=500000 loops=1)
   ->  Seq Scan on tst  (cost=0.00..13557.00 rows=500000 width=108) (actual time=0.014..70.936 rows=500000 loops=1)
   ->  Subquery Scan on f  (cost=1.00..1.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=500000)
         Filter: (tst.id = f.id)
         ->  Aggregate  (cost=1.00..1.01 rows=1 width=12) (actual time=0.003..0.003 rows=1 loops=500000)
               ->  Function Scan on jsonb_each _a  (cost=0.00..1.00 rows=100 width=0) (actual time=0.003..0.003 rows=3 loops=500000)
 Planning Time: 0.297 ms
 Execution Time: 2037.601 ms

> 
> PS current situation maybe solved by denormalization with precalculations of function and storing results along with data, but it's not the way i would like to use, because it leads to more issues to solve (invalidation, for instance)
> 
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com
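The plan above shows a Subquery Scan on f with the aggregate visible underneath it, which suggests the planner inlined the body of the set-returning SQL function into the query, whereas the original scalar form (whose body contains an aggregate) cannot be inlined and is executed once per row through the function-call machinery. Since the function's arguments already reference columns of tst, the self-join on id appears redundant; a possible further simplification, sketched but not tested against this data:

```sql
-- tst_func is effectively LATERAL here because its arguments
-- reference columns of tst; no extra join condition on id is needed.
select
  tst.id,
  tst.j1,
  tst.j2,
  f.val
from
  tst
  cross join lateral tst_func(tst.id, tst.j1, tst.j2) as f;
```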



Re: Function's execute overhead reducing

From:
Игорь Выскорко
Date:
> My suspicion is that the query in the function is hidden from the
> planner and so it ends up running two separate SELECT queries without
> reference to each other. A quick test and possible solution:
>
> My results for the above on old machine.:
>
> Straight query:
>                                                              QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
>   Seq Scan on tst (cost=0.00..1644807.00 rows=500000 width=116) (actual time=0.033..2808.596 rows=500000 loops=1)
>     SubPlan 1
>       -> Aggregate (cost=3.25..3.26 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=500000)
>             -> Function Scan on jsonb_each _a (cost=0.00..1.00 rows=100 width=64) (actual time=0.003..0.003 rows=3 loops=500000)
>   Planning Time: 16.162 ms
>   Execution Time: 2846.815 ms
>
> Function in query
>                                                     QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>   Seq Scan on tst (cost=0.00..138557.00 rows=500000 width=116) (actual time=0.119..7048.285 rows=500000 loops=1)
>   Planning Time: 0.105 ms
>   Execution Time: 7098.057 ms
>
> I changed the function to:
>
> CREATE OR REPLACE FUNCTION public.tst_func(i integer, a jsonb, b jsonb)
>   RETURNS TABLE(id integer, val bigint)
>   LANGUAGE sql
>   STABLE
> AS $function$
>    select
>      i,
>      sum(
>        ((_a.value::text)::int - (coalesce(b->>_a.key, '0'))::int)::int
>      )
>    from
>      jsonb_each(a) _a
> $function$
>
> Using 'table' function:
>
> test=# explain analyze select
>    tst.id,
>    j1,
>    j2
> from
>    tst
> join
>    tst_func(id, j1, j2) as f
> on
>   tst.id = f.id;
>                                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------
>   Nested Loop (cost=1.00..531057.00 rows=1 width=108) (actual time=0.042..2002.258 rows=500000 loops=1)
>     -> Seq Scan on tst (cost=0.00..13557.00 rows=500000 width=108) (actual time=0.014..70.936 rows=500000 loops=1)
>     -> Subquery Scan on f (cost=1.00..1.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=500000)
>           Filter: (tst.id = f.id)
>           -> Aggregate (cost=1.00..1.01 rows=1 width=12) (actual time=0.003..0.003 rows=1 loops=500000)
>                 -> Function Scan on jsonb_each _a (cost=0.00..1.00 rows=100 width=0) (actual time=0.003..0.003 rows=3 loops=500000)
>   Planning Time: 0.297 ms
>   Execution Time: 2037.601 ms
>
>>  PS current situation maybe solved by denormalization with precalculations of function and storing results along with data, but it's not the way i would like to use, because it leads to more issues to solve (invalidation, for instance)
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

Thanks a lot! It works even better than the subquery.
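For the precalculation route mentioned in the original PS, the invalidation concern can be sidestepped by letting the database maintain the stored value itself. A sketch using a trigger; the column `diff` and the names `tst_diff_sync` / `tst_diff_trg` are illustrative, not from the thread (on PostgreSQL 12+ a stored generated column over an IMMUTABLE variant of the function would be an alternative):

```sql
alter table tst add column diff bigint;

create or replace function tst_diff_sync() returns trigger
  language plpgsql
as $$
begin
  -- Recompute the stored value whenever j1 or j2 changes,
  -- so the cached result can never go stale.
  new.diff := tst_func(new.j1, new.j2);
  return new;
end;
$$;

create trigger tst_diff_trg
  before insert or update of j1, j2 on tst
  for each row execute procedure tst_diff_sync();
```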