Обсуждение: Inlining of couple of functions in pl_exec.c improves performance

Поиск
Список
Период
Сортировка

Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
There are a couple of function call overheads I observed in pl/pgsql
code : exec_stmt() and exec_cast_value(). Removing these overheads
resulted in some performance gains.

exec_stmt() :

plpgsql_exec_function() and other toplevel block executors currently
call exec_stmt(). But actually they don't need to do everything that
exec_stmt() does. So they can call a new function instead of
exec_stmt(), and all the exec_stmt() code can be moved to
exec_stmts(). The things that exec_stmt() do, but are not necessary
for a top level block stmt, are :

1. save_estmt = estate->err_stmt; estate->err_stmt = stmt;
For top level blocks, saving the estate->err_stmt is not necessary,
because there is no statement after this block statement. Anyways,
plpgsql_exec_function() assigns estate.err_stmt just before calling
exec_stmt so there is really no point in exec_stmt() setting it again.

2. CHECK_FOR_INTERRUPTS()
This is not necessary for toplevel block callers.

3. exec_stmt_block() can be directly called rather than exec_stmt()
because func->action is a block statement. So the switch statement is
not necessary.

But this one might be necessary for toplevel block statement:
  if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
     ((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt);

There was already a repetitive code in plpgsql_exec_function() and
other functions around the exec_stmt() call. So in a separate patch
0001*.patch, I moved that code into a common function
exec_toplevel_block(). In the main patch
0002-Get-rid-of-exec_stmt-function-call.patch, I additionally called
plpgsql_plugin_ptr->stmt_beg() inside exec_toplevel_block(). And moved
exec_stmt() code into exec_stmts().



exec_cast_value() :

This function does not do the casting if not required. So moved the
code that actually does the cast into a separate function, so as to
reduce the exec_cast_value() code and make it inline. Attached is the
0003-Inline-exec_cast_value.patch


Testing
----------

I used two available VMs (one x86_64 and the other arm64), and the
benefit showed up on both of these machines. Attached patches 0001,
0002, 0003 are to be applied in that order. 0001 is just a preparatory
patch.

First I tried with a simple for loop with a single assignment
(attached forcounter.sql)

By inlining of the two functions, found noticeable reduction in
execution time as shown (figures are in milliseconds, averaged over
multiple runs; taken from 'explain analyze' execution times) :
ARM VM :
   HEAD : 100 ; Patched : 88 => 13.6% improvement
x86 VM :
   HEAD :  71 ; Patched : 66 => 7.63% improvement.

Then I included many assignment statements as shown in attachment
assignmany.sql. This showed further benefit :
ARM VM :
   HEAD : 1820 ; Patched : 1549  => 17.5% improvement
x86 VM :
   HEAD : 1020 ; Patched :  869  => 17.4% improvement

Inlining just exec_stmt() showed the improvement mainly on the arm64
VM (7.4%). For x86, it was 2.7%
But inlining exec_stmt() and exec_cast_value() together showed
benefits on both machines, as can be seen above.

-- 
Thanks,
-Amit Khandekar
Huawei Technologies

Вложения

Re: Inlining of couple of functions in pl_exec.c improves performance

От
Pavel Stehule
Дата:
Hi

so 23. 5. 2020 v 19:03 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
There are a couple of function call overheads I observed in pl/pgsql
code : exec_stmt() and exec_cast_value(). Removing these overheads
resulted in some performance gains.

exec_stmt() :

plpgsql_exec_function() and other toplevel block executors currently
call exec_stmt(). But actually they don't need to do everything that
exec_stmt() does. So they can call a new function instead of
exec_stmt(), and all the exec_stmt() code can be moved to
exec_stmts(). The things that exec_stmt() do, but are not necessary
for a top level block stmt, are :

1. save_estmt = estate->err_stmt; estate->err_stmt = stmt;
For top level blocks, saving the estate->err_stmt is not necessary,
because there is no statement after this block statement. Anyways,
plpgsql_exec_function() assigns estate.err_stmt just before calling
exec_stmt so there is really no point in exec_stmt() setting it again.

2. CHECK_FOR_INTERRUPTS()
This is not necessary for toplevel block callers.

3. exec_stmt_block() can be directly called rather than exec_stmt()
because func->action is a block statement. So the switch statement is
not necessary.

But this one might be necessary for toplevel block statement:
  if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
     ((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt);

There was already a repetitive code in plpgsql_exec_function() and
other functions around the exec_stmt() call. So in a separate patch
0001*.patch, I moved that code into a common function
exec_toplevel_block(). In the main patch
0002-Get-rid-of-exec_stmt-function-call.patch, I additionally called
plpgsql_plugin_ptr->stmt_beg() inside exec_toplevel_block(). And moved
exec_stmt() code into exec_stmts().



exec_cast_value() :

This function does not do the casting if not required. So moved the
code that actually does the cast into a separate function, so as to
reduce the exec_cast_value() code and make it inline. Attached is the
0003-Inline-exec_cast_value.patch


Testing
----------

I used two available VMs (one x86_64 and the other arm64), and the
benefit showed up on both of these machines. Attached patches 0001,
0002, 0003 are to be applied in that order. 0001 is just a preparatory
patch.

First I tried with a simple for loop with a single assignment
(attached forcounter.sql)

By inlining of the two functions, found noticeable reduction in
execution time as shown (figures are in milliseconds, averaged over
multiple runs; taken from 'explain analyze' execution times) :
ARM VM :
   HEAD : 100 ; Patched : 88 => 13.6% improvement
x86 VM :
   HEAD :  71 ; Patched : 66 => 7.63% improvement.

Then I included many assignment statements as shown in attachment
assignmany.sql. This showed further benefit :
ARM VM :
   HEAD : 1820 ; Patched : 1549  => 17.5% improvement
x86 VM :
   HEAD : 1020 ; Patched :  869  => 17.4% improvement

Inlining just exec_stmt() showed the improvement mainly on the arm64
VM (7.4%). For x86, it was 2.7%
But inlining exec_stmt() and exec_cast_value() together showed
benefits on both machines, as can be seen above.

 
   FOR counter IN 1..1800000 LOOP
      id = 0; id = 0; id1 = 0;
      id2 = 0; id3 = 0; id1 = 0; id2 = 0;
      id3 = 0; id = 0; id = 0; id1 = 0;
      id2 = 0; id3 = 0; id1 = 0; id2 = 0;
      id3 = 0;
   END LOOP;

This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit obscure.

Last strange performance plpgsql benchmark did calculation of pi value. It does something real

Regards

Pavel


--
Thanks,
-Amit Khandekar
Huawei Technologies

Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>    FOR counter IN 1..1800000 LOOP
>       id = 0; id = 0; id1 = 0;
>       id2 = 0; id3 = 0; id1 = 0; id2 = 0;
>       id3 = 0; id = 0; id = 0; id1 = 0;
>       id2 = 0; id3 = 0; id1 = 0; id2 = 0;
>       id3 = 0;
>    END LOOP;
>
> This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit
obscure.
>
> Last strange performance plpgsql benchmark did calculation of pi value. It does something real

Yeah, basically I wanted to have many statements, and that too with
many assignments where casts are not required. Let me check if I can
come up with a real-enough testcase. Thanks.



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Tue, 26 May 2020 at 09:06, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >
> >    FOR counter IN 1..1800000 LOOP
> >       id = 0; id = 0; id1 = 0;
> >       id2 = 0; id3 = 0; id1 = 0; id2 = 0;
> >       id3 = 0; id = 0; id = 0; id1 = 0;
> >       id2 = 0; id3 = 0; id1 = 0; id2 = 0;
> >       id3 = 0;
> >    END LOOP;
> >
> > This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit
obscure.
> >
> > Last strange performance plpgsql benchmark did calculation of pi value. It does something real
>
> Yeah, basically I wanted to have many statements, and that too with
> many assignments where casts are not required. Let me check if I can
> come up with a real-enough testcase. Thanks.

create table tab (id int[]);
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));


-- Return how much two consecutive array elements are apart from each
other, on average; i.e. how much the numbers are spaced out.
-- Input is an ordered array of integers.
CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$
DECLARE
  diff int = 0;
  num int;
  prevnum int = 1;
BEGIN
  FOREACH num IN ARRAY $1
  LOOP
    diff = diff + num - prevnum;
    prevnum = num;
  END LOOP;
  RETURN diff/array_length($1, 1);
END;
$$ LANGUAGE plpgsql;

explain analyze select avg_space(id) from tab;
Like earlier figures, these are execution times in milliseconds, taken
from explain-analyze.
ARM VM:
   HEAD                             : 49.8
   patch 0001+0002           : 47.8 => 4.2%
   patch 0001+0002+0003 : 42.9 => 16.1%
x86 VM:
   HEAD                             : 32.8
   patch 0001+0002           : 32.7 => 0%
   patch 0001+0002+0003 : 28.0 => 17.1%



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Pavel Stehule
Дата:
Hi

st 27. 5. 2020 v 13:31 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Tue, 26 May 2020 at 09:06, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >
> >    FOR counter IN 1..1800000 LOOP
> >       id = 0; id = 0; id1 = 0;
> >       id2 = 0; id3 = 0; id1 = 0; id2 = 0;
> >       id3 = 0; id = 0; id = 0; id1 = 0;
> >       id2 = 0; id3 = 0; id1 = 0; id2 = 0;
> >       id3 = 0;
> >    END LOOP;
> >
> > This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit obscure.
> >
> > Last strange performance plpgsql benchmark did calculation of pi value. It does something real
>
> Yeah, basically I wanted to have many statements, and that too with
> many assignments where casts are not required. Let me check if I can
> come up with a real-enough testcase. Thanks.

create table tab (id int[]);
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));


-- Return how much two consecutive array elements are apart from each
other, on average; i.e. how much the numbers are spaced out.
-- Input is an ordered array of integers.
CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$
DECLARE
  diff int = 0;
  num int;
  prevnum int = 1;
BEGIN
  FOREACH num IN ARRAY $1
  LOOP
    diff = diff + num - prevnum;
    prevnum = num;
  END LOOP;
  RETURN diff/array_length($1, 1);
END;
$$ LANGUAGE plpgsql;

explain analyze select avg_space(id) from tab;
Like earlier figures, these are execution times in milliseconds, taken
from explain-analyze.
ARM VM:
   HEAD                             : 49.8
   patch 0001+0002           : 47.8 => 4.2%
   patch 0001+0002+0003 : 42.9 => 16.1%
x86 VM:
   HEAD                             : 32.8
   patch 0001+0002           : 32.7 => 0%
   patch 0001+0002+0003 : 28.0 => 17.1%

I tested these patches on my notebook - Lenovo T520 (x64) - on pi calculation

CREATE OR REPLACE FUNCTION pi_est_1(n int)
RETURNS numeric AS $$
DECLARE
  accum double precision DEFAULT 1.0;
  c1 double precision DEFAULT 2.0;
  c2 double precision DEFAULT 1.0;
BEGIN
  FOR i IN 1..n
  LOOP
    accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0)));
    c1 := c1 + 2.0;
    c2 := c2 + 2.0;
  END LOOP;
  RETURN accum * 2.0;
END;
$$ LANGUAGE plpgsql;

and I see about 3-5% of speedup

extra simply test shows

do $$ declare i int default 0; begin while i < 100000000 loop i := i + 1; end loop; raise notice 'i=%', i;end $$;

2% speedup

I don't see 17% anywhere, but 3-5% is not bad.

patch 0001 has sense and can help with code structure
patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.
patch 0003 has sense too

tested on Fedora 32 with gcc 10.1.1 and -O2 option

Regards

Pavel





Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Thu, 28 May 2020 at 14:39, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I don't see 17% anywhere, but 3-5% is not bad.
Did you see 3-5% only for the pi function, or did you see the same
improvement also for the functions that I wrote ? I was getting a
consistent result of 14-18 % on both of the VMs. Also, is your test
machine running on Windows ? All the machines I tested were on Linux
kernel (Ubuntu)

Below are my results for your pi_est_1() function. For this function,
I am consistently getting 5-9 % improvement. I tested on 3 machines :

gcc : 8.4.0. -O2 option
OS : Ubuntu Bionic

explain analyze select pi_est_1(10000000)

1. x86_64 laptop VM (Intel Core i7-8665U)
HEAD :    2666 2617 2600 2630
Patched : 2502 2409 2460 2444


2. x86_64 VM (Xeon Gold 6151)
HEAD :    1664 1662 1721 1660
Patched : 1541 1548 1537 1526

3. ARM64 VM (Kunpeng)
HEAD :    2873 2864 2860 2861
Patched : 2568 2513 2501 2538


>
> patch 0001 has sense and can help with code structure
> patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.

Here, I moved the exec_stmt code into exec_stmts() function because
exec_stmts() was the only caller, and that function is not that big. I
am assuming you were referring to this point when you said it is a bit
against simplicity. But I didn't get what you implied by "but for
PLpgSQL with blocks structure it is correct"

-- 
Thanks,
-Amit Khandekar
Huawei Technologies



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Pavel Stehule
Дата:


so 30. 5. 2020 v 7:28 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Thu, 28 May 2020 at 14:39, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I don't see 17% anywhere, but 3-5% is not bad.
Did you see 3-5% only for the pi function, or did you see the same
improvement also for the functions that I wrote ? I was getting a
consistent result of 14-18 % on both of the VMs. Also, is your test
machine running on Windows ? All the machines I tested were on Linux
kernel (Ubuntu)

It was similar with your example too.

I tested it on Linux Fedora Core 32 - laptop T520 - I7.

I think so the effect of these patches strongly depends on CPU and compiler - but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.



Below are my results for your pi_est_1() function. For this function,
I am consistently getting 5-9 % improvement. I tested on 3 machines :

gcc : 8.4.0. -O2 option
OS : Ubuntu Bionic

explain analyze select pi_est_1(10000000)

1. x86_64 laptop VM (Intel Core i7-8665U)
HEAD :    2666 2617 2600 2630
Patched : 2502 2409 2460 2444


2. x86_64 VM (Xeon Gold 6151)
HEAD :    1664 1662 1721 1660
Patched : 1541 1548 1537 1526

3. ARM64 VM (Kunpeng)
HEAD :    2873 2864 2860 2861
Patched : 2568 2513 2501 2538


>
> patch 0001 has sense and can help with code structure
> patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.

Here, I moved the exec_stmt code into exec_stmts() function because
exec_stmts() was the only caller, and that function is not that big. I
am assuming you were referring to this point when you said it is a bit
against simplicity. But I didn't get what you implied by "but for
PLpgSQL with blocks structure it is correct"

Nested statement in PLpgSQL is always a list of statements. It is not single statement ever. So is not too strange don't have a function execute_stmt.

Pavel


--
Thanks,
-Amit Khandekar
Huawei Technologies

Re: Inlining of couple of functions in pl_exec.c improves performance

От
Michael Paquier
Дата:
On Sat, May 23, 2020 at 10:33:43PM +0530, Amit Khandekar wrote:
> By inlining of the two functions, found noticeable reduction in
> execution time as shown (figures are in milliseconds, averaged over
> multiple runs; taken from 'explain analyze' execution times) :
> ARM VM :
>    HEAD : 100 ; Patched : 88 => 13.6% improvement
> x86 VM :
>    HEAD :  71 ; Patched : 66 => 7.63% improvement.
>
> Then I included many assignment statements as shown in attachment
> assignmany.sql. This showed further benefit :
> ARM VM :
>    HEAD : 1820 ; Patched : 1549  => 17.5% improvement
> x86 VM :
>    HEAD : 1020 ; Patched :  869  => 17.4% improvement
>
> Inlining just exec_stmt() showed the improvement mainly on the arm64
> VM (7.4%). For x86, it was 2.7%
> But inlining exec_stmt() and exec_cast_value() together showed
> benefits on both machines, as can be seen above.

This stuff is interesting.  Do you have some perf profiles to share?
I am wondering what's the effect of the inlining with your test
cases.
--
Michael

Вложения

Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Sun, 31 May 2020 at 08:04, Michael Paquier <michael@paquier.xyz> wrote:
> This stuff is interesting.  Do you have some perf profiles to share?
> I am wondering what's the effect of the inlining with your test
> cases.

Below are the perf numbers for asignmany.sql :

HEAD :

+   16.88%  postgres  postgres           [.] CachedPlanIsSimplyValid
+   16.64%  postgres  plpgsql.so         [.] exec_stmt
+   15.56%  postgres  plpgsql.so         [.] exec_eval_expr
+   13.58%  postgres  plpgsql.so         [.] exec_assign_value
+    7.49%  postgres  plpgsql.so         [.] exec_cast_value
+    7.17%  postgres  plpgsql.so         [.] exec_assign_expr
+    5.39%  postgres  postgres           [.] MemoryContextReset
+    3.91%  postgres  postgres           [.] ExecJustConst
+    3.33%  postgres  postgres           [.] recomputeNamespacePath
+    2.88%  postgres  postgres           [.] OverrideSearchPathMatchesCurrent
+    2.18%  postgres  plpgsql.so         [.] exec_eval_cleanup.isra.17
+    2.15%  postgres  plpgsql.so         [.] exec_stmts
+    1.32%  postgres  plpgsql.so         [.] MemoryContextReset@plt
+    0.57%  postgres  plpgsql.so         [.] CachedPlanIsSimplyValid@plt
+    0.57%  postgres  postgres           [.] GetUserId
     0.30%  postgres  plpgsql.so         [.] assign_simple_var.isra.13
     0.05%  postgres  [kernel.kallsyms]  [k] unmap_page_range

Patched :

+   18.22%  postgres  postgres           [.] CachedPlanIsSimplyValid
+   17.25%  postgres  plpgsql.so         [.] exec_eval_expr
+   16.31%  postgres  plpgsql.so         [.] exec_stmts
+   15.00%  postgres  plpgsql.so         [.] exec_assign_value
+    7.56%  postgres  plpgsql.so         [.] exec_assign_expr
+    5.64%  postgres  postgres           [.] MemoryContextReset
+    5.16%  postgres  postgres           [.] ExecJustConst
+    4.86%  postgres  postgres           [.] recomputeNamespacePath
+    4.54%  postgres  postgres           [.] OverrideSearchPathMatchesCurrent
+    2.33%  postgres  plpgsql.so         [.] exec_eval_cleanup.isra.17
+    1.26%  postgres  plpgsql.so         [.] MemoryContextReset@plt
+    0.81%  postgres  postgres           [.] GetUserId
+    0.71%  postgres  plpgsql.so         [.] CachedPlanIsSimplyValid@plt
     0.26%  postgres  plpgsql.so         [.] assign_simple_var.isra.13
     0.03%  postgres  [kernel.kallsyms]  [k] unmap_page_range
     0.02%  postgres  [kernel.kallsyms]  [k] mark_page_accessed

Notice the reduction in percentages :
HEAD : exec_stmts + exec_stmt = 18.79
Patched : exec_stmts = 16.31

HEAD : exec_assign_value + exec_cast_value : 21.07
Patched : exec_assign_value = 15.00

As expected, reduction of percentage in these two functions caused
other functions like CachedPlanIsSimplyValid() and exec_eval_expr() to
show rise in their percentages.



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I think so the effect of these patches strongly depends on CPU and compile

I quickly tried pi() with gcc 10 as well, and saw more or less the
same benefit. I think, we are bound to see some differences in the
benefits across architectures, kernels and compilers; but looks like
some benefit is always there.

> but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.

Please check the perf numbers in my reply to Michael. I suppose you
meant CachedPlanIsSimplyValid() when you say the bottle neck is
elsewhere ? Yeah, this function is always the hottest spot, which I
recall is being discussed elsewhere. But I always see exec_stmt(),
exec_assign_value as the next functions.

>> > patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.
>>
>> Here, I moved the exec_stmt code into exec_stmts() function because
>> exec_stmts() was the only caller, and that function is not that big. I
>> am assuming you were referring to this point when you said it is a bit
>> against simplicity. But I didn't get what you implied by "but for
>> PLpgSQL with blocks structure it is correct"
>
>
> Nested statement in PLpgSQL is always a list of statements. It is not single statement ever. So is not too strange
don'thave a function execute_stmt.
 

Right.



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Pavel Stehule
Дата:


po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I think so the effect of these patches strongly depends on CPU and compile

I quickly tried pi() with gcc 10 as well, and saw more or less the
same benefit. I think, we are bound to see some differences in the
benefits across architectures, kernels and compilers; but looks like
some benefit is always there.

> but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.

Please check the perf numbers in my reply to Michael. I suppose you
meant CachedPlanIsSimplyValid() when you say the bottle neck is
elsewhere ? Yeah, this function is always the hottest spot, which I
recall is being discussed elsewhere. But I always see exec_stmt(),
exec_assign_value as the next functions.

It is hard to read the profile result, because these functions are nested together. For your example 

18.22%  postgres  postgres           [.] CachedPlanIsSimplyValid

Is little bit strange, and probably this is real bottleneck in your simple example, and maybe some work can be done there, because you assign just constant.

On second hand, your example is pretty unrealistic - and against any developer's best practices for writing cycles.

I think so we can look on PostGIS, where is some computing heavy routines in PLpgSQL, and we can look on real profiles.

Probably the most people will have benefit from these optimization.




>> > patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.
>>
>> Here, I moved the exec_stmt code into exec_stmts() function because
>> exec_stmts() was the only caller, and that function is not that big. I
>> am assuming you were referring to this point when you said it is a bit
>> against simplicity. But I didn't get what you implied by "but for
>> PLpgSQL with blocks structure it is correct"
>
>
> Nested statement in PLpgSQL is always a list of statements. It is not single statement ever. So is not too strange don't have a function execute_stmt.

Right.

Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Mon, 1 Jun 2020 at 12:27, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>>
>> On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > I think so the effect of these patches strongly depends on CPU and compile
>>
>> I quickly tried pi() with gcc 10 as well, and saw more or less the
>> same benefit. I think, we are bound to see some differences in the
>> benefits across architectures, kernels and compilers; but looks like
>> some benefit is always there.
>>
>> > but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.
>>
>> Please check the perf numbers in my reply to Michael. I suppose you
>> meant CachedPlanIsSimplyValid() when you say the bottle neck is
>> elsewhere ? Yeah, this function is always the hottest spot, which I
>> recall is being discussed elsewhere. But I always see exec_stmt(),
>> exec_assign_value as the next functions.
>
>
> It is hard to read the profile result, because these functions are nested together. For your example
>
> 18.22%  postgres  postgres           [.] CachedPlanIsSimplyValid
>
> Is little bit strange, and probably this is real bottleneck in your simple example, and maybe some work can be done
there,because you assign just constant. 

I had earlier had a quick look on this one. CachedPlanIsSimplyValid()
was, I recall, hitting a hotspot when it tries to access
plansource->search_path (possibly cacheline miss). But didn't get a
chance to further dig on that. For now, i am focusing on these other
functions for which the patches were submitted.


>
> On second hand, your example is pretty unrealistic - and against any developer's best practices for writing cycles.
>
> I think so we can look on PostGIS, where is some computing heavy routines in PLpgSQL, and we can look on real
profiles.
>
> Probably the most people will have benefit from these optimization.

I understand it's not a real world example. For generating perf
figures, I had to use an example which amplifies the benefits, so that
the effect of the patches on the perf figures also becomes visible.
Hence, used that example. I had shown the benefits up-thread using a
practical function avg_space(). But the perf figures for that example
were varying a lot.

So below, what I did was : Run the avg_space() ~150 times, and took
perf report. This stabilized the results a bit :

HEAD :
+   16.10%  17.29%  16.82%  postgres  postgres            [.]
ExecInterpExpr
+   13.80%  13.56%  14.49%  postgres  plpgsql.so          [.]
exec_assign_value
+   12.64%  12.10%  12.09%  postgres  plpgsql.so          [.]
plpgsql_param_eval_var
+   12.15%  11.28%  11.05%  postgres  plpgsql.so          [.]
exec_stmt
+   10.81%  10.24%  10.55%  postgres  plpgsql.so          [.]
exec_eval_expr
+    9.50%   9.35%   9.37%  postgres  plpgsql.so          [.]
exec_cast_value
.....
+    1.19%   1.06%   1.21%  postgres  plpgsql.so          [.]
exec_stmts


0001+0002 patches applied (i.e. inline exec_stmt) :
+   16.90%  17.20%  16.54%  postgres  postgres            [.]
ExecInterpExpr
+   16.42%  15.37%  15.28%  postgres  plpgsql.so          [.]
exec_assign_value
+   11.34%  11.92%  11.93%  postgres  plpgsql.so          [.]
plpgsql_param_eval_var
+   11.18%  11.86%  10.99%  postgres  plpgsql.so          [.] exec_stmts.part.0
+   10.51%   9.52%  10.61%  postgres  plpgsql.so          [.]
exec_eval_expr
+    9.39%   9.48%   9.30%  postgres  plpgsql.so          [.]
exec_cast_value

HEAD : exec_stmts + exec_stmt = ~12.7 %
Patched (0001+0002): exec_stmts = 11.3 %

Just 0003 patch applied (i.e. inline exec_cast_value) :
+   17.00%  16.77%  17.09% postgres  postgres           [.] ExecInterpExpr
+   15.21%  15.64%  15.09% postgres  plpgsql.so         [.] exec_assign_value
+   14.48%  14.06%  13.94% postgres  plpgsql.so         [.] exec_stmt
+   13.26%  13.30%  13.14% postgres  plpgsql.so         [.]
plpgsql_param_eval_var
+   11.48%  11.64%  12.66% postgres  plpgsql.so         [.] exec_eval_expr
....
+    1.03%   0.85%   0.87% postgres  plpgsql.so         [.] exec_stmts

HEAD : exec_assign_value + exec_cast_value = ~23.4 %
Patched (0001+0002): exec_assign_value =  15.3%


Time in milliseconds after calling avg_space() 150 times :
HEAD  : 7210
Patch 0001+0002 : 6925
Patch 0003 : 6670
Patch 0001+0002+0003 : 6346



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Pavel Stehule
Дата:


po 1. 6. 2020 v 15:59 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Mon, 1 Jun 2020 at 12:27, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>>
>> On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > I think so the effect of these patches strongly depends on CPU and compile
>>
>> I quickly tried pi() with gcc 10 as well, and saw more or less the
>> same benefit. I think, we are bound to see some differences in the
>> benefits across architectures, kernels and compilers; but looks like
>> some benefit is always there.
>>
>> > but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.
>>
>> Please check the perf numbers in my reply to Michael. I suppose you
>> meant CachedPlanIsSimplyValid() when you say the bottle neck is
>> elsewhere ? Yeah, this function is always the hottest spot, which I
>> recall is being discussed elsewhere. But I always see exec_stmt(),
>> exec_assign_value as the next functions.
>
>
> It is hard to read the profile result, because these functions are nested together. For your example
>
> 18.22%  postgres  postgres           [.] CachedPlanIsSimplyValid
>
> Is little bit strange, and probably this is real bottleneck in your simple example, and maybe some work can be done there, because you assign just constant.

I had earlier had a quick look on this one. CachedPlanIsSimplyValid()
was, I recall, hitting a hotspot when it tries to access
plansource->search_path (possibly cacheline miss). But didn't get a
chance to further dig on that. For now, i am focusing on these other
functions for which the patches were submitted.


>
> On second hand, your example is pretty unrealistic - and against any developer's best practices for writing cycles.
>
> I think so we can look on PostGIS, where is some computing heavy routines in PLpgSQL, and we can look on real profiles.
>
> Probably the most people will have benefit from these optimization.

I understand it's not a real world example. For generating perf
figures, I had to use an example which amplifies the benefits, so that
the effect of the patches on the perf figures also becomes visible.
Hence, used that example. I had shown the benefits up-thread using a
practical function avg_space(). But the perf figures for that example
were varying a lot.

So below, what I did was : Run the avg_space() ~150 times, and took
perf report. This stabilized the results a bit :

HEAD :
+   16.10%  17.29%  16.82%  postgres  postgres            [.]
ExecInterpExpr
+   13.80%  13.56%  14.49%  postgres  plpgsql.so          [.]
exec_assign_value
+   12.64%  12.10%  12.09%  postgres  plpgsql.so          [.]
plpgsql_param_eval_var
+   12.15%  11.28%  11.05%  postgres  plpgsql.so          [.]
exec_stmt
+   10.81%  10.24%  10.55%  postgres  plpgsql.so          [.]
exec_eval_expr
+    9.50%   9.35%   9.37%  postgres  plpgsql.so          [.]
exec_cast_value
.....
+    1.19%   1.06%   1.21%  postgres  plpgsql.so          [.]
exec_stmts


0001+0002 patches applied (i.e. inline exec_stmt) :
+   16.90%  17.20%  16.54%  postgres  postgres            [.]
ExecInterpExpr
+   16.42%  15.37%  15.28%  postgres  plpgsql.so          [.]
exec_assign_value
+   11.34%  11.92%  11.93%  postgres  plpgsql.so          [.]
plpgsql_param_eval_var
+   11.18%  11.86%  10.99%  postgres  plpgsql.so          [.] exec_stmts.part.0
+   10.51%   9.52%  10.61%  postgres  plpgsql.so          [.]
exec_eval_expr
+    9.39%   9.48%   9.30%  postgres  plpgsql.so          [.]
exec_cast_value

HEAD : exec_stmts + exec_stmt = ~12.7 %
Patched (0001+0002): exec_stmts = 11.3 %

Just 0003 patch applied (i.e. inline exec_cast_value) :
+   17.00%  16.77%  17.09% postgres  postgres           [.] ExecInterpExpr
+   15.21%  15.64%  15.09% postgres  plpgsql.so         [.] exec_assign_value
+   14.48%  14.06%  13.94% postgres  plpgsql.so         [.] exec_stmt
+   13.26%  13.30%  13.14% postgres  plpgsql.so         [.]
plpgsql_param_eval_var
+   11.48%  11.64%  12.66% postgres  plpgsql.so         [.] exec_eval_expr
....
+    1.03%   0.85%   0.87% postgres  plpgsql.so         [.] exec_stmts

HEAD : exec_assign_value + exec_cast_value = ~23.4 %
Patched (0001+0002): exec_assign_value =  15.3%


Time in milliseconds after calling avg_space() 150 times :
HEAD  : 7210
Patch 0001+0002 : 6925
Patch 0003 : 6670
Patch 0001+0002+0003 : 6346

Is your patch in commitfest in commitfest application?

Regards

Pavel

Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Tue, 9 Jun 2020 at 21:49, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Is your patch in commitfest in commitfest application?

Thanks for reminding me. Just added.
https://commitfest.postgresql.org/28/2590/



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Tom Lane
Дата:
Amit Khandekar <amitdkhan.pg@gmail.com> writes:
> There are a couple of function call overheads I observed in pl/pgsql
> code : exec_stmt() and exec_cast_value(). Removing these overheads
> resulted in some performance gains.

I took a look at the 0001/0002 patches (not 0003 as yet).  I do not
like 0001 too much.  The most concrete problem with it is that
you broke translatability of the error messages, cf the first
translatability guideline at [1].  While that could be fixed by passing
the entire message not just part of it, I don't see anything that we're
gaining by moving that stuff into exec_toplevel_block in the first place.
Certainly, passing a string that describes what will happen *after*
exec_toplevel_block is just weird.  I think what you've got here is
a very arbitrary chopping-up of the existing code based on chance
similarities of the existing callers.  I think we're better off to make
exec_toplevel_block be as nearly as possible a match for exec_stmts'
semantics.

Hence, I propose the attached 0001 to replace 0001/0002.  This should
be basically indistinguishable performance-wise, though I have not
tried to benchmark.  Note that for reviewability's sake, I did not
reindent the former body of exec_stmt, though we'd want to do that
before committing.

Also, 0002 is a small patch on top of that to avoid redundant saves
and restores of estate->err_stmt within the loop in exec_stmts.  This
may well not be a measurable improvement, but it's a pretty obvious
inefficiency in exec_stmts now that it's refactored this way.

            regards, tom lane

[1] https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index f41d675d65..b02567c88d 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -260,12 +260,12 @@ static MemoryContext get_stmt_mcontext(PLpgSQL_execstate *estate);
 static void push_stmt_mcontext(PLpgSQL_execstate *estate);
 static void pop_stmt_mcontext(PLpgSQL_execstate *estate);

+static int    exec_toplevel_block(PLpgSQL_execstate *estate,
+                                PLpgSQL_stmt_block *block);
 static int    exec_stmt_block(PLpgSQL_execstate *estate,
                             PLpgSQL_stmt_block *block);
 static int    exec_stmts(PLpgSQL_execstate *estate,
                        List *stmts);
-static int    exec_stmt(PLpgSQL_execstate *estate,
-                      PLpgSQL_stmt *stmt);
 static int    exec_stmt_assign(PLpgSQL_execstate *estate,
                              PLpgSQL_stmt_assign *stmt);
 static int    exec_stmt_perform(PLpgSQL_execstate *estate,
@@ -599,8 +599,7 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo,
      * Now call the toplevel block of statements
      */
     estate.err_text = NULL;
-    estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-    rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
+    rc = exec_toplevel_block(&estate, func->action);
     if (rc != PLPGSQL_RC_RETURN)
     {
         estate.err_stmt = NULL;
@@ -1015,8 +1014,7 @@ plpgsql_exec_trigger(PLpgSQL_function *func,
      * Now call the toplevel block of statements
      */
     estate.err_text = NULL;
-    estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-    rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
+    rc = exec_toplevel_block(&estate, func->action);
     if (rc != PLPGSQL_RC_RETURN)
     {
         estate.err_stmt = NULL;
@@ -1176,8 +1174,7 @@ plpgsql_exec_event_trigger(PLpgSQL_function *func, EventTriggerData *trigdata)
      * Now call the toplevel block of statements
      */
     estate.err_text = NULL;
-    estate.err_stmt = (PLpgSQL_stmt *) (func->action);
-    rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);
+    rc = exec_toplevel_block(&estate, func->action);
     if (rc != PLPGSQL_RC_RETURN)
     {
         estate.err_stmt = NULL;
@@ -1584,6 +1581,38 @@ exception_matches_conditions(ErrorData *edata, PLpgSQL_condition *cond)
 }


+/* ----------
+ * exec_toplevel_block            Execute the toplevel block
+ *
+ * This is intentionally equivalent to executing exec_stmts() with a
+ * list consisting of the one statement.  One tiny difference is that
+ * we do not bother to save and restore estate->err_stmt; the caller
+ * is expected to clear that after we return.
+ * ----------
+ */
+static int
+exec_toplevel_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
+{
+    int            rc;
+
+    estate->err_stmt = (PLpgSQL_stmt *) block;
+
+    /* Let the plugin know that we are about to execute this statement */
+    if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
+        ((*plpgsql_plugin_ptr)->stmt_beg) (estate, (PLpgSQL_stmt *) block);
+
+    CHECK_FOR_INTERRUPTS();
+
+    rc = exec_stmt_block(estate, block);
+
+    /* Let the plugin know that we have finished executing this statement */
+    if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end)
+        ((*plpgsql_plugin_ptr)->stmt_end) (estate, (PLpgSQL_stmt *) block);
+
+    return rc;
+}
+
+
 /* ----------
  * exec_stmt_block            Execute a block of statements
  * ----------
@@ -1933,24 +1962,6 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts)
     foreach(s, stmts)
     {
         PLpgSQL_stmt *stmt = (PLpgSQL_stmt *) lfirst(s);
-        int            rc = exec_stmt(estate, stmt);
-
-        if (rc != PLPGSQL_RC_OK)
-            return rc;
-    }
-
-    return PLPGSQL_RC_OK;
-}
-
-
-/* ----------
- * exec_stmt            Distribute one statement to the statements
- *                type specific execution function.
- * ----------
- */
-static int
-exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt)
-{
     PLpgSQL_stmt *save_estmt;
     int            rc = -1;

@@ -2088,7 +2099,11 @@ exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt)

     estate->err_stmt = save_estmt;

-    return rc;
+    if (rc != PLPGSQL_RC_OK)
+        return rc;
+    }                            /* end of loop over statements */
+
+    return PLPGSQL_RC_OK;
 }


diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index b02567c88d..8bc4a7a3d2 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -1946,6 +1946,7 @@ exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block)
 static int
 exec_stmts(PLpgSQL_execstate *estate, List *stmts)
 {
+    PLpgSQL_stmt *save_estmt = estate->err_stmt;
     ListCell   *s;

     if (stmts == NIL)
@@ -1962,10 +1963,8 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts)
     foreach(s, stmts)
     {
         PLpgSQL_stmt *stmt = (PLpgSQL_stmt *) lfirst(s);
-    PLpgSQL_stmt *save_estmt;
     int            rc = -1;

-    save_estmt = estate->err_stmt;
     estate->err_stmt = stmt;

     /* Let the plugin know that we are about to execute this statement */
@@ -2097,12 +2096,14 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts)
     if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end)
         ((*plpgsql_plugin_ptr)->stmt_end) (estate, stmt);

-    estate->err_stmt = save_estmt;
-
     if (rc != PLPGSQL_RC_OK)
+    {
+        estate->err_stmt = save_estmt;
         return rc;
+    }
     }                            /* end of loop over statements */

+    estate->err_stmt = save_estmt;
     return PLPGSQL_RC_OK;
 }


Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Thu, 2 Jul 2020 at 03:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Khandekar <amitdkhan.pg@gmail.com> writes:
> > There are a couple of function call overheads I observed in pl/pgsql
> > code : exec_stmt() and exec_cast_value(). Removing these overheads
> > resulted in some performance gains.
>
> I took a look at the 0001/0002 patches (not 0003 as yet).  I do not
> like 0001 too much.  The most concrete problem with it is that
> you broke translatability of the error messages, cf the first
> translatability guideline at [1].

Yeah, I thought we can safely use %s for proper nouns such as "trigger
procedure" or "function" as those would not be translated. But looks
like even if they won't be translated, the difference in word order
among languages might create problems with this.

> While that could be fixed by passing
> the entire message not just part of it, I don't see anything that we're
> gaining by moving that stuff into exec_toplevel_block in the first place.
> Certainly, passing a string that describes what will happen *after*
> exec_toplevel_block is just weird.  I think what you've got here is
> a very arbitrary chopping-up of the existing code based on chance
> similarities of the existing callers.  I think we're better off to make
> exec_toplevel_block be as nearly as possible a match for exec_stmts'
> semantics.

I thought some of those things that I kept in exec_toplevel_block() do
look like they belong to a top level function. But what you are saying
also makes sense : better to keep it similar to exec_stmts.

>
> Hence, I propose the attached 0001 to replace 0001/0002.  This should
> be basically indistinguishable performance-wise, though I have not
> tried to benchmark.

Thanks for the patches. Yeah, performance-wise it does look similar;
but anyways I tried running, and got similar performance numbers.

> Note that for reviewability's sake, I did not
> reindent the former body of exec_stmt, though we'd want to do that
> before committing.

Right.

>
> Also, 0002 is a small patch on top of that to avoid redundant saves
> and restores of estate->err_stmt within the loop in exec_stmts.  This
> may well not be a measurable improvement, but it's a pretty obvious
> inefficiency in exec_stmts now that it's refactored this way.

0002 also makes sense.



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Tom Lane
Дата:
I did some performance testing on 0001+0002 here, and found that
for me, there's basically no change on x86_64 but a win of 2 to 3
percent on aarch64, using Pavel's pi_est_1() as a representative
case for simple plpgsql statements.  That squares with your original
results I believe.  It's not clear to me whether any of the later
tests in this thread measured these changes in isolation, or only
with 0003 added.

Anyway, that's good enough for me, so I pushed 0001+0002 after a
little bit of additional cosmetic tweaking.

I attach your original 0003 here (it still applies, with some line
offsets).  That's just so the cfbot doesn't get confused about what
it's supposed to test now.

            regards, tom lane

From 56aac7dff8243ff6dc9b8e72651cb1d9a018f1b3 Mon Sep 17 00:00:00 2001
From: Amit Khandekar <amitdkhan.pg@gmail.com>
Date: Sat, 23 May 2020 21:53:24 +0800
Subject: [PATCH 3/3] Inline exec_cast_value().

This function does not do the casting if not required. So move the
code that actually does the cast into a separate function, so as to
reduce the exec_cast_value() code and make it inline.

There are frequent calls of this function, so inlining it has shown to
improve performance by as much as 14%
---
 src/pl/plpgsql/src/pl_exec.c | 63 ++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d5377a6dad..4028a3f0f6 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -425,10 +425,14 @@ static void instantiate_empty_record_variable(PLpgSQL_execstate *estate,
                                               PLpgSQL_rec *rec);
 static char *convert_value_to_string(PLpgSQL_execstate *estate,
                                      Datum value, Oid valtype);
-static Datum exec_cast_value(PLpgSQL_execstate *estate,
+static inline Datum exec_cast_value(PLpgSQL_execstate *estate,
                              Datum value, bool *isnull,
                              Oid valtype, int32 valtypmod,
                              Oid reqtype, int32 reqtypmod);
+static Datum do_cast_value(PLpgSQL_execstate *estate,
+                Datum value, bool *isnull,
+                Oid valtype, int32 valtypmod,
+                Oid reqtype, int32 reqtypmod);
 static plpgsql_CastHashEntry *get_cast_hashentry(PLpgSQL_execstate *estate,
                                                  Oid srctype, int32 srctypmod,
                                                  Oid dsttype, int32 dsttypmod);
@@ -7764,9 +7768,11 @@ convert_value_to_string(PLpgSQL_execstate *estate, Datum value, Oid valtype)
  * also contain the result Datum if we have to do a conversion to a pass-
  * by-reference data type.  Be sure to do an exec_eval_cleanup() call when
  * done with the result.
+ * The actual code to cast is kept outside of this function, to keep it short
+ * since it is an inline function, being called frequently.
  * ----------
  */
-static Datum
+static inline Datum
 exec_cast_value(PLpgSQL_execstate *estate,
                 Datum value, bool *isnull,
                 Oid valtype, int32 valtypmod,
@@ -7777,31 +7783,48 @@ exec_cast_value(PLpgSQL_execstate *estate,
      */
     if (valtype != reqtype ||
         (valtypmod != reqtypmod && reqtypmod != -1))
-    {
-        plpgsql_CastHashEntry *cast_entry;
+        value = do_cast_value(estate, value, isnull, valtype, valtypmod,
+                              reqtype, reqtypmod);

-        cast_entry = get_cast_hashentry(estate,
-                                        valtype, valtypmod,
-                                        reqtype, reqtypmod);
-        if (cast_entry)
-        {
-            ExprContext *econtext = estate->eval_econtext;
-            MemoryContext oldcontext;
+    return value;
+}

-            oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
+/* ----------
+ * do_cast_value            cast the input value.
+ *
+ * Returns the cast value.
+ * Check comments in the wrapper function exec_cast_value().
+ * ----------
+ */
+static Datum
+do_cast_value(PLpgSQL_execstate *estate,
+                Datum value, bool *isnull,
+                Oid valtype, int32 valtypmod,
+                Oid reqtype, int32 reqtypmod)
+{
+    plpgsql_CastHashEntry *cast_entry;

-            econtext->caseValue_datum = value;
-            econtext->caseValue_isNull = *isnull;
+    cast_entry = get_cast_hashentry(estate,
+                                    valtype, valtypmod,
+                                    reqtype, reqtypmod);
+    if (cast_entry)
+    {
+        ExprContext *econtext = estate->eval_econtext;
+        MemoryContext oldcontext;

-            cast_entry->cast_in_use = true;
+        oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));

-            value = ExecEvalExpr(cast_entry->cast_exprstate, econtext,
-                                 isnull);
+        econtext->caseValue_datum = value;
+        econtext->caseValue_isNull = *isnull;

-            cast_entry->cast_in_use = false;
+        cast_entry->cast_in_use = true;

-            MemoryContextSwitchTo(oldcontext);
-        }
+        value = ExecEvalExpr(cast_entry->cast_exprstate, econtext,
+                             isnull);
+
+        cast_entry->cast_in_use = false;
+
+        MemoryContextSwitchTo(oldcontext);
     }

     return value;
--
2.17.1


Re: Inlining of couple of functions in pl_exec.c improves performance

От
Tom Lane
Дата:
I wrote:
> I attach your original 0003 here (it still applies, with some line
> offsets).  That's just so the cfbot doesn't get confused about what
> it's supposed to test now.

Pushed that part now, too.

BTW, the first test run I did on this (on x86_64) was actually several
percent *slower* than HEAD.  I couldn't reproduce that after restarting
the postmaster; all later tests concurred that there was a speedup.
So I suppose that was just some phase-of-the-moon effect, perhaps caused
by an ASLR-dependent collision of bits of code in processor cache.
Still, that illustrates the difficulty of getting useful, reproducible
improvements when doing this kind of hacking.  I tend to think that
most of the time we're better off leaving this to the compiler.

            regards, tom lane



Re: Inlining of couple of functions in pl_exec.c improves performance

От
Amit Khandekar
Дата:
On Sat, 4 Jul 2020 at 01:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I did some performance testing on 0001+0002 here, and found that
> for me, there's basically no change on x86_64 but a win of 2 to 3
> percent on aarch64, using Pavel's pi_est_1() as a representative
> case for simple plpgsql statements.  That squares with your original
> results I believe.  It's not clear to me whether any of the later
> tests in this thread measured these changes in isolation, or only
> with 0003 added.

Yeah I had the same observation. 0001+0002 seems to benefit mostly on
aarch64. And 0003 (exec_case_value) benefited both on amd64 and
aarch64.

>
> Anyway, that's good enough for me, so I pushed 0001+0002 after a
> little bit of additional cosmetic tweaking.
>
> I attach your original 0003 here (it still applies, with some line
> offsets).  That's just so the cfbot doesn't get confused about what
> it's supposed to test now.

Thanks for pushing all the three !

Thanks,
-Amit Khandekar
Huawei Technologies