Обсуждение: calling procedures is slow and consumes extra much memory againstcalling function

Поиск

Список

Период

Сортировка

calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

10 мая 2020 г., 23:20:53

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$

begin v := random() * r; end

$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

When I rewrite same to functions then

create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;

Then execution is about 1 sec, and memory requirements are +/- zero.

Minimally it looks so CALL statements has a memory issue.

Regards

Pavel

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

11 мая 2020 г., 08:25:20

ne 10. 5. 2020 v 22:20 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

When I rewrite same to functions then

create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;

Then execution is about 1 sec, and memory requirements are +/- zero.

Minimally it looks so CALL statements has a memory issue.

The problem is in plpgsql implementation of CALL statement

In non atomic case - case of using procedures from DO block, the expression plan is not cached, and plan is generating any time. This is reason why it is slow.

Unfortunately, generated plans are not released until SPI_finish. Attached patch fixed this issue.

Regards

Pavel

Regards

Pavel

Вложения

plpgsq-call-fix.patch

Re: calling procedures is slow and consumes extra much memoryagainst calling function

От

Michael Paquier

Дата:

11 мая 2020 г., 09:07:34

On Sun, May 10, 2020 at 10:20:53PM +0200, Pavel Stehule wrote:
> When I rewrite same to functions then
>
> create or replace function p1func2(inout r int, inout v int) as $$
> begin v := random() * r; end
> $$ language plpgsql;
>
> Then execution is about 1 sec, and memory requirements are +/- zero.
>
> Minimally it looks so CALL statements has a memory issue.

Behavior not limited to plpgsql.  A plain SQL function shows the same
leak patterns:
create or replace procedure p1_sql(in r int, in v int)
  as $$ SELECT r + v; $$ language sql;
  And I cannot get valgrind to complain about lost references, so this
  looks like some missing memory context handling.

Also, I actually don't quite get why the context created by
CreateExprContext() cannot be freed before the procedure returns.  A
short test shows no problems in calling FreeExprContext() at the end
of ExecuteCallStmt(), but that does not address everything.  Perhaps a
lack of tests with pass-by-reference expressions and procedures?

Peter?
--
Michael

Вложения

signature.asc

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

11 мая 2020 г., 09:07:48

po 11. 5. 2020 v 7:25 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

Hi

ne 10. 5. 2020 v 22:20 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

When I rewrite same to functions then

create or replace function p1func2(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

do $$
declare r int default 100; x int; re record;
begin
for i in 1..300000 loop
re := p1func2(r, x);
end loop;
end;
$$;

Then execution is about 1 sec, and memory requirements are +/- zero.

Minimally it looks so CALL statements has a memory issue.

The problem is in plpgsql implementation of CALL statement

In non atomic case - case of using procedures from DO block, the expression plan is not cached, and plan is generating any time. This is reason why it is slow.

Unfortunately, generated plans are not released until SPI_finish. Attached patch fixed this issue.

But now, recursive calling doesn't work :-(. So this patch is not enough

Regards

Pavel

Regards

Pavel

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

15 мая 2020 г., 21:36:20

The problem is in plpgsql implementation of CALL statement

In non atomic case - case of using procedures from DO block, the expression plan is not cached, and plan is generating any time. This is reason why it is slow.

Unfortunately, generated plans are not released until SPI_finish. Attached patch fixed this issue.

But now, recursive calling doesn't work :-(. So this patch is not enough

Attached patch is working - all tests passed

It doesn't solve performance, and doesn't solve all memory problems, but significantly reduce memory requirements from 5007 bytes to 439 bytes per one CALL

Regards

Pavel

Regards

Pavel

Regards

Pavel

Вложения

plpgsql-call-fix.patch

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Ranier Vilela

Дата:

16 мая 2020 г., 01:33:45

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.

I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

regards,

Ranier Vilela

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

16 мая 2020 г., 06:06:34

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:

Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with --enable-cassert option.

regards

Pavel

regards,
Ranier Vilela

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Ranier Vilela

Дата:

16 мая 2020 г., 06:54:47

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with --enable-cassert option.

The hardware is definitely making a difference, but if you have time and don't mind testing it,

I can send you a patch, not that the modifications are a big deal, but maybe they'll help.

regards,

Ranier Vilela

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

16 мая 2020 г., 07:10:03

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:

Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with --enable-cassert option.
The hardware is definitely making a difference, but if you have time and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but maybe they'll help.

send me a patch, please

Pavel

regards,
Ranier Vilela

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Ranier Vilela

Дата:

16 мая 2020 г., 14:39:18

Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with --enable-cassert option.
The hardware is definitely making a difference, but if you have time and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but maybe they'll help.

With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

regards,

Ranier Vilela

Вложения

pl_exec_test.patch

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

16 мая 2020 г., 15:34:42

so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:

Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with --enable-cassert option.
The hardware is definitely making a difference, but if you have time and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but maybe they'll help.
With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

Why you check SPI_processed?

+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"", expr->query);

CALL cannot to return rows, so these checks has not sense

regards,
Ranier Vilela

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Ranier Vilela

Дата:

16 мая 2020 г., 16:23:09

Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with --enable-cassert option.
The hardware is definitely making a difference, but if you have time and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but maybe they'll help.
With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

Why you check SPI_processed?

+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"", expr->query);

CALL cannot to return rows, so these checks has not sense

Looking at the original file, this already done, from line 2351,

I just put all the tests together to, if applicable, get out quickly.

regards,

Ranier Vilela

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

16 мая 2020 г., 17:06:40

so 16. 5. 2020 v 15:24 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:

Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with --enable-cassert option.
The hardware is definitely making a difference, but if you have time and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but maybe they'll help.
With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

Why you check SPI_processed?

+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"", expr->query);

CALL cannot to return rows, so these checks has not sense
Looking at the original file, this already done, from line 2351,
I just put all the tests together to, if applicable, get out quickly.

It's little bit messy. Is not good to mix bugfix and refactoring things together

regards,
Ranier Vilela

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Ranier Vilela

Дата:

16 мая 2020 г., 17:20:32

Em sáb., 16 de mai. de 2020 às 11:07, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 15:24 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 09:35, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 13:40 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 01:10, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 5:55 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em sáb., 16 de mai. de 2020 às 00:07, Pavel Stehule <pavel.stehule@gmail.com> escreveu:

so 16. 5. 2020 v 0:34 odesílatel Ranier Vilela <ranier.vf@gmail.com> napsal:
Em dom., 10 de mai. de 2020 às 17:21, Pavel Stehule <pavel.stehule@gmail.com> escreveu:
Hi

I try to use procedures in Orafce package, and I did some easy performance tests. I found some hard problems:

1. test case

create or replace procedure p1(inout r int, inout v int) as $$
begin v := random() * r; end
$$ language plpgsql;

This command requires

do $$
declare r int default 100; x int;
begin
for i in 1..300000 loop
call p1(r, x);
end loop;
end;
$$;

about 2.2GB RAM and 10 sec.
I am having a consistent result of 3 secs, with a modified version (exec_stmt_call) of your patch.
But my notebook is (Core 5, 8GB and SSD), could it be a difference in the testing hardware?

My notebook is old T520, and more I have a configured Postgres with --enable-cassert option.
The hardware is definitely making a difference, but if you have time and don't mind testing it,
I can send you a patch, not that the modifications are a big deal, but maybe they'll help.
With more testing, I found that latency increases response time.
With 3 (secs) the test is with localhost.
With 6 (secs) the test is with tcp (local, not between pcs).

Anyway, I would like to know if we have the number of parameters previously, why use List instead of Arrays?
It would not be faster to create plpgsql variables.

Why you check SPI_processed?

+ if (SPI_processed == 1)
+ {
+ if (!stmt->target)
+ elog(ERROR, "DO statement returned a row, query \"%s\"", expr->query);
+ }
+ else if (SPI_processed > 1)
+ elog(ERROR, "Procedure call returned more than one row, query \"%s\"", expr->query);

CALL cannot to return rows, so these checks has not sense
Looking at the original file, this already done, from line 2351,
I just put all the tests together to, if applicable, get out quickly.

It's little bit messy. Is not good to mix bugfix and refactoring things together

Ok, I can understand that.

regards,

Ranier Vilela

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Amit Khandekar

Дата:

10 июня 2020 г., 12:52:25

On Sat, 16 May 2020 at 00:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
>>>
>>> The problem is in plpgsql implementation of CALL statement
>>>
>>> In non atomic case -  case of using procedures from DO block, the expression plan is not cached, and plan is
generatingany time. This is reason why it is slow.

>>>
>>> Unfortunately, generated plans are not released until SPI_finish. Attached patch fixed this issue.
>>
>>
>> But now, recursive calling doesn't work :-(. So this patch is not enough
>
>
> Attached patch is working - all tests passed

Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

>
> It doesn't solve performance, and doesn't solve all memory problems, but significantly reduce memory requirements
from5007 bytes to 439 bytes per one CALL

So now  this patch's intention is to reduce memory consumption, and it
doesn't target slowness improvement, right ?

-- 
Thanks,
-Amit Khandekar
Huawei Technologies

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

10 июня 2020 г., 14:42:11

st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:

On Sat, 16 May 2020 at 00:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
>>>
>>> The problem is in plpgsql implementation of CALL statement
>>>
>>> In non atomic case - case of using procedures from DO block, the expression plan is not cached, and plan is generating any time. This is reason why it is slow.
>>>
>>> Unfortunately, generated plans are not released until SPI_finish. Attached patch fixed this issue.
>>
>>
>> But now, recursive calling doesn't work :-(. So this patch is not enough
>
>
> Attached patch is working - all tests passed

Could you show an example testcase that tests this recursive scenario,
with which your earlier patch fails the test, and this v2 patch passes
it ? I am trying to understand the recursive scenario and the re-use
of expr->plan.

it hangs on plpgsql tests. So you can apply first version of patch

and "make check"

>
> It doesn't solve performance, and doesn't solve all memory problems, but significantly reduce memory requirements from 5007 bytes to 439 bytes per one CALL

So now this patch's intention is to reduce memory consumption, and it
doesn't target slowness improvement, right ?

yes. There is a problem with planning every execution when the procedure was called from not top context.

--
Thanks,
-Amit Khandekar
Huawei Technologies

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Amit Khandekar

Дата:

17 июня 2020 г., 08:52:04

On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>> Could you show an example testcase that tests this recursive scenario,
>> with which your earlier patch fails the test, and this v2 patch passes
>> it ? I am trying to understand the recursive scenario and the re-use
>> of expr->plan.
>
>
> it hangs on plpgsql tests. So you can apply first version of patch
>
> and "make check"

I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.

Consider this recursive test :

create or replace procedure p1(in r int) as $$
begin
   RAISE INFO 'r : % ', r;
   if r < 3 then
      call p1(r+1);
   end if;
end
$$ language plpgsql;

do $$
declare r int default 1;
begin
    call p1(r);
end;
$$;

In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
   if (plan_owner)
      SPI_freeplan(expr->plan);
   expr->plan = NULL;
}

Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.

-Amit

Re: calling procedures is slow and consumes extra much memory againstcalling function

От

Pavel Stehule

Дата:

17 июня 2020 г., 11:23:44

st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:

On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>> Could you show an example testcase that tests this recursive scenario,
>> with which your earlier patch fails the test, and this v2 patch passes
>> it ? I am trying to understand the recursive scenario and the re-use
>> of expr->plan.
>
>
> it hangs on plpgsql tests. So you can apply first version of patch
>
> and "make check"

I could not reproduce the make check hang with the v1 patch. But I
could see a crash with the below testcase. So I understand the purpose
of the plan_owner variable that you introduced in v2.

Consider this recursive test :

create or replace procedure p1(in r int) as $$
begin
RAISE INFO 'r : % ', r;
if r < 3 then
call p1(r+1);
end if;
end
$$ language plpgsql;

do $$
declare r int default 1;
begin
call p1(r);
end;
$$;

In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
consider this code of exec_stmt_call() with your v2 patch applied:
if (expr->plan && !expr->plan->saved)
{
if (plan_owner)
SPI_freeplan(expr->plan);
expr->plan = NULL;
}

Here, plan_owner is false. So SPI_freeplan() will not be called, and
expr->plan is set to NULL. Now I have observed that the stmt pointer
and expr pointer is shared between the p1() execution at this r=2
level and the p1() execution at r=1 level. So after the above code is
executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
the same above code snippet, it gets the same expr pointer, but it's
expr->plan is already set to NULL without being freed. From this
logic, it looks like the plan won't get freed whenever the expr/stmt
pointers are shared across recursive levels, since expr->plan is set
to NULL at the lowermost level ? Basically, the handle to the plan is
lost so no one at the upper recursion level can explicitly free it
using SPI_freeplan(), right ? This looks the same as the main issue
where the plan does not get freed for non-recursive calls. I haven't
got a chance to check if we can develop a testcase for this, similar
to your testcase where the memory keeps on increasing.

This is a good consideration.

I am sending updated patch

Pavel

-Amit

Вложения

plpgsql-call-fix-2.patch

Re: calling procedures is slow and consumes extra much memory against calling function

От

Amit Khandekar

Дата:

09 июля 2020 г., 09:27:34

On Wed, 17 Jun 2020 at 13:54, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
>
> st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>>
>> On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>> >> Could you show an example testcase that tests this recursive scenario,
>> >> with which your earlier patch fails the test, and this v2 patch passes
>> >> it ? I am trying to understand the recursive scenario and the re-use
>> >> of expr->plan.
>> >
>> >
>> > it hangs on plpgsql tests. So you can apply first version of patch
>> >
>> > and "make check"
>>
>> I could not reproduce the make check hang with the v1 patch. But I
>> could see a crash with the below testcase. So I understand the purpose
>> of the plan_owner variable that you introduced in v2.
>>
>> Consider this recursive test :
>>
>> create or replace procedure p1(in r int) as $$
>> begin
>>    RAISE INFO 'r : % ', r;
>>    if r < 3 then
>>       call p1(r+1);
>>    end if;
>> end
>> $$ language plpgsql;
>>
>> do $$
>> declare r int default 1;
>> begin
>>     call p1(r);
>> end;
>> $$;
>>
>> In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
>> consider this code of exec_stmt_call() with your v2 patch applied:
>> if (expr->plan && !expr->plan->saved)
>> {
>>    if (plan_owner)
>>       SPI_freeplan(expr->plan);
>>    expr->plan = NULL;
>> }
>>
>> Here, plan_owner is false. So SPI_freeplan() will not be called, and
>> expr->plan is set to NULL. Now I have observed that the stmt pointer
>> and expr pointer is shared between the p1() execution at this r=2
>> level and the p1() execution at r=1 level. So after the above code is
>> executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
>> the same above code snippet, it gets the same expr pointer, but it's
>> expr->plan is already set to NULL without being freed. From this
>> logic, it looks like the plan won't get freed whenever the expr/stmt
>> pointers are shared across recursive levels, since expr->plan is set
>> to NULL at the lowermost level ? Basically, the handle to the plan is
>> lost so no one at the upper recursion level can explicitly free it
>> using SPI_freeplan(), right ? This looks the same as the main issue
>> where the plan does not get freed for non-recursive calls. I haven't
>> got a chance to check if we can develop a testcase for this, similar
>> to your testcase where the memory keeps on increasing.
>
>
> This is a good consideration.
>
> I am sending updated patch

Checked the latest patch. Looks like using a local plan rather than
expr->plan pointer for doing the checks does seem to resolve the issue
I raised. That made me think of another scenario :

Now we are checking for plan value and then null'ifying the expr->plan
value. What if expr->plan is different from plan ? Is it possible ? I
was thinking of such scenarios. But couldn't find one. As long as a
plan is always created with saved=true for all levels, or with
saved=false for all levels, we are ok. If we can have a mix of saved
and unsaved plans at different recursion levels, then expr->plan can
be different from the outer local plan because then the expr->plan
will not be set to NULL in the inner level, while the outer level may
have created its own plan. But I think a mix of saved and unsaved
plans are not possible. If you agree, then I think we should at least
have an assert that looks like :

    if (plan && !plan->saved)
    {
        if (plan_owner)
            SPI_freeplan(plan);

        /* If expr->plan  is present, it must be the same plan that we
allocated */
       Assert ( !expr->plan || plan == expr->plan) );

        expr->plan = NULL;
    }

Other than this, I have no other issues. I understand that we have to
do this special handling only for this exec_stmt_call() because it is
only here that we call exec_prepare_plan() with keep_plan = false, so
doing special handling for freeing the plan seems to make sense.

Re: calling procedures is slow and consumes extra much memory against calling function

От

Pavel Stehule

Дата:

11 июля 2020 г., 08:38:21

čt 9. 7. 2020 v 8:28 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:

On Wed, 17 Jun 2020 at 13:54, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
>
> st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>>
>> On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>> >> Could you show an example testcase that tests this recursive scenario,
>> >> with which your earlier patch fails the test, and this v2 patch passes
>> >> it ? I am trying to understand the recursive scenario and the re-use
>> >> of expr->plan.
>> >
>> >
>> > it hangs on plpgsql tests. So you can apply first version of patch
>> >
>> > and "make check"
>>
>> I could not reproduce the make check hang with the v1 patch. But I
>> could see a crash with the below testcase. So I understand the purpose
>> of the plan_owner variable that you introduced in v2.
>>
>> Consider this recursive test :
>>
>> create or replace procedure p1(in r int) as $$
>> begin
>> RAISE INFO 'r : % ', r;
>> if r < 3 then
>> call p1(r+1);
>> end if;
>> end
>> $$ language plpgsql;
>>
>> do $$
>> declare r int default 1;
>> begin
>> call p1(r);
>> end;
>> $$;
>>
>> In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
>> consider this code of exec_stmt_call() with your v2 patch applied:
>> if (expr->plan && !expr->plan->saved)
>> {
>> if (plan_owner)
>> SPI_freeplan(expr->plan);
>> expr->plan = NULL;
>> }
>>
>> Here, plan_owner is false. So SPI_freeplan() will not be called, and
>> expr->plan is set to NULL. Now I have observed that the stmt pointer
>> and expr pointer is shared between the p1() execution at this r=2
>> level and the p1() execution at r=1 level. So after the above code is
>> executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
>> the same above code snippet, it gets the same expr pointer, but it's
>> expr->plan is already set to NULL without being freed. From this
>> logic, it looks like the plan won't get freed whenever the expr/stmt
>> pointers are shared across recursive levels, since expr->plan is set
>> to NULL at the lowermost level ? Basically, the handle to the plan is
>> lost so no one at the upper recursion level can explicitly free it
>> using SPI_freeplan(), right ? This looks the same as the main issue
>> where the plan does not get freed for non-recursive calls. I haven't
>> got a chance to check if we can develop a testcase for this, similar
>> to your testcase where the memory keeps on increasing.
>
>
> This is a good consideration.
>
> I am sending updated patch

Checked the latest patch. Looks like using a local plan rather than
expr->plan pointer for doing the checks does seem to resolve the issue
I raised. That made me think of another scenario :

Now we are checking for plan value and then null'ifying the expr->plan
value. What if expr->plan is different from plan ? Is it possible ? I
was thinking of such scenarios. But couldn't find one. As long as a
plan is always created with saved=true for all levels, or with
saved=false for all levels, we are ok. If we can have a mix of saved
and unsaved plans at different recursion levels, then expr->plan can
be different from the outer local plan because then the expr->plan
will not be set to NULL in the inner level, while the outer level may
have created its own plan. But I think a mix of saved and unsaved
plans are not possible. If you agree, then I think we should at least
have an assert that looks like :

if (plan && !plan->saved)
{
if (plan_owner)
SPI_freeplan(plan);

/* If expr->plan is present, it must be the same plan that we
allocated */
Assert ( !expr->plan || plan == expr->plan) );

expr->plan = NULL;
}

Other than this, I have no other issues. I understand that we have to
do this special handling only for this exec_stmt_call() because it is
only here that we call exec_prepare_plan() with keep_plan = false, so
doing special handling for freeing the plan seems to make sense.

attached patch with assert.

all regress tests passed. I think this short patch can be applied on older releases as bugfix.

This weekend I'll try to check different strategy - try to save a plan and release it at the end of the transaction.

Regards

Pavel

Вложения

plpgsql-stmt_call_release_plan.patch

Re: calling procedures is slow and consumes extra much memory against calling function

От

Pavel Stehule

Дата:

12 июля 2020 г., 16:18:47

so 11. 7. 2020 v 7:38 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

čt 9. 7. 2020 v 8:28 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Wed, 17 Jun 2020 at 13:54, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
>
> st 17. 6. 2020 v 7:52 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>>
>> On Wed, 10 Jun 2020 at 17:12, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > st 10. 6. 2020 v 12:26 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>> >> Could you show an example testcase that tests this recursive scenario,
>> >> with which your earlier patch fails the test, and this v2 patch passes
>> >> it ? I am trying to understand the recursive scenario and the re-use
>> >> of expr->plan.
>> >
>> >
>> > it hangs on plpgsql tests. So you can apply first version of patch
>> >
>> > and "make check"
>>
>> I could not reproduce the make check hang with the v1 patch. But I
>> could see a crash with the below testcase. So I understand the purpose
>> of the plan_owner variable that you introduced in v2.
>>
>> Consider this recursive test :
>>
>> create or replace procedure p1(in r int) as $$
>> begin
>> RAISE INFO 'r : % ', r;
>> if r < 3 then
>> call p1(r+1);
>> end if;
>> end
>> $$ language plpgsql;
>>
>> do $$
>> declare r int default 1;
>> begin
>> call p1(r);
>> end;
>> $$;
>>
>> In p1() with r=2, when the stmt "call p1(r+1)" is being executed,
>> consider this code of exec_stmt_call() with your v2 patch applied:
>> if (expr->plan && !expr->plan->saved)
>> {
>> if (plan_owner)
>> SPI_freeplan(expr->plan);
>> expr->plan = NULL;
>> }
>>
>> Here, plan_owner is false. So SPI_freeplan() will not be called, and
>> expr->plan is set to NULL. Now I have observed that the stmt pointer
>> and expr pointer is shared between the p1() execution at this r=2
>> level and the p1() execution at r=1 level. So after the above code is
>> executed at r=2, when the upper level (r=1) exec_stmt_call() lands to
>> the same above code snippet, it gets the same expr pointer, but it's
>> expr->plan is already set to NULL without being freed. From this
>> logic, it looks like the plan won't get freed whenever the expr/stmt
>> pointers are shared across recursive levels, since expr->plan is set
>> to NULL at the lowermost level ? Basically, the handle to the plan is
>> lost so no one at the upper recursion level can explicitly free it
>> using SPI_freeplan(), right ? This looks the same as the main issue
>> where the plan does not get freed for non-recursive calls. I haven't
>> got a chance to check if we can develop a testcase for this, similar
>> to your testcase where the memory keeps on increasing.
>
>
> This is a good consideration.
>
> I am sending updated patch

Checked the latest patch. Looks like using a local plan rather than
expr->plan pointer for doing the checks does seem to resolve the issue
I raised. That made me think of another scenario :

Now we are checking for plan value and then null'ifying the expr->plan
value. What if expr->plan is different from plan ? Is it possible ? I
was thinking of such scenarios. But couldn't find one. As long as a
plan is always created with saved=true for all levels, or with
saved=false for all levels, we are ok. If we can have a mix of saved
and unsaved plans at different recursion levels, then expr->plan can
be different from the outer local plan because then the expr->plan
will not be set to NULL in the inner level, while the outer level may
have created its own plan. But I think a mix of saved and unsaved
plans are not possible. If you agree, then I think we should at least
have an assert that looks like :

if (plan && !plan->saved)
{
if (plan_owner)
SPI_freeplan(plan);

/* If expr->plan is present, it must be the same plan that we
allocated */
Assert ( !expr->plan || plan == expr->plan) );

expr->plan = NULL;
}

Other than this, I have no other issues. I understand that we have to
do this special handling only for this exec_stmt_call() because it is
only here that we call exec_prepare_plan() with keep_plan = false, so
doing special handling for freeing the plan seems to make sense.

attached patch with assert.

all regress tests passed. I think this short patch can be applied on older releases as bugfix.

This weekend I'll try to check different strategy - try to save a plan and release it at the end of the transaction.

I check it, and this state of patch is good enough for this moment. Another fix needs more invasive changes to handling plan cache.

Regards

Pavel

Regards

Pavel

Re: calling procedures is slow and consumes extra much memory against calling function

От

Pavel Stehule

Дата:

16 июля 2020 г., 22:08:09

I am sending another patch that tries to allow CachedPlans for CALL statements. I think this patch is very accurate, but it is not nice, because it is smudging very precious reference counting for CachedPlans.

Current issue:

==========

I found a problem with repeated CALL statements from DO command. For every execution of a CALL statement a plan is created that is released at the time of the end of DO block.

create or replace procedure insert_into_foo(i int)
as $$
begin
insert into foo values(i, i || 'Ahoj');
if i % 1000 = 0 then raise notice '%', i;
--commit;
end if;
end;
$$ language plpgsql;

and DO

do $$
begin
for i in 1..500000
loop
call insert_into_foo(i);
end loop;
end
$$;

Requires about 2.5GB RAM (execution time is 18 sec). The problem is "long transaction" with 500M iteration of CALL statement.

If I try to remove a comment before COMMIT - then I get 500 transactions. But still it needs 2.5GB memory.

The reason for this behaviour is disabling plan cache for CALL statements executed in atomic mode.

So I wrote patch 1, that releases the not saved plan immediately. This patch is very simple, and fixes memory issues. It is a little bit faster (14 sec), and Postgres consumes about 200KB.

Patch 1 is simple, clean, nice but execution of CALL statements is slow due repeated planning.

I tried to fix this issue another way - by little bit different work with plan cache reference counting. Current design expects only statements wrapped inside transactions. It is not designed for new possibilities in CALL statements, when more transactions can be finished inside one statement. Principally - cached plans should not be reused in different transactions (simple expressions are an exception). So if we try to use cached plans for CALL statements, there is no clean responsibility who has to close a cached plan. It can be SPI (when execution stays in the same transaction), or resource owner (when transaction is finished inside execution of SPI).

The Peter wrote a comment about it

<--><--><-->/*
<--><--><--> * Don't save the plan if not in atomic context. Otherwise,
<--><--><--> * transaction ends would cause errors about plancache leaks.
<--><--><--> *

This comment is not fully accurate. If we try to save the plan, then execution (with finished transaction inside) ends by segfault. Cached plan is released on transaction end (by resource owner) and related memory context is released. But next time this structure is accessed. There is only a warning about unclosed plan cache (it maybe depends on other things).

I wrote a patch 2 that marks CALL statement related plans as "fragile". In this case the plan is cached every time. There is a special mark "fragile", that blocks immediately releasing related memory context, and it blocks warnings and errors because for this case we expect closing plan cache by resource owner or by SPI statement.

It reduces well CPU and memory overhead - execution time (in one big transaction is only 8sec) - memory overhead is +/- 0

Patch2 is not too clear, and too readable although I think so it is more correct. It better fixes SPI behaviour against new state - possibility to commit, rollback inside procedures (inside SPI call).

All regress tests passed.

Regards

Pavel

Вложения

Re: calling procedures is slow and consumes extra much memory against calling function

От

Michael Paquier

Дата:

17 сентября 2020 г., 08:36:53

On Thu, Jul 16, 2020 at 09:08:09PM +0200, Pavel Stehule wrote:
> I am sending another patch that tries to allow CachedPlans for CALL
> statements. I think this patch is very accurate, but it is not nice,
> because it is smudging very precious reference counting for CachedPlans.

Amit, you are registered as a reviewer of this patch for two months
now.  Are you planning to look at it more?  If you are not planning to
do so, that's fine, but it may be better to remove your name as
reviewer then.
--
Michael

Вложения

signature.asc

Re: calling procedures is slow and consumes extra much memory against calling function

От

Amit Khandekar

Дата:

17 сентября 2020 г., 12:05:44

On Thu, 17 Sep 2020 at 11:07, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jul 16, 2020 at 09:08:09PM +0200, Pavel Stehule wrote:
> > I am sending another patch that tries to allow CachedPlans for CALL
> > statements. I think this patch is very accurate, but it is not nice,
> > because it is smudging very precious reference counting for CachedPlans.
>
> Amit, you are registered as a reviewer of this patch for two months
> now.  Are you planning to look at it more?  If you are not planning to
> do so, that's fine, but it may be better to remove your name as
> reviewer then.

Thanks Michael for reminding. I *had* actually planned to do some more
review. But I think I might end up not getting bandwidth for this one
during this month. So I have removed my name. But I have kept my name
as reviewer for bitmaps and correlation :
"https://commitfest.postgresql.org/29/2310/ since I do plan to do some
review on that one.

Thanks,
-Amit Khandekar
Huawei Technologies

Re: calling procedures is slow and consumes extra much memory against calling function

От

Tom Lane

Дата:

28 сентября 2020 г., 04:04:06

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am sending another patch that tries to allow CachedPlans for CALL
> statements. I think this patch is very accurate, but it is not nice,
> because it is smudging very precious reference counting for CachedPlans.

I spent some time testing this.  Although the #1 patch gets rid of
the major memory leak of cached plans, the original test case still
shows a pretty substantial leak across repeated executions of a CALL.
The reason is that the stanza for rebuilding stmt->target also gets
executed each time through, and that leaks not only the relatively
small PLpgSQL_row datum but also a bunch of catalog lookup cruft
created on the way to building the datum.  Basically this code forgot
that plpgsql's outer execution layer can't assume that it's running
in a short-lived context.

I attach a revised #1 that takes care of that problem, and also
cleans up what seems to me to be pretty sloppy thinking in both
the original code and Pavel's #1 patch: we should be restoring
the previous value of expr->plan, not cavalierly assuming that
it was necessarily NULL.  I didn't care for looking at the plan's
"saved" field to decide what was happening, either.  We really
should have a local flag variable clearly defining which behavior
it is that we're implementing.

With this patch, I see zero memory bloat on Pavel's original example,
even with a much larger repeat count.

I don't like much of anything about plpgsql-stmt_call-fix-2.patch.
It feels confused and poorly documented, possibly because "fragile"
is not a very clear term for whatever property it is you're trying to
attribute to plans.  But in any case, I think it's fixing the problem
in the wrong place.  I think the right way to fix it probably is to
manage a CALL's saved plan the same as every other plpgsql plan,
but arrange for the transient refcount on that plan to be held by a
ResourceOwner that is not a child of any transaction resowner, but
rather belongs to the procedure's execution and will be released on
the way out of the procedure.

In any case, I doubt we'd risk back-patching either the #2 patch
or any other approach to avoiding the repeat planning.  We need a
back-patchable fix that at least tamps down the memory bloat,
and this seems like it'd do.

            regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..7a2f2dfe91 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2145,40 +2145,60 @@ exec_stmt_perform(PLpgSQL_execstate *estate, PLpgSQL_stmt_perform *stmt)

 /*
  * exec_stmt_call
+ *
+ * NOTE: this is used for both CALL and DO statements.
  */
 static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
     PLpgSQL_expr *expr = stmt->expr;
+    SPIPlanPtr    orig_plan = expr->plan;
+    bool        local_plan;
+    PLpgSQL_variable *volatile cur_target = stmt->target;
     volatile LocalTransactionId before_lxid;
     LocalTransactionId after_lxid;
     volatile bool pushed_active_snap = false;
     volatile int rc;

+    /*
+     * If not in atomic context, we make a local plan that we'll just use for
+     * this invocation, and will free at the end.  Otherwise, transaction ends
+     * would cause errors about plancache leaks.
+     *
+     * XXX This would be fixable with some plancache/resowner surgery
+     * elsewhere, but for now we'll just work around this here.
+     */
+    local_plan = !estate->atomic;
+
     /* PG_TRY to ensure we clear the plan link, if needed, on failure */
     PG_TRY();
     {
         SPIPlanPtr    plan = expr->plan;
         ParamListInfo paramLI;

-        if (plan == NULL)
+        /*
+         * Make a plan if we don't have one, or if we need a local one.  Note
+         * that we'll overwrite expr->plan either way; the PG_TRY block will
+         * ensure we undo that on the way out, if the plan is local.
+         */
+        if (plan == NULL || local_plan)
         {
+            /* Don't let SPI save the plan if it's going to be local */
+            exec_prepare_plan(estate, expr, 0, !local_plan);
+            plan = expr->plan;

             /*
-             * Don't save the plan if not in atomic context.  Otherwise,
-             * transaction ends would cause errors about plancache leaks.
-             *
-             * XXX This would be fixable with some plancache/resowner surgery
-             * elsewhere, but for now we'll just work around this here.
+             * A CALL should never be a simple expression.  (If it could be,
+             * we'd have to worry about saving/restoring the previous values
+             * of the related expr fields, not just expr->plan.)
              */
-            exec_prepare_plan(estate, expr, 0, estate->atomic);
+            Assert(!expr->expr_simple_expr);

             /*
              * The procedure call could end transactions, which would upset
              * the snapshot management in SPI_execute*, so don't let it do it.
              * Instead, we set the snapshots ourselves below.
              */
-            plan = expr->plan;
             plan->no_snapshots = true;

             /*
@@ -2186,14 +2206,20 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
              * case the procedure's argument list has changed.
              */
             stmt->target = NULL;
+            cur_target = NULL;
         }

         /*
          * We construct a DTYPE_ROW datum representing the plpgsql variables
          * associated with the procedure's output arguments.  Then we can use
-         * exec_move_row() to do the assignments.
+         * exec_move_row() to do the assignments.  If we're using a local
+         * plan, also make a local target; otherwise, since the above code
+         * will force a new plan each time through, we'd repeatedly leak the
+         * memory for the target.  (Note: we also leak the target when a plan
+         * change is forced, but that isn't so likely to cause excessive
+         * memory leaks.)
          */
-        if (stmt->is_call && stmt->target == NULL)
+        if (stmt->is_call && cur_target == NULL)
         {
             Node       *node;
             FuncExpr   *funcexpr;
@@ -2208,6 +2234,9 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
             int            i;
             ListCell   *lc;

+            /* Use eval_mcontext for any cruft accumulated here */
+            oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
+
             /*
              * Get the parsed CallStmt, and look up the called procedure
              */
@@ -2239,9 +2268,11 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
             ReleaseSysCache(func_tuple);

             /*
-             * Begin constructing row Datum
+             * Begin constructing row Datum; keep it in fn_cxt if it's to be
+             * long-lived.
              */
-            oldcontext = MemoryContextSwitchTo(estate->func->fn_cxt);
+            if (!local_plan)
+                MemoryContextSwitchTo(estate->func->fn_cxt);

             row = (PLpgSQL_row *) palloc0(sizeof(PLpgSQL_row));
             row->dtype = PLPGSQL_DTYPE_ROW;
@@ -2249,7 +2280,8 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
             row->lineno = -1;
             row->varnos = (int *) palloc(sizeof(int) * list_length(funcargs));

-            MemoryContextSwitchTo(oldcontext);
+            if (!local_plan)
+                MemoryContextSwitchTo(get_eval_mcontext(estate));

             /*
              * Examine procedure's argument list.  Each output arg position
@@ -2293,7 +2325,13 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)

             row->nfields = nfields;

-            stmt->target = (PLpgSQL_variable *) row;
+            cur_target = (PLpgSQL_variable *) row;
+
+            /* We can save and re-use the target datum, if it's not local */
+            if (!local_plan)
+                stmt->target = cur_target;
+
+            MemoryContextSwitchTo(oldcontext);
         }

         paramLI = setup_param_list(estate, expr);
@@ -2316,17 +2354,27 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
     PG_CATCH();
     {
         /*
-         * If we aren't saving the plan, unset the pointer.  Note that it
-         * could have been unset already, in case of a recursive call.
+         * If we are using a local plan, restore the old plan link.
          */
-        if (expr->plan && !expr->plan->saved)
-            expr->plan = NULL;
+        if (local_plan)
+            expr->plan = orig_plan;
         PG_RE_THROW();
     }
     PG_END_TRY();

-    if (expr->plan && !expr->plan->saved)
-        expr->plan = NULL;
+    /*
+     * If we are using a local plan, restore the old plan link; then free the
+     * local plan to avoid memory leaks.  (Note that the error exit path above
+     * just clears the link without risking calling SPI_freeplan; we expect
+     * that xact cleanup will take care of the mess in that case.)
+     */
+    if (local_plan)
+    {
+        SPIPlanPtr    plan = expr->plan;
+
+        expr->plan = orig_plan;
+        SPI_freeplan(plan);
+    }

     if (rc < 0)
         elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",
@@ -2363,10 +2411,10 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
     {
         SPITupleTable *tuptab = SPI_tuptable;

-        if (!stmt->target)
+        if (!cur_target)
             elog(ERROR, "DO statement returned a row");

-        exec_move_row(estate, stmt->target, tuptab->vals[0], tuptab->tupdesc);
+        exec_move_row(estate, cur_target, tuptab->vals[0], tuptab->tupdesc);
     }
     else if (SPI_processed > 1)
         elog(ERROR, "procedure call returned more than one row");

Re: calling procedures is slow and consumes extra much memory against calling function

От

Pavel Stehule

Дата:

28 сентября 2020 г., 12:14:39

po 28. 9. 2020 v 3:04 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am sending another patch that tries to allow CachedPlans for CALL
> statements. I think this patch is very accurate, but it is not nice,
> because it is smudging very precious reference counting for CachedPlans.

I spent some time testing this. Although the #1 patch gets rid of
the major memory leak of cached plans, the original test case still
shows a pretty substantial leak across repeated executions of a CALL.
The reason is that the stanza for rebuilding stmt->target also gets
executed each time through, and that leaks not only the relatively
small PLpgSQL_row datum but also a bunch of catalog lookup cruft
created on the way to building the datum. Basically this code forgot
that plpgsql's outer execution layer can't assume that it's running
in a short-lived context.

I attach a revised #1 that takes care of that problem, and also
cleans up what seems to me to be pretty sloppy thinking in both
the original code and Pavel's #1 patch: we should be restoring
the previous value of expr->plan, not cavalierly assuming that
it was necessarily NULL. I didn't care for looking at the plan's
"saved" field to decide what was happening, either. We really
should have a local flag variable clearly defining which behavior
it is that we're implementing.

With this patch, I see zero memory bloat on Pavel's original example,
even with a much larger repeat count.

I don't like much of anything about plpgsql-stmt_call-fix-2.patch.
It feels confused and poorly documented, possibly because "fragile"
is not a very clear term for whatever property it is you're trying to
attribute to plans. But in any case, I think it's fixing the problem
in the wrong place. I think the right way to fix it probably is to
manage a CALL's saved plan the same as every other plpgsql plan,
but arrange for the transient refcount on that plan to be held by a
ResourceOwner that is not a child of any transaction resowner, but
rather belongs to the procedure's execution and will be released on
the way out of the procedure.

In any case, I doubt we'd risk back-patching either the #2 patch
or any other approach to avoiding the repeat planning. We need a
back-patchable fix that at least tamps down the memory bloat,
and this seems like it'd do.

I agree with these conclusions. I'll try to look if I can do #2 patch better for pg14. Probably it can fix more issues related to CALL statement, and I agree so this should not be backapatched.

It can be great to use CALL without memory leaks (and it can be better (in future) if the performance of CALL statements should be good).

Thank you for enhancing and fixing this patch

Regards

Pavel

regards, tom lane

Re: calling procedures is slow and consumes extra much memory against calling function

От

Tom Lane

Дата:

29 сентября 2020 г., 18:20:35

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I agree with these conclusions.  I'll try to look if I can do #2 patch
> better for pg14. Probably it can fix more issues related to CALL statement,
> and I agree so this should not be backapatched.

I've pushed this and marked the CF entry committed.  Please start a
new thread and new CF entry whenever you have a more ambitious patch.

            regards, tom lane

Re: calling procedures is slow and consumes extra much memory against calling function

От

Pavel Stehule

Дата:

29 сентября 2020 г., 19:39:02

út 29. 9. 2020 v 17:20 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I agree with these conclusions. I'll try to look if I can do #2 patch
> better for pg14. Probably it can fix more issues related to CALL statement,
> and I agree so this should not be backapatched.

I've pushed this and marked the CF entry committed. Please start a
new thread and new CF entry whenever you have a more ambitious patch.

Thank you

Pavel

regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: calling procedures is slow and consumes extra much memory againstcalling function

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения