Discussion: [PERFORM] Bulk persistence strategy

[PERFORM] Bulk persistence strategy

From:
Riaan Stander
Date:

Good day

I've got a performance and best-practice question. We're busy writing the persistence framework for our application's port to PostgreSQL.
We have business entities that are split over multiple tables, and in an effort to avoid multiple round trips to the database we're trying to batch these statements together. Some of the extension tables use serials, so the output of one query has to be used as input to the next. It is at this point that we run into the PostgreSQL limitation of only being able to declare variables inside functions.

I've come up with generating functions on the fly, but I'm concerned about the performance impact of this. I first wanted to use an anonymous code block, but then I cannot do parameter binding from npgsql.

Example:
create table table1
(
    id bigserial,
    value1 text
);

create table table2
(
    id bigserial,
    value2 text
);

create table table3
(
    id bigserial,
    value3 text,
    table1_id1 bigint,
    table1_id2 bigint,
    table2_id bigint
);

I then generate this on the fly to insert a new entity:

CREATE OR REPLACE FUNCTION tmp_641f51c9_d188_4386_93f3_c40001b191e7(table1_value1_0 Text, table1_value1_1 Text, table2_value2_0 Text, table3_value3_0 Text)
RETURNS BIGINT AS $$
DECLARE
    _table1_id1 bigint;
    _table1_id2 bigint;
    _table2_id bigint;
    _id bigint;
BEGIN

    INSERT INTO public.table1 (value1) VALUES (table1_value1_0)
    RETURNING id INTO _table1_id1;

    INSERT INTO public.table1 (value1) VALUES (table1_value1_1)
    RETURNING id INTO _table1_id2;

    INSERT INTO public.table2 (value2) VALUES (table2_value2_0)
    RETURNING id INTO _table2_id;

    INSERT INTO public.table3 (value3, table1_id1, table1_id2, table2_id) VALUES (table3_value3_0, _table1_id1, _table1_id2, _table2_id)
    RETURNING id INTO _id;

    RETURN _id;
END;
$$ LANGUAGE plpgsql;

SELECT tmp_641f51c9_d188_4386_93f3_c40001b191e7(@table1_value1_0, @table1_value1_1, @table2_value2_0, @table3_value3_0);

DROP FUNCTION IF EXISTS tmp_641f51c9_d188_4386_93f3_c40001b191e7(Text,Text,Text,Text);

Is there a better way I'm missing, and is "temp" function creation in Postgres a big performance concern, especially if the server is under load?

Regards
Riaan Stander
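
For reference, the three dependent inserts in the example above can also be batched into a single statement with data-modifying CTEs, with no temporary function at all; a minimal sketch against the example tables:

```sql
-- All four inserts in one statement: each CTE's RETURNING id feeds the
-- final INSERT, so the generated keys never have to cross the wire.
WITH t1a AS (
    INSERT INTO public.table1 (value1) VALUES ('a') RETURNING id
), t1b AS (
    INSERT INTO public.table1 (value1) VALUES ('b') RETURNING id
), t2 AS (
    INSERT INTO public.table2 (value2) VALUES ('c') RETURNING id
)
INSERT INTO public.table3 (value3, table1_id1, table1_id2, table2_id)
SELECT 'd', t1a.id, t1b.id, t2.id
FROM t1a, t1b, t2
RETURNING id;
```

The string literals here stand in for the bind parameters npgsql would supply; writable CTEs have been available since PostgreSQL 9.1.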

Re: [PERFORM] Bulk persistence strategy

From:
Tom Lane
Date:
Riaan Stander <rstander@exa.co.za> writes:
> I've come up with generating functions on the fly, but I'm concerned about
> the performance impact of this. I first wanted to use an anonymous code
> block, but then I cannot do parameter binding from npgsql.
> ...
> Is there a better way I'm missing and is "temp" function creation in
> Postgres a big performance concern, especially if a server is under load?

The function itself is only one pg_proc row, but if you're expecting
to do this thousands of times a minute you might have to adjust autovacuum
settings to avoid bad bloat in pg_proc.

If you're intending that these functions be use-once, it's fairly unclear
to me why you bother, as opposed to just issuing the underlying SQL
statements.

            regards, tom lane


Re: [PERFORM] Bulk persistence strategy

From:
Riaan Stander
Date:

> Riaan Stander <rstander@exa.co.za> writes:
>> I've come up with generating functions on the fly, but I'm concerned about
>> the performance impact of this. I first wanted to use an anonymous code
>> block, but then I cannot do parameter binding from npgsql.
>> ...
>> Is there a better way I'm missing and is "temp" function creation in
>> Postgres a big performance concern, especially if a server is under load?
> The function itself is only one pg_proc row, but if you're expecting
> to do this thousands of times a minute you might have to adjust autovacuum
> settings to avoid bad bloat in pg_proc.
>
> If you're intending that these functions be use-once, it's fairly unclear
> to me why you bother, as opposed to just issuing the underlying SQL
> statements.
>
> 		regards, tom lane

The intended use is use-once. The reason is that the statements might differ per call, especially when we start doing updates. The ideal would be to just issue the sql statements, but I was trying to cut down on network calls. To batch them together and get output from one query as input for the others (declare variables), I have to wrap them in a function in Postgres. Or am I missing something? In SQL Server TSQL I could declare variables in any statement as required.


Re: [PERFORM] Bulk persistence strategy

From:
Tom Lane
Date:
Riaan Stander <rstander@exa.co.za> writes:
> The intended use is use-once. The reason is that the statements might
> differ per call, especially when we start doing updates. The ideal would
> be to just issue the sql statements, but I was trying to cut down on
> network calls. To batch them together and get output from one query as
> input for the others (declare variables), I have to wrap them in a
> function in Postgres. Or am I missing something? In SQL Server TSQL I
> could declare variables in any statement as required.

Hm, well, feeding data forward to the next query without a network
round trip is a valid concern.

How stylized are these commands?  Have you considered pushing the
generation logic into the function, so that you just have one (or
a few) persistent functions, and the variability slack is taken
up through EXECUTE'd strings?  That'd likely be significantly
more efficient than one-use functions.  Even disregarding the
pg_proc update traffic, plpgsql isn't going to shine in that usage
because it's optimized for repeated execution of functions.

            regards, tom lane
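
A minimal sketch of the approach Tom describes (the function and parameter names here are illustrative, not from the thread): one persistent function whose variable parts are assembled at run time and run with EXECUTE.

```sql
-- Illustrative only: one persistent function; the variable part of the
-- statement is built at run time and executed with dynamic SQL.
CREATE OR REPLACE FUNCTION insert_dynamic(target regclass, col text, val text)
RETURNS bigint AS $$
DECLARE
    _id bigint;
BEGIN
    -- %s is safe for a regclass (its text output is already quoted);
    -- %I and %L quote the identifier and the literal respectively.
    EXECUTE format('INSERT INTO %s (%I) VALUES (%L) RETURNING id',
                   target, col, val)
    INTO _id;
    RETURN _id;
END;
$$ LANGUAGE plpgsql;

SELECT insert_dynamic('public.table1', 'value1', 'some text');
```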


Re: [PERFORM] Bulk persistence strategy

From:
Riaan Stander
Date:
> Riaan Stander <rstander@exa.co.za> writes:
>> The intended use is use-once. The reason is that the statements might
>> differ per call, especially when we start doing updates. The ideal would
>> be to just issue the sql statements, but I was trying to cut down on
>> network calls. To batch them together and get output from one query as
>> input for the others (declare variables), I have to wrap them in a
>> function in Postgres. Or am I missing something? In SQL Server TSQL I
>> could declare variables in any statement as required.
> Hm, well, feeding data forward to the next query without a network
> round trip is a valid concern.
>
> How stylized are these commands?  Have you considered pushing the
> generation logic into the function, so that you just have one (or
> a few) persistent functions, and the variability slack is taken
> up through EXECUTE'd strings?  That'd likely be significantly
> more efficient than one-use functions.  Even disregarding the
> pg_proc update traffic, plpgsql isn't going to shine in that usage
> because it's optimized for repeated execution of functions.
>
>             regards, tom lane
The commands are generated from a complex object/type in the
application. Some of them can be quite large. With modifications they do
state tracking too, so that we only update fields that actually changed
and can do optimistic concurrency checking.

It'll probably make more sense to try creating a function per type of
object that deals with the query generation. That way I can create a
Postgres type that maps from the application object.

Thanks for the advice. I'll give that a shot.

Regards
Riaan Stander
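
That function-per-entity-type idea might look like the following sketch (the type and function names are assumptions for illustration): a composite type mirrors the application object, and one persistent function performs all the dependent inserts.

```sql
-- Illustrative sketch: composite type maps the application object,
-- one persistent function per entity type does the dependent inserts.
CREATE TYPE entity_t AS (value1_a text, value1_b text, value2 text, value3 text);

CREATE OR REPLACE FUNCTION save_entity(e entity_t)
RETURNS bigint AS $$
DECLARE
    _t1a bigint;
    _t1b bigint;
    _t2  bigint;
    _id  bigint;
BEGIN
    INSERT INTO public.table1 (value1) VALUES (e.value1_a) RETURNING id INTO _t1a;
    INSERT INTO public.table1 (value1) VALUES (e.value1_b) RETURNING id INTO _t1b;
    INSERT INTO public.table2 (value2) VALUES (e.value2)   RETURNING id INTO _t2;
    INSERT INTO public.table3 (value3, table1_id1, table1_id2, table2_id)
    VALUES (e.value3, _t1a, _t1b, _t2)
    RETURNING id INTO _id;
    RETURN _id;
END;
$$ LANGUAGE plpgsql;
```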




Re: [PERFORM] Bulk persistence strategy

From:
Simon Riggs
Date:
On 22 May 2017 at 03:14, Riaan Stander <rstander@exa.co.za> wrote:
>
>> Riaan Stander <rstander@exa.co.za> writes:
>>>
>>> The intended use is use-once. The reason is that the statements might
>>> differ per call, especially when we start doing updates. The ideal would
>>> be to just issue the sql statements, but I was trying to cut down on
>>> network calls. To batch them together and get output from one query as
>>> input for the others (declare variables), I have to wrap them in a
>>> function in Postgres. Or am I missing something? In SQL Server TSQL I
>>> could declare variables in any statement as required.
>>
>> Hm, well, feeding data forward to the next query without a network
>> round trip is a valid concern.
>>
>> How stylized are these commands?  Have you considered pushing the
>> generation logic into the function, so that you just have one (or
>> a few) persistent functions, and the variability slack is taken
>> up through EXECUTE'd strings?  That'd likely be significantly
>> more efficient than one-use functions.  Even disregarding the
>> pg_proc update traffic, plpgsql isn't going to shine in that usage
>> because it's optimized for repeated execution of functions.
>>
>>                         regards, tom lane
>
> The commands are generated from a complex object/type in the application.
> Some of them can be quite large. With modifications they do state tracking
> too, so that we only update fields that actually changed and can do
> optimistic concurrency checking.
>
> It'll probably make more sense to try create a function per type of object
> that deals with the query generation. That way I can create a Postgres type
> that maps from the application object.
>
> Thanks for the advice. I'll give that a shot.

It sounds like you don't know about anonymous code blocks with DO
https://www.postgresql.org/docs/devel/static/sql-do.html

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PERFORM] Bulk persistence strategy

From:
Riaan Stander
Date:
> On 22 May 2017 at 03:14, Riaan Stander <rstander@exa.co.za> wrote:
>>> Riaan Stander <rstander@exa.co.za> writes:
>>>> The intended use is use-once. The reason is that the statements might
>>>> differ per call, especially when we start doing updates. The ideal would
>>>> be to just issue the sql statements, but I was trying to cut down on
>>>> network calls. To batch them together and get output from one query as
>>>> input for the others (declare variables), I have to wrap them in a
>>>> function in Postgres. Or am I missing something? In SQL Server TSQL I
>>>> could declare variables in any statement as required.
>>> Hm, well, feeding data forward to the next query without a network
>>> round trip is a valid concern.
>>>
>>> How stylized are these commands?  Have you considered pushing the
>>> generation logic into the function, so that you just have one (or
>>> a few) persistent functions, and the variability slack is taken
>>> up through EXECUTE'd strings?  That'd likely be significantly
>>> more efficient than one-use functions.  Even disregarding the
>>> pg_proc update traffic, plpgsql isn't going to shine in that usage
>>> because it's optimized for repeated execution of functions.
>>>
>>>                          regards, tom lane
>> The commands are generated from a complex object/type in the application.
>> Some of them can be quite large. With modifications they do state tracking
>> too, so that we only update fields that actually changed and can do
>> optimistic concurrency checking.
>>
>> It'll probably make more sense to try create a function per type of object
>> that deals with the query generation. That way I can create a Postgres type
>> that maps from the application object.
>>
>> Thanks for the advice. I'll give that a shot.
> It sounds like you don't know about anonymous code blocks with DO
> https://www.postgresql.org/docs/devel/static/sql-do.html
>

Yes, I do know about that feature. My first implementation generated an
anonymous code block, but to my utter dismay, once I tried actually doing
parameter binding from the application it did not work. This seems to be
a Postgres limitation that is actually stated in the documentation: the
anonymous code block is treated as a function body with no parameters.

Thanks for the suggestion though.

Regards
Riaan Stander



Re: [PERFORM] Bulk persistence strategy

From:
Simon Riggs
Date:
On 22 May 2017 at 09:06, Riaan Stander <rstander@exa.co.za> wrote:

>> It sounds like you don't know about anonymous code blocks with DO
>> https://www.postgresql.org/docs/devel/static/sql-do.html
>>
>
> Yes, I do know about that feature. My first implementation generated an
> anonymous code block, but to my utter dismay once I tried actually doing
> parameter binding from the application it did not work. This seems to be a
> Postgres limitation actually stated in the documentation. The anonymous code
> block is treated as a function body with no parameters.
>
> Thanks for the suggestion though.

Perhaps we should look into parameterisable DO statements.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PERFORM] Bulk persistence strategy

From:
Riaan Stander
Date:
> On 22 May 2017 at 09:06, Riaan Stander <rstander@exa.co.za> wrote:
>
>>> It sounds like you don't know about anonymous code blocks with DO
>>> https://www.postgresql.org/docs/devel/static/sql-do.html
>>>
>> Yes, I do know about that feature. My first implementation generated an
>> anonymous code block, but to my utter dismay once I tried actually doing
>> parameter binding from the application it did not work. This seems to be a
>> Postgres limitation actually stated in the documentation. The anonymous code
>> block is treated as a function body with no parameters.
>>
>> Thanks for the suggestion though.
> Perhaps we should look into parameterisable DO statements.
>
Now that I would second!!




Re: [PERFORM] Bulk persistence strategy

From:
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> It sounds like you don't know about anonymous code blocks with DO
> https://www.postgresql.org/docs/devel/static/sql-do.html

No, the problem was that there are also some parameters to be passed
in from the application, and DO doesn't take any parameters; so that
would require inserting them manually into the DO text, with all the
attendant hazards of getting-it-wrong.

We've speculated before about letting DO grow some parameter handling,
but it's not gotten to the top of anyone's to-do list.

            regards, tom lane
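
Until DO grows parameter handling, one workaround sometimes used is to pass values through custom session settings, since set_config() and current_setting() are callable from SQL; a sketch (the 'app.value1' setting name is an assumption):

```sql
-- DO takes no parameters, so the application first stashes its values in
-- a custom session setting via a separate parameterizable query...
SELECT set_config('app.value1', 'some value', false);  -- false = session scope

-- ...and the anonymous block reads them back.
DO $$
DECLARE
    _v text := current_setting('app.value1');
BEGIN
    INSERT INTO public.table1 (value1) VALUES (_v);
END;
$$;
```

Note that values travel as text this way, so anything non-textual needs an explicit cast inside the block.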


Re: [PERFORM] Bulk persistence strategy

From:
Merlin Moncure
Date:
On Sun, May 21, 2017 at 5:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Riaan Stander <rstander@exa.co.za> writes:
>> The intended use is use-once. The reason is that the statements might
>> differ per call, especially when we start doing updates. The ideal would
>> be to just issue the sql statements, but I was trying to cut down on
>> network calls.
>
> Hm, well, feeding data forward to the next query without a network
> round trip is a valid concern.
>
> How stylized are these commands?  Have you considered pushing the
> generation logic into the function, so that you just have one (or
> a few) persistent functions, and the variability slack is taken
> up through EXECUTE'd strings?

+1.  If 'DO' could return a value and take arguments, we'd probably
just use that.  With the status quo, however, the SQL generation
facilities need to be moved into the database as dynamic SQL (that is,
executed with EXECUTE).  This will provide the speed benefits while
maintaining (albeit with some rethinking) your abstraction model.
Please make liberal use of quote_ident() and quote_literal() to
minimize security risks.

merlin
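
Inside a plpgsql function body (tbl, val, and _id are assumed to be declared there), the quoting helpers Merlin mentions look like this; format()'s %I and %L are shorthand for quote_ident() and quote_literal():

```sql
-- Fragment for use inside a plpgsql function: manual quoting...
EXECUTE 'INSERT INTO ' || quote_ident(tbl) || ' (value1) VALUES ('
        || quote_literal(val) || ') RETURNING id'
INTO _id;

-- ...and the equivalent with format()'s %I (identifier) and %L (literal).
EXECUTE format('INSERT INTO %I (value1) VALUES (%L) RETURNING id', tbl, val)
INTO _id;
```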