Обсуждение: [PERFORM] Bulk persistence strategy
Good day
I’ve got a performance and best practice question. We’re busy writing our persistence framework for our application port to PostgreSQL.
We have business entities that are split over multiple tables and in an effort to not have multiple trips to the database we’re trying to batch these together. Some of the extensions uses serials, necessitating output from the one query to be used in the other. It is at this point that we’re running into the PostgreSQL limitation of only declaring variables in functions.
I've come up with generating functions on the go, but I'm concerned about the performance impact of this. I first wanted to use an anonoumys code block, but then I cannot do parameter binding from npgsql.
Example:
create table table1
(
id bigserial,
value1 text
);
create table table2
(
id bigserial,
value2 text
);
create table table3
(
id bigserial,
value3 text,
table1_id1 bigint,
table1_id2 bigint,
table2_id bigint
);
I then generate this on the fly to insert a new entity
CREATE OR REPLACE FUNCTION tmp_641f51c9_d188_4386_93f3_c40001b191e7(table1_value1_0 Text, table1_value1_1 Text, table2_value2_0 Text, table3_value3_0 Text)
RETURNS BIGINT AS $$
DECLARE
_table1_id1 bigint;
_table1_id2 bigint;
_table2_id bigint;
_id bigint;
table1_value1_0 ALIAS FOR $1;
table2_value2_0 ALIAS FOR $2;
table3_value3_0 ALIAS FOR $3;
BEGIN
INSERT INTO public.table1 (value1) VALUES (table1_value1_0)
RETURNING id INTO _table1_id1;
INSERT INTO public.table1 (value1) VALUES (table1_value1_1)
RETURNING id INTO _table1_id2;
INSERT INTO public.table2 (value2) VALUES (table2_value2_0)
RETURNING id INTO _table2_id;
INSERT INTO public.table3 (value3, table1_id1, table1_id2, table2_id) VALUES (table3_value3_0, _table1_id1, _table1_id2, _table2_id)
RETURNING id INTO _id;
RETURN _id;
END;
$$ LANGUAGE plpgsql;
SELECT tmp_641f51c9_d188_4386_93f3_c40001b191e7(@table1_value1_0, @table1_value1_1, @table2_value2_0, @table3_value3_0);
DROP FUNCTION IF EXISTS tmp_641f51c9_d188_4386_93f3_c40001b191e7(Text,Text,Text,Text);
Is there a better way I'm missing and is "temp" function creation in Postgres a big performance concern, especially if a server is under load?
Regards
Riaan Stander
Riaan Stander <rstander@exa.co.za> writes: > I've come up with generating functions on the go, but I'm concerned about > the performance impact of this. I first wanted to use an anonoumys code > block, but then I cannot do parameter binding from npgsql. > ... > Is there a better way I'm missing and is "temp" function creation in > Postgres a big performance concern, especially if a server is under load? The function itself is only one pg_proc row, but if you're expecting to do this thousands of times a minute you might have to adjust autovacuum settings to avoid bad bloat in pg_proc. If you're intending that these functions be use-once, it's fairly unclear to me why you bother, as opposed to just issuing the underlying SQL statements. regards, tom lane
Riaan Stander <rstander@exa.co.za> writes:I've come up with generating functions on the go, but I'm concerned about the performance impact of this. I first wanted to use an anonoumys code block, but then I cannot do parameter binding from npgsql. ... Is there a better way I'm missing and is "temp" function creation in Postgres a big performance concern, especially if a server is under load?The function itself is only one pg_proc row, but if you're expecting to do this thousands of times a minute you might have to adjust autovacuum settings to avoid bad bloat in pg_proc. If you're intending that these functions be use-once, it's fairly unclear to me why you bother, as opposed to just issuing the underlying SQL statements. regards, tom lane
The intended use is use-once. The reason is that the statements might differ per call, especially when we start doing updates. The ideal would be to just issue the sql statements, but I was trying to cut down on network calls. To batch them together and get output from one query as input for the others (declare variables), I have to wrap them in a function in Postgres. Or am I missing something? In SQL Server TSQL I could declare variables in any statement as required.
Riaan Stander <rstander@exa.co.za> writes: > The intended use is use-once. The reason is that the statements might > differ per call, especially when we start doing updates. The ideal would > be to just issue the sql statements, but I was trying to cut down on > network calls. To batch them together and get output from one query as > input for the others (declare variables), I have to wrap them in a > function in Postgres. Or am I missing something? In SQL Server TSQL I > could declare variables in any statement as required. Hm, well, feeding data forward to the next query without a network round trip is a valid concern. How stylized are these commands? Have you considered pushing the generation logic into the function, so that you just have one (or a few) persistent functions, and the variability slack is taken up through EXECUTE'd strings? That'd likely be significantly more efficient than one-use functions. Even disregarding the pg_proc update traffic, plpgsql isn't going to shine in that usage because it's optimized for repeated execution of functions. regards, tom lane
> Riaan Stander <rstander@exa.co.za> writes: >> The intended use is use-once. The reason is that the statements might >> differ per call, especially when we start doing updates. The ideal would >> be to just issue the sql statements, but I was trying to cut down on >> network calls. To batch them together and get output from one query as >> input for the others (declare variables), I have to wrap them in a >> function in Postgres. Or am I missing something? In SQL Server TSQL I >> could declare variables in any statement as required. > Hm, well, feeding data forward to the next query without a network > round trip is a valid concern. > > How stylized are these commands? Have you considered pushing the > generation logic into the function, so that you just have one (or > a few) persistent functions, and the variability slack is taken > up through EXECUTE'd strings? That'd likely be significantly > more efficient than one-use functions. Even disregarding the > pg_proc update traffic, plpgsql isn't going to shine in that usage > because it's optimized for repeated execution of functions. > > regards, tom lane The commands are generated from a complex object/type in the application. Some of them can be quite large. With modifications they do state tracking too, so that we only update fields that actually changed and can do optimistic concurrency checking. It'll probably make more sense to try create a function per type of object that deals with the query generation. That way I can create a Postgres type that maps from the application object. Thanks for the advice. I'll give that a shot. Regards Riaan Stander
On 22 May 2017 at 03:14, Riaan Stander <rstander@exa.co.za> wrote: > >> Riaan Stander <rstander@exa.co.za> writes: >>> >>> The intended use is use-once. The reason is that the statements might >>> differ per call, especially when we start doing updates. The ideal would >>> be to just issue the sql statements, but I was trying to cut down on >>> network calls. To batch them together and get output from one query as >>> input for the others (declare variables), I have to wrap them in a >>> function in Postgres. Or am I missing something? In SQL Server TSQL I >>> could declare variables in any statement as required. >> >> Hm, well, feeding data forward to the next query without a network >> round trip is a valid concern. >> >> How stylized are these commands? Have you considered pushing the >> generation logic into the function, so that you just have one (or >> a few) persistent functions, and the variability slack is taken >> up through EXECUTE'd strings? That'd likely be significantly >> more efficient than one-use functions. Even disregarding the >> pg_proc update traffic, plpgsql isn't going to shine in that usage >> because it's optimized for repeated execution of functions. >> >> regards, tom lane > > The commands are generated from a complex object/type in the application. > Some of them can be quite large. With modifications they do state tracking > too, so that we only update fields that actually changed and can do > optimistic concurrency checking. > > It'll probably make more sense to try create a function per type of object > that deals with the query generation. That way I can create a Postgres type > that maps from the application object. > > Thanks for the advice. I'll give that a shot. It sounds like you don't know about anonymous code blocks with DO https://www.postgresql.org/docs/devel/static/sql-do.html -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 22 May 2017 at 03:14, Riaan Stander <rstander@exa.co.za> wrote: >>> Riaan Stander <rstander@exa.co.za> writes: >>>> The intended use is use-once. The reason is that the statements might >>>> differ per call, especially when we start doing updates. The ideal would >>>> be to just issue the sql statements, but I was trying to cut down on >>>> network calls. To batch them together and get output from one query as >>>> input for the others (declare variables), I have to wrap them in a >>>> function in Postgres. Or am I missing something? In SQL Server TSQL I >>>> could declare variables in any statement as required. >>> Hm, well, feeding data forward to the next query without a network >>> round trip is a valid concern. >>> >>> How stylized are these commands? Have you considered pushing the >>> generation logic into the function, so that you just have one (or >>> a few) persistent functions, and the variability slack is taken >>> up through EXECUTE'd strings? That'd likely be significantly >>> more efficient than one-use functions. Even disregarding the >>> pg_proc update traffic, plpgsql isn't going to shine in that usage >>> because it's optimized for repeated execution of functions. >>> >>> regards, tom lane >> The commands are generated from a complex object/type in the application. >> Some of them can be quite large. With modifications they do state tracking >> too, so that we only update fields that actually changed and can do >> optimistic concurrency checking. >> >> It'll probably make more sense to try create a function per type of object >> that deals with the query generation. That way I can create a Postgres type >> that maps from the application object. >> >> Thanks for the advice. I'll give that a shot. > It sounds like you don't know about anonymous code blocks with DO > https://www.postgresql.org/docs/devel/static/sql-do.html > Yes I do know about that feature. My first implemented generated an anonymous code block, but to my utter dismay once I tried actually doing parameter binding from the application it did not work. This seems to be a Postgres limitation actually stated in the documentation. The anonymous code block is treated as a function body with no parameters. Thanks for the suggestion though. Regards Riaan Stander
On 22 May 2017 at 09:06, Riaan Stander <rstander@exa.co.za> wrote: >> It sounds like you don't know about anonymous code blocks with DO >> https://www.postgresql.org/docs/devel/static/sql-do.html >> > > Yes I do know about that feature. My first implemented generated an > anonymous code block, but to my utter dismay once I tried actually doing > parameter binding from the application it did not work. This seems to be a > Postgres limitation actually stated in the documentation. The anonymous code > block is treated as a function body with no parameters. > > Thanks for the suggestion though. Perhaps we should look into parameterisable DO statements. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 22 May 2017 at 09:06, Riaan Stander <rstander@exa.co.za> wrote: > >>> It sounds like you don't know about anonymous code blocks with DO >>> https://www.postgresql.org/docs/devel/static/sql-do.html >>> >> Yes I do know about that feature. My first implemented generated an >> anonymous code block, but to my utter dismay once I tried actually doing >> parameter binding from the application it did not work. This seems to be a >> Postgres limitation actually stated in the documentation. The anonymous code >> block is treated as a function body with no parameters. >> >> Thanks for the suggestion though. > Perhaps we should look into parameterisable DO statements. > Now that I would second!!
Simon Riggs <simon@2ndquadrant.com> writes: > It sounds like you don't know about anonymous code blocks with DO > https://www.postgresql.org/docs/devel/static/sql-do.html No, the problem was that there are also some parameters to be passed in from the application, and DO doesn't take any parameters; so that would require inserting them manually into the DO text, with all the attendant hazards of getting-it-wrong. We've speculated before about letting DO grow some parameter handling, but it's not gotten to the top of anyone's to-do list. regards, tom lane
On Sun, May 21, 2017 at 5:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Riaan Stander <rstander@exa.co.za> writes: >> The intended use is use-once. The reason is that the statements might >> differ per call, especially when we start doing updates. The ideal would >> be to just issue the sql statements, but I was trying to cut down on >> network calls. > > Hm, well, feeding data forward to the next query without a network > round trip is a valid concern. > > How stylized are these commands? Have you considered pushing the > generation logic into the function, so that you just have one (or > a few) persistent functions, and the variability slack is taken > up through EXECUTE'd strings? +1. If 'DO' could return a value and take arguments, we'd probably just use that. With the status quo however the SQL generation facilities need to be moved into the database as dynamic SQL (that is, executed with EXECUTE). This will provide the speed benefits while maintaining (albeit with some rethinking) your abstraction model Please make liberal use of quote_ident() and quote_literal() to minimize security risks. merlin