Discussion: Re: JDBC prepared statement: a 32767 limit of arguments number


Re: JDBC prepared statement: a 32767 limit of arguments number

From:
"David G. Johnston"
Date:
On Wednesday, March 9, 2022, Vladislav Malyshkin <mal@gromco.com> wrote:
> Currently the Postgres JDBC driver has a 32767 limit on the number of prepared statement arguments; see e.g.
> https://luppeng.wordpress.com/2020/05/20/postgresql-jdbc-driver-upper-limit-on-parameters-in-preparedstatement/
> Can this limit be lifted?  There is no such limit in the JDBC spec.

There is little motivation to try since if you approach that limit you should probably write your query differently anyway - like using a temp table and a join instead of an IN operator.

David J.

Re: JDBC prepared statement: a 32767 limit of arguments number

From:
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wednesday, March 9, 2022, Vladislav Malyshkin <mal@gromco.com> wrote:
>> Currently the Postgres JDBC driver has a 32767 limit on the number of
>> prepared statement arguments; see e.g.
>> https://luppeng.wordpress.com/2020/05/20/postgresql-jdbc-driver-upper-limit-on-parameters-in-preparedstatement/
>> Can this limit be lifted?  There is no such limit in the JDBC spec.

> There is little motivation to try since if you approach that limit you
> should probably write your query differently anyway - like using a temp
> table and a join instead of an IN operator.

This is a PG wire protocol limitation: the number-of-parameters
field in Bind messages is int16 [1].  So there's little that the
JDBC driver could do differently.

I concur with David's opinion that if you think you need more
parameters, you're doing it wrong.  One idea to consider is
aggregating similar values into an array parameter.
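
A minimal JDBC sketch of that array-parameter approach (the "items" table and "id" column are made-up names for illustration, not from this thread):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class ArrayParamExample {
        // Fetch rows matching an arbitrary number of ids while sending only
        // ONE protocol-level parameter (the array), instead of one
        // placeholder per id.
        static void fetchByIds(Connection conn, Integer[] ids) throws SQLException {
            String sql = "SELECT * FROM items WHERE id = ANY(?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setArray(1, conn.createArrayOf("integer", ids));
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process each matching row here
                    }
                }
            }
        }
    }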

            regards, tom lane

[1] https://www.postgresql.org/docs/current/protocol-message-formats.html



Re: JDBC prepared statement: a 32767 limit of arguments number

From:
"David G. Johnston"
Date:
On Wed, Mar 9, 2022 at 12:38 PM Vladislav Malyshkin <mal@gromco.com> wrote:
> On 09/03/2022 10.03, Tom Lane wrote:
>
>> I concur with David's opinion that if you think you need more
>> parameters, you're doing it wrong.  One idea to consider is
>> aggregating similar values into an array parameter.
>
> I disagree:

Ok.  If PostgreSQL ever does end up releasing a new protocol version, I would say it is at least in the realm of possibility that this limit would be increased.  But it is hard to get consensus to release a new protocol version, given the incompatibilities doing so creates.  That is the curse of a mature and widely used application.  It isn't as if this limitation has gone unnoticed until now; yet we still haven't released a new protocol version in many years, nor do we presently have concrete plans to do so.

> 1. It is extremely convenient to insert multiple values in a single SQL insert:
>
> INSERT INTO table_name (f1,f2,f3) VALUES (1,'text',2.4), (2,'text2',2.5), ...
>
> Setting all values as JDBC parameters is the easiest way to integrate java/scala and SQL.
> A single insert of 10000 records is several orders of magnitude faster than 10000 separate inserts, not to mention transaction simplification.


PostgreSQL would expect that you would use the COPY API in this kind of situation.
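
For reference, a rough sketch of what the COPY API looks like through pgjdbc's CopyManager (the table name and the inline CSV payload are placeholders; real code would stream and escape the data properly):

    import java.io.StringReader;
    import java.sql.Connection;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyInExample {
        // Bulk-load rows with COPY ... FROM STDIN via pgjdbc's CopyManager.
        // Returns the number of rows copied.
        static long bulkLoad(Connection conn) throws Exception {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            String csv = "1,text,2.4\n2,text2,2.5\n";   // placeholder data
            return copy.copyIn(
                "COPY table_name (f1, f2, f3) FROM STDIN WITH (FORMAT csv)",
                new StringReader(csv));
        }
    }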


> 2. For automatic scala<->jdbc integration tools such as https://github.com/mal19992/sqlps the SQL and JDBC arguments are generated by an automatic transformation and can be very numerous.
> The approach suggested by David, "using a temp table and a join instead of an IN operator", is extremely inconvenient for automatic tools.


This is too vague to comment upon or be convinced by.

David J.


Re: JDBC prepared statement: a 32767 limit of arguments number

From:
"David G. Johnston"
Date:
On Wed, Mar 9, 2022 at 1:09 PM Vladislav Malyshkin <mal@gromco.com> wrote:
> On 09/03/2022 14.55, David G. Johnston wrote:
>> PostgreSQL would expect that you would use the COPY API in this kind of situation.
>
> Yes, one can use the copyIn method, but transforming the data to an InputStream loses the data-type information of the programming language. The beauty of PreparedStatement is that methods like setLong and setString preserve the programming language's data types.

I don't understand why this matters.  As soon as the data hits the database table, the programming language data types no longer matter, as the data types of the columns become official.  You do have to deal with converting programming-language data types to their textual representations for the COPY command, but the driver should be capable of helping with that, especially since I'm pretty sure it already has to do that anyway for those various setXXX methods.  The savings in server-side parsing and binding alone should make the client-side effort worthwhile.

David J.

Re: JDBC prepared statement: a 32767 limit of arguments number

From:
Vladimir Sitnikov
Date:
Vladislav,

Have you tried PreparedStatement#addBatch + #executeBatch APIs and reWriteBatchedInserts=true connection option?
It might yield the expected speedup without resorting to PG-specific APIs and syntax like COPY.
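
A sketch of that batching setup (connection URL, credentials, and table/column names are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Properties;

    public class BatchInsertExample {
        // Standard JDBC batching; with reWriteBatchedInserts=true pgjdbc
        // rewrites the batch into multi-values INSERTs on the wire.
        // Only 3 placeholders per row, so the 32767 limit never comes into play.
        static void insertMany(String url, String user, String password) throws SQLException {
            Properties props = new Properties();
            props.setProperty("user", user);
            props.setProperty("password", password);
            props.setProperty("reWriteBatchedInserts", "true");
            try (Connection conn = DriverManager.getConnection(url, props);
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO table_name (f1, f2, f3) VALUES (?, ?, ?)")) {
                for (int i = 0; i < 10000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "text" + i);
                    ps.setDouble(3, 2.4);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }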

David>You do have to deal with converting programming data types to their textual representations for the COPY command but the driver should be capable of helping with that

That's right, and the driver might add language-specific APIs for converting data for COPY-FROM.
However,

1) The JDBC specification has nothing for COPY. In other words, code that uses
COPY-specific APIs is hard to port to other databases.
Of course, pull requests adding a Java API on top of COPY are welcome;
however, nobody has gotten around to adding the wrapper APIs yet.

2) COPY FROM is either binary or text. It does not allow a "fallback to text format".
We might need to revise those decisions; however, pgjdbc currently always sends date/timestamp-like
values as text even though it knows how to encode and decode them in binary.
Structs and arrays are not supported in binary encoding yet :'(

I think COPY syntax could be relaxed here to allow per-field binary/text format configuration.

3) COPY FROM is "insert-only"; it can't handle conflicts (COPY FROM does not support INSERT ... ON CONFLICT / MERGE).

4) There's a connection option reWriteBatchedInserts=true so pgjdbc could rewrite
batch calls for "insert into..." into a multi-values insert:

insert into table(c1, c2, c3, ...) values (....) 
=>
insert into table(c1, c2, c3, ...) values (....), (...), (...)

It is a pity that the driver has to parse SQL and detect if it looks like "insert into ..".
On the other hand, the optimization is compatible with the standard SQL syntax,
and it does not require application changes except for not relying on the
"number of affected rows for each bound row".

Typically executeBatch() returns the exact row count for each batched row,
and the driver loses that information if it flips to "rewrite into multivalues" mode.
So the driver can't activate batch rewrite by default.

----

It might be interesting to check if the driver could rewrite simple insert statements
into COPY FROM STDIN; however, that "autoconversion from INSERT to COPY"
would have to be disabled as soon as any non-trivial INSERT feature appears.

For instance, the driver can't convert literal values and expressions in "values" into COPY:
INSERT INTO testbatch(id, value, comment) VALUES (?, ?, 'no comment');
INSERT INTO testbatch(id, value) VALUES (1+2, ?);

Vladimir

Re: JDBC prepared statement: a 32767 limit of arguments number

From:
Andrew Dunstan
Date:
On 3/9/22 14:38, Vladislav Malyshkin wrote:
> On 09/03/2022 10.03, Tom Lane wrote:
>>
>> I concur with David's opinion that if you think you need more
>> parameters, you're doing it wrong.  One idea to consider is
>> aggregating similar values into an array parameter.
> I disagree:
>
> 1. It is extremely convenient to insert multiple values in a single
> SQL insert:
>
> *INSERT INTO table_name (f1,f2,f3) VALUES (1,'text',2.4),
> (2,'text2',2.5),...*
>
> Setting all values as JDBC parameters is the easiest way to integrate
> java/scala and SQL.
> A single insert of 10000 records is several orders of magnitude
> *faster* than 10000 separate inserts, not to mention transaction
> simplification.


Those aren't your only alternatives, of course. Even for a fairly wide
table of say 100 columns you could insert a batch of 327 sets of values
in a single statement. Experiments I did some years ago on multi-valued
inserts suggested that the benefit didn't scale linearly (no, I don't
have a reference, I'm relying on memory). Your example above could of
course accommodate the 10,000 sets of values you refer to.
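
In code terms that arithmetic is just the following (a hypothetical helper, not an existing API):

    final class BatchSizing {
        static final int MAX_BIND_PARAMS = 32767;   // int16 limit in the protocol's Bind message

        // With columnCount placeholders per row, this many rows fit into a
        // single multi-values INSERT: e.g. 100 columns -> 327 rows,
        // 3 columns -> 10922 rows.
        static int maxRowsPerStatement(int columnCount) {
            return MAX_BIND_PARAMS / columnCount;
        }
    }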


>
> 2. For automatic scala<->jdbc integration tools such as
> https://github.com/mal19992/sqlps the SQL and JDBC arguments are
> generated by an automatic transformation and can be very numerous.
> The approach suggested by David, "using a temp table and a join
> instead of an IN operator", is extremely inconvenient for automatic tools.


Above you were concerned about performance, but here you want to avoid a
more performant usage pattern for the sake of convenience. Some years
ago I managed to speed up a client's app by about an order of magnitude
by replacing an IN clause with 3000 values with a temp table join, so
regardless of the limit on the number of placeholders this is something
you should consider. If your automated tools find that inconvenient then
that's a problem they should deal with.
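
A rough JDBC sketch of that temp-table-plus-join pattern (table and column names are made up for illustration):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TempTableJoinExample {
        // Replace "WHERE id IN (?, ?, ..., ?)" with a temp table and a join.
        static void fetchByIds(Connection conn, int[] ids) throws SQLException {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TEMP TABLE wanted_ids (id int) ON COMMIT DROP");
            }
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO wanted_ids (id) VALUES (?)")) {
                for (int id : ids) {
                    ps.setInt(1, id);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT t.* FROM big_table t JOIN wanted_ids w ON w.id = t.id")) {
                while (rs.next()) {
                    // process each matching row here
                }
            }
            conn.commit();   // temp table is dropped at commit
        }
    }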


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: JDBC prepared statement: a 32767 limit of arguments number

From:
Dave Cramer
Date:


On Fri, 11 Mar 2022 at 06:59, Vladislav Malyshkin <mal@gromco.com> wrote:
> On 10/03/2022 04.08, Vladimir Sitnikov wrote:
>
>> Have you tried PreparedStatement#addBatch + #executeBatch APIs and reWriteBatchedInserts=true connection option?
>> It might yield the expected speedup without resorting to PG-specific APIs and syntax like COPY.
>
> Yes, I tried addBatch + executeBatch. It was very inconvenient in my specific case. I use the PostgreSQL extension
>
>   insert into table(c1, c2, c3, ...) values (....) RETURNING *
>
> all over the place; this way I can "get back" the data inserted. The PG extension INSERT/UPDATE/DELETE ... RETURNING *
> was so convenient for 1. checking the data, 2. autoincrements, 3. returning updated data, etc., that I started to use it with almost all inserts/updates/deletes.
> #executeBatch returns the number of rows changed, not the data as RETURNING * does.

Well, using COPY or any other mechanism such as an array isn't going to provide you with the data returned.

If that is the requirement then there is little the driver can do as this is a protocol limitation.

Regards,

Dave


Re: JDBC prepared statement: a 32767 limit of arguments number

From:
Sehrope Sarkuni
Date:
On Fri, Mar 11, 2022 at 6:59 AM Vladislav Malyshkin <mal@gromco.com> wrote:
> Yes, I tried addBatch + executeBatch. It was very inconvenient in my specific case. I use the PostgreSQL extension
>
>   insert into table(c1, c2, c3, ...) values (....) RETURNING *
>
> all over the place; this way I can "get back" the data inserted. The PG extension INSERT/UPDATE/DELETE ... RETURNING *
> was so convenient for 1. checking the data, 2. autoincrements, 3. returning updated data, etc., that I started to use it with almost all inserts/updates/deletes.
> #executeBatch returns the number of rows changed, not the data as RETURNING * does.

You can insert in bulk _and_ get the RETURNING data for auto-generated values back by using arrays for parameters. One array per column:

=> CREATE TABLE t (a serial, b int, c text);
CREATE TABLE

=> INSERT INTO t (b, c)
   SELECT t.b, t.c
   FROM UNNEST(
          '{100,200,300}'::int[],
          '{a,b,c}'::text[]
        ) AS t(b,c)
   RETURNING *;
 a |  b  | c
---+-----+---
 1 | 100 | a
 2 | 200 | b
 3 | 300 | c
(3 rows)

INSERT 0 3


That works fine when executed via the JDBC driver as a regular query with a result set. You can parameterize the arrays either yourself or via the built-in APIs.

The number of actual parameters to the query at the protocol level will be the number of columns, which presumably is less than the 32K limit. The maximum number of values (i.e. the number of columns times the number of rows you want to insert) is limited by the maximum message size of the wire protocol and the size of your serialized columns. Unless the size of the fields is huge, you should have no issues with 10,000s of rows though.
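
Driving that from JDBC might look roughly like this (reusing the example table t above; the method itself is illustrative, not from the original message):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class UnnestInsertExample {
        // Bulk insert with one array parameter per column; RETURNING hands
        // back the generated serial values.  Uses the table t(a serial, b int, c text).
        static void insertRows(Connection conn, Integer[] bs, String[] cs) throws SQLException {
            String sql = "INSERT INTO t (b, c) "
                       + "SELECT u.b, u.c FROM UNNEST(?::int[], ?::text[]) AS u(b, c) "
                       + "RETURNING *";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setArray(1, conn.createArrayOf("integer", bs));
                ps.setArray(2, conn.createArrayOf("text", cs));
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // rs.getInt("a") is the generated key for this row
                    }
                }
            }
        }
    }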

Rather than separate array parameters, you can even (ab)use JSON to pass in everything as one gigantic parameter:

=> INSERT INTO t (b,c)
   SELECT (t#>>'{0}')::int, (t#>>'{1}')::text
   FROM json_array_elements('[[100,"a"],[200,"b"],[300,"c"]]') AS t
   RETURNING *;
 a |  b  | c
---+-----+---
 4 | 100 | a
 5 | 200 | b
 6 | 300 | c
(3 rows)

INSERT 0 3


Neither of these will be as fast as COPY, but they do allow you to do just about any SQL and use extremely large numbers of parameters.
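
From JDBC, the JSON variant would be roughly as follows (building the JSON string is left to whatever JSON library the application already uses; again illustrative only):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class JsonInsertExample {
        // The whole payload travels as a single text parameter and is
        // unpacked server-side with json_array_elements.  Uses the table t above.
        static void insertRows(Connection conn, String jsonRows) throws SQLException {
            String sql = "INSERT INTO t (b, c) "
                       + "SELECT (r#>>'{0}')::int, (r#>>'{1}')::text "
                       + "FROM json_array_elements(?::json) AS r "
                       + "RETURNING *";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, jsonRows);   // e.g. "[[100,\"a\"],[200,\"b\"],[300,\"c\"]]"
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process returned rows
                    }
                }
            }
        }
    }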

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 

Re: JDBC prepared statement: a 32767 limit of arguments number

From:
Dave Cramer
Date:
Hi Sehrope,


On Fri, 11 Mar 2022 at 08:47, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> On Fri, Mar 11, 2022 at 6:59 AM Vladislav Malyshkin <mal@gromco.com> wrote:
>> Yes, I tried addBatch + executeBatch. It was very inconvenient in my specific case. I use the PostgreSQL extension
>>
>>   insert into table(c1, c2, c3, ...) values (....) RETURNING *
>>
>> all over the place; this way I can "get back" the data inserted. The PG extension INSERT/UPDATE/DELETE ... RETURNING *
>> was so convenient for 1. checking the data, 2. autoincrements, 3. returning updated data, etc., that I started to use it with almost all inserts/updates/deletes.
>> #executeBatch returns the number of rows changed, not the data as RETURNING * does.
>
> You can insert in bulk _and_ get the RETURNING data for auto-generated values back by using arrays for parameters. One array per column:
>
> => CREATE TABLE t (a serial, b int, c text);
> CREATE TABLE
>
> => INSERT INTO t (b, c)
>    SELECT t.b, t.c
>    FROM UNNEST(
>           '{100,200,300}'::int[],
>           '{a,b,c}'::text[]
>         ) AS t(b,c)
>    RETURNING *;
>  a |  b  | c
> ---+-----+---
>  1 | 100 | a
>  2 | 200 | b
>  3 | 300 | c
> (3 rows)
>
> INSERT 0 3
>
> That works fine when executed via the JDBC driver as a regular query with a result set. You can parameterize the arrays either yourself or via the built-in APIs.
>
> The number of actual parameters to the query at the protocol level will be the number of columns, which presumably is less than the 32K limit. The maximum number of values (i.e. the number of columns times the number of rows you want to insert) is limited by the maximum message size of the wire protocol and the size of your serialized columns. Unless the size of the fields is huge, you should have no issues with 10,000s of rows though.
>
> Rather than separate array parameters, you can even (ab)use JSON to pass in everything as one gigantic parameter:
>
> => INSERT INTO t (b,c)
>    SELECT (t#>>'{0}')::int, (t#>>'{1}')::text
>    FROM json_array_elements('[[100,"a"],[200,"b"],[300,"c"]]') AS t
>    RETURNING *;
>  a |  b  | c
> ---+-----+---
>  4 | 100 | a
>  5 | 200 | b
>  6 | 300 | c
> (3 rows)
>
> INSERT 0 3
>
> Neither of these will be as fast as COPY, but they do allow you to do just about any SQL and use extremely large numbers of parameters.

I'd be willing to bet it's pretty close to COPY. Do we have any numbers?


Dave Cramer
www.postgres.rocks

Re: JDBC prepared statement: a 32767 limit of arguments number

From:
Sehrope Sarkuni
Date:
On Fri, Mar 11, 2022 at 8:55 AM Dave Cramer <davecramer@postgres.rocks> wrote:
> I'd be willing to bet it's pretty close to COPY. Do we have any numbers?

You'd win that bet.

I just tried it out with 25K inserts using each of VALUES, UNNEST arrays, and COPY IN.

In a completely unscientific test on my active machine, the array and COPY IN were on par with each other, running in about 30-40ms.

Both of them were 250x faster than using individual separate statements with VALUES.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/