Обсуждение: [PATCH v1] Fix parsing of a complex type that has an array of complex types

Поиск
Список
Период
Сортировка

[PATCH v1] Fix parsing of a complex type that has an array of complex types

От
Arjan Marku
Дата:
Hi hackers,

I ran into an issue today when I was trying to insert a complex types 
where one of its attributes is also an array of complex types,

As an example:

CREATE TYPE inventory_item AS
(
     name        text,
     supplier_id integer,
     price       numeric
);
CREATE TYPE item_2d AS (id int, items inventory_item[][]);

CREATE TABLE item_2d_table (id int, item item_2d);

INSERT INTO item_2d_table VALUES(1, '(1,{{("inv a",42,1.99),("inv 
b",42,1.99)},{("inv c",42,1.99),("inv d",42,2)}})');

The INSERT statement will fail due to how complex types are parsed, I 
have included a patch in this email to support this scenario.

The reason why this fails is because record_in lacks support of 
detecting an array string when one of the attributes is of type array.
Due to this, it will stop processing the column value prematurely, which 
results in a corrupted value for that particular column.
As a result array_in will receive a malformed string which is bound to 
error.

To fix this, record_in can detect columns that are of type array and in 
such cases leave the input intact.
array_in will attempt to extract the elements one by one. In case it is 
dealing with unquoted elements, the logic needs to slightly change, 
since if the element is a record, quotes can be allowed, ex: {{("test 
field")}

There are some adjustments that can be made to the patch, for example:
We can detect the number of the dimensions of the array in record_in, do 
we want to error out in case the string has more dimensions than MAXDIM 
in array.h, (to prevent number over/underflow-ing) or whether we want to 
error out if number of dimensions is not the same with the number of 
dimensions that the attribute is supposed to have, or both?

Regards,
Arjan Marku

Вложения
Arjan Marku <arjanmarku02@gmail.com> writes:
> INSERT INTO item_2d_table VALUES(1, '(1,{{("inv a",42,1.99),("inv
> b",42,1.99)},{("inv c",42,1.99),("inv d",42,2)}})');

> The INSERT statement will fail due to how complex types are parsed, I
> have included a patch in this email to support this scenario.

The actual problem with this input is that it's inadequately quoted.
The record fields need to be quoted according to the rules in

https://www.postgresql.org/docs/current/rowtypes.html#ROWTYPES-IO-SYNTAX

and then in the field that is an array, the array elements need to be
quoted according to the rules in

https://www.postgresql.org/docs/current/arrays.html#ARRAYS-IO

and then the innermost record fields need yet another level of quoting
(well, you might be able to skip that given that there are no special
characters in this particular data, but in general you'd have to).

It looks to me like your patch is trying to make array_in and
record_in sufficiently aware of each other's rules that the outer
quoting could be omitted, but I believe that that's a seriously bad
idea.  In the first place, it's far from clear that that's even
possible without ambiguity (much less that this specific patch does
it correctly).  In the second place, this will make it even more
difficult for these functions to issue on-point error messages for
incorrect input.  (They're already struggling with that, see
e.g. [1].)  And in the third place, these functions are nearly
unmaintainable spaghetti already.  (There have also been complaints
that they're too slow, which this won't improve.)  We don't need
another big dollop of complexity here.

Our normal recommendation for handwritten input is to not try to deal
with the complications of correctly quoting nested array/record data.
Instead use row() and array[] constructors.  So you could write
something like

INSERT INTO item_2d_table
VALUES(1,
       row(1, array[[row('inv a',42,1.99), row('inv b',42,1.99)],
                    [row('inv c',42,1.99), row('inv d',42,2)]]::inventory_item[]));

In this particular example we need an explicit cast to cue the
parser about the type of the array elements, but then it can
cope with casting the outer row() construct automatically.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CACJufxExAcpvrkbLGrZGdZ%3DbFAuj7OVp1mOhk%2BfsBzeUbOGuHQ%40mail.gmail.com



I agree with all your points as I encountered them myself, especially ambiguity and error handling.

The introduced dependency between those functions also is for me a bad idea but it seemed like the only way to support that use case, but turns out my assumption that the array literal shouldn't be quoted was wrong,

I managed to execute this query fine:
INSERT INTO item_2d_table VALUES(1, '(1,"{{""(\\""inv a\\"",42,1.99)"",""(\\"inv b\\",42,1.99)""},{""(\\""inv c\\"",42,1.99)"",""(\\""inv d\\"",42,2)""}}")');

Thanks for your insights on this,

Kind regards,
Arjan Marku

On 7/15/24 9:15 PM, Tom Lane wrote:
INSERT INTO item_2d_table VALUES(1, '(1,{{("inv a",42,1.99),("inv 
b",42,1.99)},{("inv c",42,1.99),("inv d",42,2)}})');