Обсуждение: [MASSMAIL]Problems caused by type resolution for the iso-8859-1 type

Поиск
Список
Период
Сортировка

[MASSMAIL]Problems caused by type resolution for the iso-8859-1 type

От
rwxrw@posteo.net
Дата:
Hello,

I've identified a number of cases that suffer from problems caused by 
the type
resolution algorithm when the "unknown" type is present.

The most common case is that of a UNION query which is even mentioned in the
documentation at https://www.postgresql.org/docs/16/typeconv-union-case.html
under "Example 10.13. Type Resolution in a Nested Union".

I had initially made a false assumption that only UNION, CASE, and Related
Constructs (listed in the documentation) are affected by this. Soon, 
however,
I identified some more, which made me consider this a bug or at least a
significant inconvenience for PostgreSQL users because it inflicts the 
burden
of inserting explicit type casts. This is especially impactful for
SQL-generating code which has to perform a lot of additional analysis to
determine the appropriate type cast.

Beside the UNION case from the documentation, here are the problematic 
cases I
identified:

1. JOIN on integer = unknown (text)

--
SELECT *
FROM (SELECT 2    AS a) AS t1
JOIN (SELECT NULL AS b) AS t2
ON t1.a = t2.b;

SELECT *
FROM (SELECT  2  AS a) AS t1
JOIN (SELECT '2' AS b) AS t2
ON t1.a = t2.b;
--

Both queries fail with:

ERROR:  operator does not exist: integer = text
LINE 4: ON t1.a = t2.b;
HINT:  No operator matches the given name and argument types. You might 
need to
add explicit type casts.

2. WHERE with a comparison between integer and unknown (text)

--
CREATE TABLE mytable (a int);

SELECT a FROM mytable
WHERE
a = (SELECT NULL);

SELECT a FROM Mytable
WHERE
a = (SELECT '1');
--

Both fail with:

ERROR:  operator does not exist: integer = text
LINE 3: a = (SELECT NULL); // or LINE 3: a = (SELECT '1');
HINT:  No operator matches the given name and argument types. You might 
need to add explicit type casts.

3. CASE WHEN (which is subject to the same rules as UNION)

--
CREATE TABLE test_data(a int, b int);

SELECT a FROM test_data
WHERE
a = (CASE WHEN b > 0 THEN NULL ELSE '1' END)
--

Fails with:

ERROR:  operator does not exist: integer = text
LINE 3: a = (CASE WHEN b > 0 THEN NULL ELSE '1' END)
HINT:  No operator matches the given name and argument types. You might 
need to add explicit type casts.

--

Note that all queries above succeed on Microsoft SQL Server.

I wonder if the type resolution algorithm could be changed to delay the 
final
decision until the very last moment instead of resolving "unknown" as 
"text".

Alternatively, maybe implicit type casting could be made use of during query
analysis, since the following query succeeds:

SELECT ('1'::text)::integer;

--

Cheers



Re: Problems caused by type resolution for the iso-8859-1 type

От
"David G. Johnston"
Дата:
On Thursday, March 28, 2024, <rwxrw@posteo.net> wrote:

I've identified a number of cases that suffer from problems caused by the type
resolution algorithm when the "unknown" type is present.

This has been brought up many times before, though not for a long while now.  I’d suggest searching the mailing list archives for prior discussions.

David J.
 

Re: Problems caused by type resolution for the iso-8859-1 type

От
Tom Lane
Дата:
rwxrw@posteo.net writes:
> I had initially made a false assumption that only UNION, CASE, and Related
> Constructs (listed in the documentation) are affected by this.

The examples you show aren't particularly about those things, they
are about use of sub-SELECTs, and specifically about the fact that
once we've finished parsing a sub-SELECT it's too late to revisit
decisions about what its output column types are.  The recursive
invocation of the parser will end by resolving the undecorated
NULL as "text", since there isn't any better choice.  Then the
outer invocation fails with a type mismatch.

In trivial cases such as your examples, we could imagine simply
leaving the sub-SELECT's output column type unresolved, but there
are problems with that:

1. It's inconsistent with what has to happen in other cases:
the sub-SELECT might contain constructs that force resolution
of the output column type, for instance SELECT DISTINCT.

2. We actually used to do it that way.  It was not better.
If you try this pre-v10 you get something like

regression=# SELECT *
FROM (SELECT 2    AS a) AS t1
JOIN (SELECT NULL AS b) AS t2
ON t1.a = t2.b;
ERROR:  failed to find conversion function from unknown to integer

Avoiding that would have required giving up a lot of type-safety.
You might find this thread of interest:

https://www.postgresql.org/message-id/flat/CAH2L28uwwbL9HUM-WR%3DhromW1Cvamkn7O-g8fPY2m%3D_7muJ0oA%40mail.gmail.com

> Note that all queries above succeed on Microsoft SQL Server.

SQL Server doesn't need to worry about an extensible type system.

            regards, tom lane