Обсуждение: BUG #17637: case-when branches taken even if they dont match, raising errors

Поиск
Список
Период
Сортировка

BUG #17637: case-when branches taken even if they dont match, raising errors

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17637
Logged by:          Facundo Etchezar
Email address:      hctf90@gmail.com
PostgreSQL version: 14.5
Operating system:   Debian 11.3.0-3
Description:

Hi! I've come across some weird behavior. I'm inserting a row into a table,
and I parse/cast a text column in one way or the other depending on the
result of a join with a case-when expression. The issue is that for some
reason it seems the branches that aren't taken are evaluated anyway, which
in turn raises a cast error. Here is a tiny repro where I make two tables
and then try to insert in the test table a float8 or bool value based on the
result of the joined tmap table.

This tiny repro below raises the error SQL Error [22P02]: ERROR: invalid
input syntax for type boolean: "123.4" Like it's trying to parse the text
column as bool even if it shouldn't reach that part of the case-when.

drop table if exists test;
drop table if exists tmap;
create table test(
 id int8,
 vf float8,
 vb bool
);
create table tmap(
 id int8,
 mapped_to int8
);
insert into tmap values(1, 1);
insert into tmap values(2, 2);
insert into test 
with tmp as (select 1::int8 id, '123.4'::text v)
select t.id,
case m.mapped_to when 1 then v::float8 else null end,
case m.mapped_to when 2 then v::bool else null end
from tmp t
join tmap m on m.id = t.id;

Weirdly enough if you forego the join, doing this below, it works fine:

drop table if exists test;
create table test(
 id int8,
 vf float8,
 vb bool
);
insert into test 
with tmp as (select 1::int8 id, '123.4'::text v)
select t.id,
case t.id when 1 then v::float8 else null end,
case t.id when 2 then v::bool else null end
from tmp t;

This should result in the same behavior yet it works fine, without throwing
an error.

I've also tested this in https://www.db-fiddle.com with versions 15 beta,
13, 12. All with the same error. Versions 11 and 10 seem to work fine.

Thank you for your time.


Re: BUG #17637: case-when branches taken even if they dont match, raising errors

От
Richard Guo
Дата:

On Thu, Oct 13, 2022 at 6:08 PM PG Bug reporting form <noreply@postgresql.org> wrote:
drop table if exists test;
drop table if exists tmap;
create table test(
 id int8,
 vf float8,
 vb bool
);
create table tmap(
 id int8,
 mapped_to int8
);
insert into tmap values(1, 1);
insert into tmap values(2, 2);
insert into test
with tmp as (select 1::int8 id, '123.4'::text v)
select t.id,
case m.mapped_to when 1 then v::float8 else null end,
case m.mapped_to when 2 then v::bool else null end
from tmp t
join tmap m on m.id = t.id;
 
I think this has something to do with the CTE used here. In
preprocess_expression, we do not know the value of m.mapped_to, so we
cannot tell the test condition is constant FALSE. Thus we need go on
processing the result. But thanks to the CTE, we know t.v is const
'123.4'::text, and we want to convert it to boolean, which triggers the
error.

I'm not sure about this being a bug.

Thanks
Richard

Re: BUG #17637: case-when branches taken even if they dont match, raising errors

От
Tom Lane
Дата:
Richard Guo <guofenglinux@gmail.com> writes:
> On Thu, Oct 13, 2022 at 6:08 PM PG Bug reporting form <
> noreply@postgresql.org> wrote:
>> with tmp as (select 1::int8 id, '123.4'::text v)
>> select t.id,
>> case m.mapped_to when 1 then v::float8 else null end,
>> case m.mapped_to when 2 then v::bool else null end
>> from tmp t
>> join tmap m on m.id = t.id;

> I'm not sure about this being a bug.

I'm not buying it either.  The point is that the constants from the
WITH clause get pulled up into the outer query, whereupon you have

    case m.mapped_to when 2 then '123.4'::text::bool else null end

and then we apply constant-folding which tries to perform the bool
conversion.  (There are some folding rules whereby if a WHEN condition
reduces to constant true or constant false, we drop all the
therefore-unreachable THEN/ELSE arms without folding them --- but
that doesn't help here since m.mapped_to isn't a constant.)

I'm not especially eager to lobotomize the const-folding rules
in order to make toy examples like this one work.  I don't think
it's representative of real queries; but we *would* be penalizing
real queries if we didn't perform such folding.

I believe you could dodge the issue in this particular case
by marking the WITH query as MATERIALIZED, which'll serve as
an optimization fence to prevent the constants from being
hoisted into the outer query.

            regards, tom lane



Re: BUG #17637: case-when branches taken even if they dont match, raising errors

От
Facundo Etchezar
Дата:
I see. So is this behavior expected? The two snippets should work the same when you look at them but one errors out and the other doesn't. I'm thinking either both should work or both should error out.

I had this issue on an insert from a NiFi instance that received JSON data (field sensor measurements from edge devices) and inserted it into a PostgreSQL instance. The snippet is a short "toy example" simply because I managed to make an easy reproduction case for your convenience to test it easily, that's all.

That being said, using the materialized CTE makes it work just fine, so I'll have this in mind if it happens again. Thank you.

On Thu, Oct 13, 2022 at 11:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> On Thu, Oct 13, 2022 at 6:08 PM PG Bug reporting form <
> noreply@postgresql.org> wrote:
>> with tmp as (select 1::int8 id, '123.4'::text v)
>> select t.id,
>> case m.mapped_to when 1 then v::float8 else null end,
>> case m.mapped_to when 2 then v::bool else null end
>> from tmp t
>> join tmap m on m.id = t.id;

> I'm not sure about this being a bug.

I'm not buying it either.  The point is that the constants from the
WITH clause get pulled up into the outer query, whereupon you have

    case m.mapped_to when 2 then '123.4'::text::bool else null end

and then we apply constant-folding which tries to perform the bool
conversion.  (There are some folding rules whereby if a WHEN condition
reduces to constant true or constant false, we drop all the
therefore-unreachable THEN/ELSE arms without folding them --- but
that doesn't help here since m.mapped_to isn't a constant.)

I'm not especially eager to lobotomize the const-folding rules
in order to make toy examples like this one work.  I don't think
it's representative of real queries; but we *would* be penalizing
real queries if we didn't perform such folding.

I believe you could dodge the issue in this particular case
by marking the WITH query as MATERIALIZED, which'll serve as
an optimization fence to prevent the constants from being
hoisted into the outer query.

                        regards, tom lane

Re: BUG #17637: case-when branches taken even if they dont match, raising errors

От
"David G. Johnston"
Дата:
On Fri, Oct 14, 2022 at 12:23 AM Facundo Etchezar <hctf90@gmail.com> wrote:
I see. So is this behavior expected? The two snippets should work the same when you look at them but one errors out and the other doesn't. I'm thinking either both should work or both should error out.

While reasonable, this particular dynamic falls into a grey area.  SQL is a strongly typed language and having a column of data that is conditionally in different data formats is not really compatible with that design.  The CASE expression does allow for handling of this typically but that only works during execution - and in this case the problematic optimization is happening during parsing.  More people write structurally correct inefficient queries than non-structurally correct ones and so the parsing time optimization stays as-is.  As noted, you have a way to prohibit the optimization from revealing the design problem with your query.

David J.


Re: BUG #17637: case-when branches taken even if they dont match, raising errors

От
Facundo Etchezar
Дата:
SQL is a strongly typed language and having a column of data that is conditionally in different data formats is not really compatible with that design.
I completely understand the argument of not touching the optimizer for these cases, especially since I have a way to make it work which Tom provided. But this is really not what is happening in the query.

I'm doing the equivalent of this piece of pseudo-code:

text v = '1234';
int8 id_type = 1;
bool bool_col;
int8 int_col;
if(id_type == 1) {
  int_col =  int8.parse(v);
} else {
  int_col = null;
}
if(id_type == 2) {
  bool_col = bool.parse(v);
} else {
  bool_col = null;
}

And this query is executing both conditionals as if the condition is true in both cases which is impossible, id_type can't be both values at the same time. There is no dynamic typing whatsoever, it's all strongly typed, it's just a very normal conditional operation.

Although this matter is way beyond the initial issue, which I consider resolved since the 'materialized' keyword lets me make my original insert just fine.

On Fri, Oct 14, 2022 at 10:28 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Oct 14, 2022 at 12:23 AM Facundo Etchezar <hctf90@gmail.com> wrote:
I see. So is this behavior expected? The two snippets should work the same when you look at them but one errors out and the other doesn't. I'm thinking either both should work or both should error out.

While reasonable, this particular dynamic falls into a grey area.  SQL is a strongly typed language and having a column of data that is conditionally in different data formats is not really compatible with that design.  The CASE expression does allow for handling of this typically but that only works during execution - and in this case the problematic optimization is happening during parsing.  More people write structurally correct inefficient queries than non-structurally correct ones and so the parsing time optimization stays as-is.  As noted, you have a way to prohibit the optimization from revealing the design problem with your query.

David J.