Обсуждение: BUG #16356: Can't perform json operations on casted text in where clause when a join is used. WHERE is too early

Поиск
Список
Период
Сортировка

BUG #16356: Can't perform json operations on casted text in where clause when a join is used. WHERE is too early

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16356
Logged by:          Vale Violet Mote
Email address:      valefbonetwo@gmail.com
PostgreSQL version: 12.2
Operating system:   Win10
Description:

Live question: https://stackoverflow.com/questions/61147921
Paste of text:

Postgres (12.2) Setup:

```sql
CREATE TABLE public.test_table (
    id int NOT NULL,
    value_type text NOT NULL,
    value text NOT NULL
);
```
```sql
INSERT INTO public.test_table
(id, value_type, value)
VALUES (1, 'string', 'a'), 
(2, 'json', '{"hello":"world"}'),
(3, 'json', '{"color":"blue"}');
```
Initial Queries:

```sql
select value::jsonb as json_value from test_table where value_type =
'json'
```

```
json_value        |
------------------|
{"hello": "world"}|
{"color": "blue"} |
```

But I'm only interested in ones with 'color'.

Moving it to a subquery so that I can get only 'color', also just fine:

```sql
select only_json.json_value 
from(
    select value::jsonb as json_value from test_table where value_type =
'json'
) only_json
where only_json.json_value ? 'color' = true
```
```
json_value        |
------------------|
{"color": "blue"} |
```

Now let's break that main table up into two, and suddenly effectively the
same query has trouble:

```sql

CREATE TABLE public.test_table (
    id INT PRIMARY KEY,
    value TEXT NOT NULL
);
CREATE TABLE public.test_types (
    id INT PRIMARY KEY REFERENCES public.test_table (id),
    value_type TEXT NOT NULL
);

INSERT INTO public.test_table
(id, value)
VALUES (1, 'a'), 
(2, '{"hello":"world"}'),
(3, '{"color":"blue"}');

insert into public.test_types
(id, value_type)
values (1, 'string'),
(2, 'json'),
(3, 'json');
```

Now this query:
```sql
select id, value from (
select id, value::jsonb from public.test_table natural join
public.test_types
where value_type = 'json') only_json
```

returns, as expected: 
```
id|value             |
--|------------------|
 2|{"hello": "world"}|
 3|{"color": "blue"} |
```

But as soon as I attach the where clause, it fails:

```sql
select id, value from (
select id, value::jsonb from public.test_table natural join
public.test_types
where value_type = 'json') only_json
where only_json.value ? 'color' = true
```

```
SQL Error [22P02]: ERROR: invalid input syntax for type json
  Detail: Token "a" is invalid.
  Where: JSON data, line 1: a
```

It's somehow resurrected the value of 'a' that was well-eliminated prior to
this where clause. So what gives? Why does the join cause it to apply the
last where clause (which should happen logically last) too early?  Failed
workarounds I've tried:

 - Using left join instead of natural join.
 - Applying `where value_type = 'json'` to the joined table first, prior to
the join.
 - Moving it to a "with".
 - Creating a view and then applying the where clause to a select from the
view.
 - Creating a column via select called `is_color_holder` with `SELECT
only_json.value ? 'color' as is_color_holder`. This column populates
correctly, but if I use a where clause, `WHERE is_color_holder = true`, I
receive the same error.


PG Bug reporting form <noreply@postgresql.org> writes:
> It's somehow resurrected the value of 'a' that was well-eliminated prior to
> this where clause. So what gives? Why does the join cause it to apply the
> last where clause (which should happen logically last) too early?

The documentation explicitly disclaims any specific evaluation order for
WHERE clauses, cf

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

If you really need to, you can force the matter by putting some type of
optimization fence into the sub-select ("OFFSET 0" is the traditional
way, or you can use WITH ... AS MATERIALIZED in v12 and up).  This is
typically disastrous for performance, of course.  Not being able to
filter rows before joining would make the join much slower.

I'll just note that the sort of EAV schema that you have here is widely
agreed to be an anti-pattern in database design.

            regards, tom lane




On Friday, April 10, 2020, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      16356
Logged by:          Vale Violet Mote
Email address:      valefbonetwo@gmail.com
PostgreSQL version: 12.2
Operating system:   Win10
Description:       

Live question: https://stackoverflow.com/questions/61147921
Paste of text:

But as soon as I attach the where clause, it fails:

```sql
select id, value from (
select id, value::jsonb from public.test_table natural join
public.test_types
where value_type = 'json') only_json
where only_json.value ? 'color' = true
```

```
SQL Error [22P02]: ERROR: invalid input syntax for type json
  Detail: Token "a" is invalid.
  Where: JSON data, line 1: a
```

It's somehow resurrected the value of 'a' that was well-eliminated prior to
this where clause.  So what gives? Why does the join cause it to apply the
last where clause (which should happen logically last) too early?


PostgreSQL is trying to be helpful by re-arranging things to execute in the most efficient way possible.  It cannot adapt those optimizations on the fly in response to data.  Since your model has issues that make this optimization fail you need to make it so the optimization cannot be applied.  Adding “offset 0” to the subquery should accomplish this.

 - Moving it to a "with".

It would have in prior versions.  I think 12 removed the optimization fence that used to be in place here.

The best solution is not to have column content formats vary, though I get that is not always possible.

David J.