Discussion: jsonb subscripting vs SQL/JSON array accessor semantics (SQL:2023)
Hi hackers,
While working on a patch [1] to add SQL/JSON simplified accessor
support (dot notation, wildcards, etc.) to jsonb, I ran into a
semantic mismatch between PostgreSQL’s existing jsonb subscripting [2]
syntax (value[...]), which predates the SQL/JSON standard, and the
SQL:2023 specification of the JSON array accessor (also value[...]) in
the SQL/JSON path language.
Because both use bracket notation for array access, this creates
several edge cases where the existing behavior in PostgreSQL differs
from what the standard specifies, and from what json_query() already
implements today.
I’d like to summarize the differences, discuss whether we should
reconcile them, and get guidance on how we want to proceed.
---
## Context
PostgreSQL currently allows jsonb subscripts that are coercible to
int4 or text:
-- object member access
SELECT ('{"a": 1}'::jsonb)['a']; -- returns 1 (a jsonb number)
-- array element access
SELECT ('[1, "2", null]'::jsonb)[1]; -- returns "2" (a jsonb string)
In contrast, SQL:2023 defines array access as part of the SQL/JSON
path language:
<JSON array accessor> ::=
    <left bracket> <JSON subscript list> <right bracket>
<JSON subscript list> ::=
    <JSON subscript> [ { <comma> <JSON subscript> }... ]
where each subscript expression must evaluate to a singleton numeric
value (except for the special *last* keyword). Object member access is
performed using dot notation, not ['key'].
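For illustration (my example, not taken from the standard text), the
path-language spelling of the member access shown above is already
expressible with json_query():

-- standard-style member access via the path language; matches ('{"a": 1}'::jsonb)['a']
select json_query('{"a": 1}'::jsonb, 'lax $.a' with conditional array wrapper); -- returns 1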
The standard further specifies that a simplified accessor expression
is semantically equivalent to:
JSON_QUERY (VEP, 'lax $JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)
where:
VEP = <value expression primary>
JC = <JSON simplified accessor op chain>
For example:
select ('[1, "2", null]'::jsonb)[1];
by the standard's specification is semantically equivalent to:
select json_query('[1, "2", null]'::jsonb, 'lax $[1]' with conditional array wrapper);
However, as Peter Eisentraut pointed out in [3], these two forms
currently diverge in several cases.
---
## Observed differences
Below are the main cases where jsonb subscripting and SQL/JSON path
semantics differ.
### Case 1a: Array access on a jsonb primitive value
select ('123'::jsonb)[0];
jsonb
-------
(1 row)
select json_query('123'::jsonb, 'lax $[0]' with conditional array wrapper);
json_query
------------
123
(1 row)
The json_query() behavior is correct and complies with the SQL standard:
> In lax mode:
> If an operation requires an SQL/JSON array but the operand is not an
> SQL/JSON array, then the operand is first "wrapped" in an SQL/JSON
> array prior to performing the operation.
So whenever a 'lax $[0]' jsonpath expression is applied to a value that
is not a JSON array, the result should be the same as the original JSON
value; in this case, the number 123.
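To spell out the wrapping step (an illustration of the rule, not a quote
from the standard), the scalar behaves as if it had first been wrapped in
a one-element array:

-- conceptually, lax mode turns '123' into '[123]' before applying [0]
select json_query('[123]'::jsonb, 'strict $[0]'); -- returns 123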
The jsonb subscripting result is not only different, it also contradicts
the current documentation [2], which suggests that an error should be
raised:
> However, if any val itself or one of the intermediary values is
> defined as a non-object such as a string, number, or jsonb null,
> traversal cannot proceed so an error is raised and the transaction
> aborted.
---
### Case 1b: Array access on a jsonb object
select ('{"a": 1}'::jsonb)[0];
jsonb
-------
(1 row)
select json_query('{"a": 1}'::jsonb, 'lax $[0]' with conditional array
wrapper);
json_query
------------
{"a": 1}
(1 row)
Same issue as Case 1a, but with a jsonb object instead of a scalar.
json_query() returns the input object as expected, whereas jsonb
subscripting returns NULL. The jsonb subscripting behavior here is
undocumented.
### Case 1c: Chained array access on a one-dimensional json array
select ('[1, "2", null]'::jsonb)[1][0];
jsonb
-------
(1 row)
select json_query('[1, "2", null]'::jsonb, 'lax $[1][0]' with
conditional array wrapper);
json_query
------------
"2"
(1 row)
This reduces to Case 1a: the intermediate result of [1] is a scalar
("2"), which SQL/JSON treats as a single-element array in 'lax' mode.
---
### Case 2: Non-integer array index
select ('[1, "2", null]'::jsonb)[1.0];
ERROR: subscript type numeric is not supported
LINE 1: select ('[1, "2", null]'::jsonb)[1.0];
^
HINT: jsonb subscript must be coercible to either integer or text.
select json_query('[1, "2", null]'::jsonb, 'lax $[1.0]' with conditional
array wrapper);
json_query
------------
"2"
(1 row)
select json_query('[1, "2", null]'::jsonb, 'lax $[1.9]' with conditional
array wrapper);
json_query
------------
"2"
(1 row)
The standard allows non-integer numeric subscripts, with
implementation-defined rounding or truncation (IA073). PostgreSQL’s
json_query() uses truncation, which is compliant.
jsonb subscripting does not support this at all, which I think is
reasonable, but again diverges from the standard behavior.
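For what it's worth, the standard's truncation behavior can be emulated
today with an explicit cast (just a sketch, not a proposal):

-- truncate the numeric index before subscripting (1.9 -> 1)
select ('[1, "2", null]'::jsonb)[trunc(1.9)::int]; -- returns "2"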
For reference, since DuckDB and Oracle both support JSON simplified
accessor, I tried similar queries on them as well:
-- DuckDB
duckdb> select ('[1, "2", null]'::json)[0.4]; -- returns 1
duckdb> select ('[1, "2", null]'::json)[0.5]; -- returns 2
-- Oracle
WITH data AS (
  SELECT JSON('[1, "2", null]') AS jcol
)
SELECT d.jcol[1.5] FROM data d; -- error
So DuckDB rounds, and Oracle raises an error about "an invalid array
index".
---
### Case 3: Negative array index
select ('[1, "2", null]'::jsonb)[-2];
jsonb
-------
"2"
(1 row)
select json_query('[1, "2", null]'::jsonb, 'lax $[-2]' with conditional
array wrapper);
json_query
------------
(1 row)
select json_query('[1, "2", null]'::jsonb, 'lax $[-0.9]' with conditional array wrapper);
json_query
------------
1
(1 row)
The json_query() behavior is correct according to the standard for an
out-of-range index in 'lax' mode (empty result, hence SQL NULL), while
jsonb subscripting implements a custom extension (-1 means the last
element, -2 the second to last, and so on).
Interestingly, DuckDB's JSON array accessor made the same custom
choice as PostgreSQL for negative index.
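For comparison, the standard way to address an element counted from the
end is the last keyword, which PostgreSQL's jsonpath already supports
(example mine):

-- standard spelling for "the last element"
select json_query('[1, "2", null]'::jsonb, 'lax $[last]' with conditional array wrapper);  -- JSON null
-- PostgreSQL extension: negative subscript counts from the end
select ('[1, "2", null]'::jsonb)[-1];                                                      -- JSON null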
---
## Summary
In all cases above, json_query() follows the SQL/JSON standard
specification. jsonb subscripting, which predates the standard,
differs in several ways:
1. Array access on non-arrays (scalars or objects) does not use 'lax'
mode wrapping. As a result, "[0]" does not return the original value.
2. Non-integer subscripts are not supported.
3. Negative subscripts use a PostgreSQL-specific extension.
These are all edge cases and likely low-impact in practice. However,
if we were to add broader support for SQL/JSON simplified accessors
(dot notation, wildcards, item methods, etc.), which can be chained
together and include array access using the existing bracket syntax,
it seems useful to clarify what semantics we would want for array
access in those situations.
For example, with expressions such as
select (jb)[0].a
select (jb).a[0].b
select (jb).a[-1].b
(where jb is a jsonb value), it would be potentially confusing if
their behavior differed from the equivalent json_query() calls using
the same JSON path.
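For instance, under the standard's equivalence rule, the second expression
above would correspond to something along these lines (the dot-notation
form itself is what the patch would add; the json_query() spelling works
today, and the sample value is mine):

-- hypothetical: (jb).a[0].b would be equivalent to
select json_query(jb, 'lax $.a[0].b'
                  with conditional array wrapper null on empty null on error)
from (values ('{"a": [{"b": 42}]}'::jsonb)) as t(jb); -- returns 42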
Given this, it is unclear to me whether the expectation should be to
move closer to the SQL/JSON path semantics, or to preserve the
existing jsonb subscripting behavior and document it as
PostgreSQL-specific.
This question comes up in the context of the dot-notation work I
mentioned at the beginning, but more generally it seems like something
we may want to be explicit about before extending the syntax further.
I would very much appreciate any thoughts or guidance on this.
Best,
Alex
[1] https://commitfest.postgresql.org/patch/5214/
[2] https://www.postgresql.org/docs/current/datatype-json.html#JSONB-SUBSCRIPTING
[3] https://www.postgresql.org/message-id/a8843d0a-8adb-4fdc-9ac8-8efd22f7d29c%40eisentraut.org
[4] https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-STRICT-AND-LAX-MODES
On Sat, 17 Jan 2026 at 10:54, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
On Fri, 16 Jan 2026 at 22:27, Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
> In all cases above, json_query() follows the SQL/JSON standard
> specification. jsonb subscripting, which predates the standard,
> differs in several ways:
My thoughts on changing these 3 behaviours:
> 1. Array access on non-arrays (scalars or objects) does not use 'lax'
> mode wrapping. As a result, "[0]" does not return the original value.
I think it's unlikely anyone cares about the exact behaviour here in
practice. But changing the behaviour could corrupt expression indexes
that use this syntax.
> 2. Non-integer subscripts are not supported.
Changing this to not throw an error seems fine to me. Making something
that used to throw an error stop throwing one should not cause breakage
for people. The new behaviour would be of arguable usefulness though.
> 3. Negative subscripts use a PostgreSQL-specific extension.
I think there are probably people relying on it. And postgres
behaviour actually seems way more useful than the SQL Standard
behaviour.
You said DuckDB does the same as Postgres. That doesn't surprise me
much, since DuckDB usually defaults to Postgres behaviour. They don't
care much about being strictly sql standard compliant, if that means
more sensible/useful SQL for their users. And since many of their
users are used to Postgres, they try to stay PostgreSQL compatible in
their SQL (unless they think the postgres behaviour is really
weird/confusing).
I do wonder what other databases do though. Do Oracle, MySQL or
MSSQL actually follow the standard here? I.e., how incompatible is this
behaviour in practice with other databases?
> I would very much appreciate any thoughts or guidance on this.
If change 3 were not there, I would probably have been okay
with changing Postgres to behave like the SQL standard and telling
people to reindex their indexes that use this syntax in that major
release. But I think we should keep our current behaviour for option
3.
An approach that I think would be viable to do that is:
1. Define a new sqlpath mode (e.g. with the name lax_postgres or
something). And define that as our current behaviour (possibly with 2
changed to behave like lax).
2. Document that our SQL/JSON simplified accessors diverge slightly
from the SQL standard because they use lax_postgres instead of lax.
That would mean there's still an easy 1-to-1 translation between the
simplified accessor string and JSON_QUERY (all that would be
different is the change from lax to lax_postgres in the string).
I am not sure if the implementation of the third method is the best we can do.
The described handling of corner cases in SQL/JSON has some logic and consistency, but it is not compatible with the general philosophy of PostgreSQL arrays. As far as I know, ANSI/SQL doesn't know arrays, so this inconsistency is just a PostgreSQL problem, and because we don't like feature flags, I don't see how this situation can be solved.
Any solution will be ugly. In this situation I prefer the current behavior (inconsistency between array access and JSON_QUERY) with a good description in the documentation.
Theoretically, a lax_postgres mode could be introduced as you propose, but I don't see how it would help with possible compatibility issues when somebody migrates from another database.
So anything inside the JSON_XXXX functions can be rigidly consistent with standard SQL/JSON. Outside of them this does not have to be true, and it is better to say so explicitly. I don't think introducing JavaScript concepts to Postgres (even if just for some corner cases) is a good idea (when we have our own specific handling of some corner cases too).
Regards
Pavel
On Sat, 17 Jan 2026 at 15:56, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
On Sat, 17 Jan 2026 at 11:22, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Described handling of corner cases in SQL/JSON has some logic and consistency, but it is not compatible with the generic philosophy of PostgreSQL arrays. If I know ANSI/SQL doesn't know arrays, so this inconsistency is just a PostgreSQL problem, and because we don't like feature flags, I don't see any solution to how this situation can be solved.
Array subscripting (aka indexing) and jsonb subscripting work
completely differently. A very important difference is that arrays use
1-based subscripting, while jsonb uses 0-based subscripting.
PostgreSQL arrays can be 0-based too. I don't see a problem with this - Postgres is more general than SQL/JSON, so there is no problem here.
> Any solution will be ugly. In this situation I prefer current behavior - (inconsistency between array access and JSON_QUERY) with good description in documentation.
>
> Theoretically it can be introduced lax_postgres like you propose. But I don't see how it can help with possible compatibility issues when somebody will migrate from other databases.
I didn't mean to suggest it for compatibility reasons (although I do
think there's very little practical compatibility risk with keeping
our current behaviour). It seemed mostly nice so that we can have a
simplified accessor parse tree be transformed to the same plan as a
json_query-based query. That will make EXPLAIN plans look the
same/similar, and it also means that expression indexes can easily be
used with both syntaxes.
> So anything inside JSON_XXXX functions can be rigidly consistent with standard SQL/JSON. Outside should not be true - and it is better to say it explicitly. I don't think introducing some JavaScripts concepts to Postgres (although just for some corner cases) is a good idea (when we have some specific handling of some corner cases too).
What does Javascript have to do with this topic?
I see some JavaScript philosophy (or HTML) in the design of SQL/JSON -
'{[1,2,3]}'[0,1] -> 1 (SQL/JSON) versus -> NULL (Postgres)
but I could be wrong
Regards
Pavel