Обсуждение: jsonpath: Missing regex_like && starts with Errors?

Поиск
Список
Период
Сортировка

jsonpath: Missing regex_like && starts with Errors?

От
"David E. Wheeler"
Дата:
Hackers,

I noticed that neither `regex_like` nor `starts with`, the jsonpath operators, raise an error when the operand is not a
string(or array of strings): 

david=# select jsonb_path_query('true', '$ like_regex "^hi"');
 jsonb_path_query
------------------
 null
(1 row)

david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"');
 jsonb_path_query
------------------
 null
(1 row)

This is true in strict and lax mode, and with verbosity enabled (as in these examples). Most other operators raise an
errorwhen they can’t operate on the operand: 

david=# select jsonb_path_query('{"x": "hi"}', '$.integer()');
ERROR:  jsonpath item method .integer() can only be applied to a string or numeric value

david=# select jsonb_path_query('{"x": "hi"}', '$+$');
ERROR:  left operand of jsonpath operator + is not a single numeric value

Should `like_regex` and `starts with` adopt this behavior, too?

I note that filter expressions seem to suppress these sorts of errors, but I assume that’s by design:

david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ starts with "^hi")');
 jsonb_path_query
------------------
(0 rows)

david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ like_regex "^hi")');
 jsonb_path_query
------------------
(0 rows)

david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@.integer() == 1)');
 jsonb_path_query
------------------
(0 rows)

D




Re: jsonpath: Missing regex_like && starts with Errors?

От
Chapman Flack
Дата:
On 06/14/24 12:21, David E. Wheeler wrote:
> I noticed that neither `regex_like` nor `starts with`, the jsonpath operators, raise an error when the operand is not
astring (or array of strings): 
>
> david=# select jsonb_path_query('true', '$ like_regex "^hi"');
>  jsonb_path_query
> ------------------
>  null
> (1 row)
>
> david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"');
>  jsonb_path_query
> ------------------
>  null
> (1 row)

To begin with, both of those path queries should have been rejected at
the parsing stage, just like the one David Johnson pointed out:

On 06/13/24 22:14, David G. Johnston wrote:
> On Thursday, June 13, 2024, Chapman Flack <jcflack@acm.org> wrote:
>> On 06/13/24 21:46, David G. Johnston wrote:
>>>>> david=# select jsonb_path_query('1', '$ >= 1');
>>>>
>>>> Good point. I can't either. No way I can see to parse that as
>>>> a <JSON path wff>.
>>>
>>> Whether we note it as non-standard or not is an open question then, but
>> it
>>> does work and opens up a documentation question.

All of these are <JSON path predicate> appearing where a <JSON path wff>
is needed, and that's not allowed in the standard. Strictly speaking, the
only place <JSON path predicate> can appear is within <JSON filter expression>.

So I should go look at our code to see what grammar we've implemented,
exactly. It is beginning to seem as if we have simply added
<JSON path predicate> as another choice for an expression, not restricted
to only appearing in a filter. If so, and we add documentation about how
we diverge from the standard, that's probably the way to say it.

On 06/13/24 22:14, David G. Johnston wrote:
> I don’t get why the outcome of a boolean producing operation isn’t just
> generally allowed to be produced

I understand; after all, what is a 'predicate' but another 'boolean
producing operation'? But the committee (at least in this edition) has
stuck us with this clear division in the grammar: there is no
<JSON path wff>, boolean as it may be, that can appear as a
<JSON path predicate>, and there is no <JSON path predicate> that
can appear outside of a filter and be treated as a boolean-valued
expression.

As for the error behavior of a <JSON path predicate> (which strictly,
again, can only appear inside a <JSON filter expression>), the standard
says what seems to be the same thing, in a couple different ways.

In 4.48.5 Overview of SQL/JSON path language, this is said: "The SQL/JSON
path language traps any errors that occur during the evaluation of a
<JSON filter expression>. Depending on the precise <JSON path predicate> ...
the result may be Unknown, True, or False, ...".

Later in 9.46's General Rules where the specific semantics of the
various predicates are laid out, each predicate has rules spelling out
which of Unknown, True, or False results when an error condition is
encountered (usually Unknown, except where something already seen allows
returning True or False). Finally, the <JSON filter expression> itself
collapses the three-valued logic to two; it includes the items for which
the predicate returns True, and excludes them for False or Unknown.

So that's where the errors went.

The question of what should happen to the errors when a
<JSON path predicate> appears outside of a <JSON filter expression>
of course isn't answered in the standard, because that's not supposed
to be possible. So if we're allowing predicates to appear on their own
as expressions, it's also up to us to say what should happen with errors
when they do.

Regards,
-Chap



Re: jsonpath: Missing regex_like && starts with Errors?

От
Chapman Flack
Дата:
On 06/14/24 22:29, Chapman Flack wrote:
> So I should go look at our code to see what grammar we've implemented,
> exactly. It is beginning to seem as if we have simply added
> <JSON path predicate> as another choice for an expression, not restricted
> to only appearing in a filter. If so, and we add documentation about how
> we diverge from the standard, that's probably the way to say it.

That's roughly what we've done:


 119 result:
 120     mode expr_or_predicate          {
 121                                         ...
 125                                     }
 126     | /* EMPTY */                   { *result = NULL; }
 127     ;
 128
 129 expr_or_predicate:
 130     expr                            { $$ = $1; }
 131     | predicate                     { $$ = $1; }
 132     ;


Oddly, that's only at the top-level goal production. Your entire JSON
path query we'll allow to be a predicate in lieu of an expr. We still
don't allow a predicate to appear in place of an expr within any other
production.

Regards,
-Chap



Re: jsonpath: Missing regex_like && starts with Errors?

От
"David E. Wheeler"
Дата:
On Jun 14, 2024, at 22:29, Chapman Flack <jcflack@acm.org> wrote:

> So I should go look at our code to see what grammar we've implemented,
> exactly. It is beginning to seem as if we have simply added
> <JSON path predicate> as another choice for an expression, not restricted
> to only appearing in a filter. If so, and we add documentation about how
> we diverge from the standard, that's probably the way to say it.

Yes, if I understand correctly, these are predicate check expressions, supported and documented as an extension to the
standardsince Postgres 12[1]. I found their behavior quite confusing for a while, and spent some time figuring it out
andsubmitting a doc patch (committed in 7014c9a[2]) to hopefully clarify things in Postgres 17. 

> So that's where the errors went.

Ah, great, that explains the error suppression in filters. Thank you. I still think the supression of `like_regex` and
`startswith` errors in predicate path queries is odd, though. 

> The question of what should happen to the errors when a
> <JSON path predicate> appears outside of a <JSON filter expression>
> of course isn't answered in the standard, because that's not supposed
> to be possible. So if we're allowing predicates to appear on their own
> as expressions, it's also up to us to say what should happen with errors
> when they do.

Right, and I think there’s an inconsistency right now.

Best,

David

[1]: https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS
[2]: https://github.com/postgres/postgres/commit/7014c9a




Re: jsonpath: Missing regex_like && starts with Errors?

От
Chapman Flack
Дата:
On 06/15/24 10:47, David E. Wheeler wrote:
> these are predicate check expressions, supported and documented
> as an extension to the standard since Postgres 12[1].
> ...
> [1]: https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS

I see. Yes, that documentation now says "predicate check expressions return
the single three-valued result of the predicate: true, false, or unknown".

(Aside: are all readers of the docs assumed to have learned the habit
of calling SQL null "unknown" when speaking of a boolean? They can flip
back to 8.6 Boolean Type and see 'a third state, “unknown”, which is
represented by the SQL null value'. But would it save them some page
flipping to add " (represented by SQL null)" to the sentence here?)

As Unknown is typically what the predicates return within a filter (where
errors get trapped) when an error has occurred, the existing docs seem to
suggest they behave the same way in a "predicate check expression", so a
change to that behavior now would be a change to what we've documented.

OTOH, getting Unknown because some error occurred is strictly less
information than seeing the error, so perhaps you would want a way
to request non-error-trapping behavior for a "predicate check expression".

Can't really overload jsonb_path_query's 'silent' parameter for that,
because it is already false by default. If predicate check expressions
were nonsilent by default, the existing 'silent' parameter would be a
perfect way to silence them.

No appetite to add yet another optional boolean parameter to
jsonb_path_query for the sole purpose of controlling the silence of
our nonstandard syntax extension ....

Maybe just see the nonstandard syntax extension and raise it another one:

expr_or_predicate
    : expr
    | predicate
    | "nonsilent" '(' predicate ')'
    ;

or something like that.

Regards,
-Chap



Re: jsonpath: Missing regex_like && starts with Errors?

От
"David E. Wheeler"
Дата:
On Jun 15, 2024, at 12:23, Chapman Flack <jcflack@acm.org> wrote:

> I see. Yes, that documentation now says "predicate check expressions return
> the single three-valued result of the predicate: true, false, or unknown".

It has been there since jsonpath was introduced in v12[1]:

> A path expression can be a Boolean predicate, although the SQL/JSON standard allows predicates only in filters. This
isnecessary for implementation of the @@ operator. For example, the following jsonpath expression is valid in
PostgreSQL:
>
> '$.track.segments[*].HR < 70'



> (Aside: are all readers of the docs assumed to have learned the habit
> of calling SQL null "unknown" when speaking of a boolean? They can flip
> back to 8.6 Boolean Type and see 'a third state, “unknown”, which is
> represented by the SQL null value'. But would it save them some page
> flipping to add " (represented by SQL null)" to the sentence here?)

In 9.16.2[2] it says:

> The unknown value plays the same role as SQL NULL and can be tested for with the is unknown predicate.

> As Unknown is typically what the predicates return within a filter (where
> errors get trapped) when an error has occurred, the existing docs seem to
> suggest they behave the same way in a "predicate check expression", so a
> change to that behavior now would be a change to what we've documented.

It’s reasonable to ask, then, whether `starts with` and `like_regex` are correct and the others shouldn’t throw errors
inpredicate check expressions, yes. I don’t know the answer, but would like it to be consistent. 

> Can't really overload jsonb_path_query's 'silent' parameter for that,
> because it is already false by default. If predicate check expressions
> were nonsilent by default, the existing 'silent' parameter would be a
> perfect way to silence them.

I think that’s how it should be; I prefer that it raises errors by default but you can silence them:

david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', silent => false);
ERROR:  jsonpath item method .integer() can only be applied to a string or numeric value

david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', silent => true);
 jsonb_path_query
------------------
(0 rows)

I suggest that the same behavior be adopted for `like_regex` and `starts with`.

> No appetite to add yet another optional boolean parameter to
> jsonb_path_query for the sole purpose of controlling the silence of
> our nonstandard syntax extension ....

You don’t need it IMO, the existing silent parameter already does it existing error-raising operators.

Best,

David

[1]: https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH
[2]: https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-PATH





Re: jsonpath: Missing regex_like && starts with Errors?

От
"David E. Wheeler"
Дата:
On Jun 16, 2024, at 11:52, David E. Wheeler <david@justatheory.com> wrote:

> I think that’s how it should be; I prefer that it raises errors by default but you can silence them:
>
> david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', silent => false);
> ERROR:  jsonpath item method .integer() can only be applied to a string or numeric value
>
> david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', silent => true);
> jsonb_path_query
> ------------------
> (0 rows)
>
> I suggest that the same behavior be adopted for `like_regex` and `starts with`.

Okay, I think I’ve figured this out, and the key is that I am, once again, comparing predicate path queries to SQL
standardqueries. If I update the first example to use a comparison I no longer get an error: 

david=# select jsonb_path_query('{"x": "hi"}', '$.integer() == 1');
 jsonb_path_query
------------------
 null

So I think that’s the key: There’s not a difference between the behavior of `like_regex` and `starts with` vs other
predicateexpressions. 

This dichotomy continues to annoy. I would very much like some way to have jsonb_path_query() raise an error (or even a
warning!)if passed a predate expression, and for jsonb_path_match() to raise an error or warning if its path is not a
predicateexpression. Because I keep confusing TF out of myself. 

Best,

David




Re: jsonpath: Missing regex_like && starts with Errors?

От
Chapman Flack
Дата:
On 06/17/24 18:14, David E. Wheeler wrote:
> So I think that’s the key: There’s not a difference between the behavior of
> `like_regex` and `starts with` vs other predicate expressions.

The current implementation seems to have made each of our
<JSON path predicate>s responsible for swallowing its own errors, which
is one perfectly cromulent way to satisfy the SQL standard behavior saying
all errors within a <JSON filter expression> should be swallowed.

The standard says nothing on how they should behave outside of a
<JSON filter expression>, because as far as the standard's concerned,
they can't appear there.

Ours currently behave the same way, and swallow their errors.

It would have been possible to write them in such a way as to raise errors,
but not when inside a <JSON filter expression>, and that would also satisfy
the standard, but it would also give us the errors you would like from our
nonstandard "predicate check expressions". And then you could easily use
silent => true if you wanted them silent.

I'd be leery of changing that, though, as we've already documented that
a "predicate check expression" returns true, false, or unknown, so having
it throw by default seems like a change of documented behavior.

The current situation can't make much use of 'silent', since it's already
false by default; you can't make it any falser to make predicate-check
errors show up.

Would it be a thinkable thought to change the 'silent' default to null?
That could have the same effect as false for SQL standard expressions, and
the same effect seen now for "predicate check expressions", and you could
pass it explicitly false if you wanted errors from the predicate checks.

If that's no good, I don't see an obvious solution other than adding
another nonstandard construct to what's nonstandard already, and allowing
something like nonsilent(predicate check expression).

Regards,
-Chap



Re: jsonpath: Missing regex_like && starts with Errors?

От
"David E. Wheeler"
Дата:
On Jun 17, 2024, at 6:44 PM, Chapman Flack <jcflack@acm.org> wrote:

> The current implementation seems to have made each of our
> <JSON path predicate>s responsible for swallowing its own errors, which
> is one perfectly cromulent way to satisfy the SQL standard behavior saying
> all errors within a <JSON filter expression> should be swallowed.

Naw, executePredicate does it for all of them, as for the left operand here[1].

> The standard says nothing on how they should behave outside of a
> <JSON filter expression>, because as far as the standard's concerned,
> they can't appear there.
>
> Ours currently behave the same way, and swallow their errors.

Yes, and they’re handled consistently, at least.

> It would have been possible to write them in such a way as to raise errors,
> but not when inside a <JSON filter expression>, and that would also satisfy
> the standard, but it would also give us the errors you would like from our
> nonstandard "predicate check expressions". And then you could easily use
> silent => true if you wanted them silent.

I’m okay without the errors, as long as the behaviors are consistent. I mean it might be cool to have a way to get
them,but the consistency I thought I saw was the bit that seemed like a bug. 

> I'd be leery of changing that, though, as we've already documented that
> a "predicate check expression" returns true, false, or unknown, so having
> it throw by default seems like a change of documented behavior.

Right, same for using jsonb_path_match().

> The current situation can't make much use of 'silent', since it's already
> false by default; you can't make it any falser to make predicate-check
> errors show up.

EXTREAMLY FALSE! 😂

> Would it be a thinkable thought to change the 'silent' default to null?
> That could have the same effect as false for SQL standard expressions, and
> the same effect seen now for "predicate check expressions", and you could
> pass it explicitly false if you wanted errors from the predicate checks.

Thaat seems like it’d be confusing TBH.

> If that's no good, I don't see an obvious solution other than adding
> another nonstandard construct to what's nonstandard already, and allowing
> something like nonsilent(predicate check expression).

The only options I can think of are a GUC to turn on SUPER STRICT mode or something (yuck, action at a distance) or
introducenew functions with the new behavior. I advocate for neither (at this point). 

Best,

David

[1]: https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059




Re: jsonpath: Missing regex_like && starts with Errors?

От
Chapman Flack
Дата:
On 06/17/24 19:17, David E. Wheeler wrote:
> [1]: https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059

Huh, I just saw something peculiar, skimming through the code:

https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L1385

We allow .boolean() applied to a jbvBool, a jbvString (those are the only
two possibilities allowed by the standard), or to a jbvNumeric (!), but
only if it can be serialized and then parsed as an int4, otherwise we say
ERRCODE_NON_NUMERIC_SQL_JSON_ITEM, or if it survived all that we call it
true if it isn't zero.

I wonder what that alternative is doing there.

It also looks like the expected errcode (in the standard, if the item
was not boolean or string) would be 2202V "non-boolean SQL/JSON item" ...
which isn't in our errcodes.txt.

Regards,
-Chap



Re: jsonpath: Missing regex_like && starts with Errors?

От
Peter Eisentraut
Дата:
On 18.06.24 04:17, Chapman Flack wrote:
> On 06/17/24 19:17, David E. Wheeler wrote:
>> [1]: https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059
> 
> Huh, I just saw something peculiar, skimming through the code:
> 
> https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L1385
> 
> We allow .boolean() applied to a jbvBool, a jbvString (those are the only
> two possibilities allowed by the standard), or to a jbvNumeric (!), but
> only if it can be serialized and then parsed as an int4, otherwise we say
> ERRCODE_NON_NUMERIC_SQL_JSON_ITEM, or if it survived all that we call it
> true if it isn't zero.
> 
> I wonder what that alternative is doing there.

Are you saying we shouldn't allow .boolean() to be called on a JSON number?

I would concur that that's what the spec says.




Re: jsonpath: Missing regex_like && starts with Errors?

От
Chapman Flack
Дата:
On 06/18/24 08:30, Peter Eisentraut wrote:
> Are you saying we shouldn't allow .boolean() to be called on a JSON number?
> 
> I would concur that that's what the spec says.

Or, if we want to extend the spec and allow .boolean() on a JSON number,
should it just check that the number is nonzero or zero, rather than
checking that it can be serialized then deserialized as an int4 and
otherwise complaining that it isn't a number?

Which error code to use seems to be a separate issue. Is it possible that
more codes like 2202V non-boolean SQL/JSON item were added in a later spec
than we developed the code from?

I have not read through all of the code to see in how many other places
the error code doesn't match the spec.

Regards,
-Chap