Обсуждение: jsonpath: Missing regex_like && starts with Errors?
Hackers, I noticed that neither `regex_like` nor `starts with`, the jsonpath operators, raise an error when the operand is not a string(or array of strings): david=# select jsonb_path_query('true', '$ like_regex "^hi"'); jsonb_path_query ------------------ null (1 row) david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"'); jsonb_path_query ------------------ null (1 row) This is true in strict and lax mode, and with verbosity enabled (as in these examples). Most other operators raise an errorwhen they can’t operate on the operand: david=# select jsonb_path_query('{"x": "hi"}', '$.integer()'); ERROR: jsonpath item method .integer() can only be applied to a string or numeric value david=# select jsonb_path_query('{"x": "hi"}', '$+$'); ERROR: left operand of jsonpath operator + is not a single numeric value Should `like_regex` and `starts with` adopt this behavior, too? I note that filter expressions seem to suppress these sorts of errors, but I assume that’s by design: david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ starts with "^hi")'); jsonb_path_query ------------------ (0 rows) david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ like_regex "^hi")'); jsonb_path_query ------------------ (0 rows) david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@.integer() == 1)'); jsonb_path_query ------------------ (0 rows) D
On 06/14/24 12:21, David E. Wheeler wrote: > I noticed that neither `regex_like` nor `starts with`, the jsonpath operators, raise an error when the operand is not astring (or array of strings): > > david=# select jsonb_path_query('true', '$ like_regex "^hi"'); > jsonb_path_query > ------------------ > null > (1 row) > > david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"'); > jsonb_path_query > ------------------ > null > (1 row) To begin with, both of those path queries should have been rejected at the parsing stage, just like the one David Johnson pointed out: On 06/13/24 22:14, David G. Johnston wrote: > On Thursday, June 13, 2024, Chapman Flack <jcflack@acm.org> wrote: >> On 06/13/24 21:46, David G. Johnston wrote: >>>>> david=# select jsonb_path_query('1', '$ >= 1'); >>>> >>>> Good point. I can't either. No way I can see to parse that as >>>> a <JSON path wff>. >>> >>> Whether we note it as non-standard or not is an open question then, but >> it >>> does work and opens up a documentation question. All of these are <JSON path predicate> appearing where a <JSON path wff> is needed, and that's not allowed in the standard. Strictly speaking, the only place <JSON path predicate> can appear is within <JSON filter expression>. So I should go look at our code to see what grammar we've implemented, exactly. It is beginning to seem as if we have simply added <JSON path predicate> as another choice for an expression, not restricted to only appearing in a filter. If so, and we add documentation about how we diverge from the standard, that's probably the way to say it. On 06/13/24 22:14, David G. Johnston wrote: > I don’t get why the outcome of a boolean producing operation isn’t just > generally allowed to be produced I understand; after all, what is a 'predicate' but another 'boolean producing operation'? But the committee (at least in this edition) has stuck us with this clear division in the grammar: there is no <JSON path wff>, boolean as it may be, that can appear as a <JSON path predicate>, and there is no <JSON path predicate> that can appear outside of a filter and be treated as a boolean-valued expression. As for the error behavior of a <JSON path predicate> (which strictly, again, can only appear inside a <JSON filter expression>), the standard says what seems to be the same thing, in a couple different ways. In 4.48.5 Overview of SQL/JSON path language, this is said: "The SQL/JSON path language traps any errors that occur during the evaluation of a <JSON filter expression>. Depending on the precise <JSON path predicate> ... the result may be Unknown, True, or False, ...". Later in 9.46's General Rules where the specific semantics of the various predicates are laid out, each predicate has rules spelling out which of Unknown, True, or False results when an error condition is encountered (usually Unknown, except where something already seen allows returning True or False). Finally, the <JSON filter expression> itself collapses the three-valued logic to two; it includes the items for which the predicate returns True, and excludes them for False or Unknown. So that's where the errors went. The question of what should happen to the errors when a <JSON path predicate> appears outside of a <JSON filter expression> of course isn't answered in the standard, because that's not supposed to be possible. So if we're allowing predicates to appear on their own as expressions, it's also up to us to say what should happen with errors when they do. Regards, -Chap
On 06/14/24 22:29, Chapman Flack wrote: > So I should go look at our code to see what grammar we've implemented, > exactly. It is beginning to seem as if we have simply added > <JSON path predicate> as another choice for an expression, not restricted > to only appearing in a filter. If so, and we add documentation about how > we diverge from the standard, that's probably the way to say it. That's roughly what we've done: 119 result: 120 mode expr_or_predicate { 121 ... 125 } 126 | /* EMPTY */ { *result = NULL; } 127 ; 128 129 expr_or_predicate: 130 expr { $$ = $1; } 131 | predicate { $$ = $1; } 132 ; Oddly, that's only at the top-level goal production. Your entire JSON path query we'll allow to be a predicate in lieu of an expr. We still don't allow a predicate to appear in place of an expr within any other production. Regards, -Chap
On Jun 14, 2024, at 22:29, Chapman Flack <jcflack@acm.org> wrote: > So I should go look at our code to see what grammar we've implemented, > exactly. It is beginning to seem as if we have simply added > <JSON path predicate> as another choice for an expression, not restricted > to only appearing in a filter. If so, and we add documentation about how > we diverge from the standard, that's probably the way to say it. Yes, if I understand correctly, these are predicate check expressions, supported and documented as an extension to the standardsince Postgres 12[1]. I found their behavior quite confusing for a while, and spent some time figuring it out andsubmitting a doc patch (committed in 7014c9a[2]) to hopefully clarify things in Postgres 17. > So that's where the errors went. Ah, great, that explains the error suppression in filters. Thank you. I still think the supression of `like_regex` and `startswith` errors in predicate path queries is odd, though. > The question of what should happen to the errors when a > <JSON path predicate> appears outside of a <JSON filter expression> > of course isn't answered in the standard, because that's not supposed > to be possible. So if we're allowing predicates to appear on their own > as expressions, it's also up to us to say what should happen with errors > when they do. Right, and I think there’s an inconsistency right now. Best, David [1]: https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS [2]: https://github.com/postgres/postgres/commit/7014c9a
On 06/15/24 10:47, David E. Wheeler wrote: > these are predicate check expressions, supported and documented > as an extension to the standard since Postgres 12[1]. > ... > [1]: https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS I see. Yes, that documentation now says "predicate check expressions return the single three-valued result of the predicate: true, false, or unknown". (Aside: are all readers of the docs assumed to have learned the habit of calling SQL null "unknown" when speaking of a boolean? They can flip back to 8.6 Boolean Type and see 'a third state, “unknown”, which is represented by the SQL null value'. But would it save them some page flipping to add " (represented by SQL null)" to the sentence here?) As Unknown is typically what the predicates return within a filter (where errors get trapped) when an error has occurred, the existing docs seem to suggest they behave the same way in a "predicate check expression", so a change to that behavior now would be a change to what we've documented. OTOH, getting Unknown because some error occurred is strictly less information than seeing the error, so perhaps you would want a way to request non-error-trapping behavior for a "predicate check expression". Can't really overload jsonb_path_query's 'silent' parameter for that, because it is already false by default. If predicate check expressions were nonsilent by default, the existing 'silent' parameter would be a perfect way to silence them. No appetite to add yet another optional boolean parameter to jsonb_path_query for the sole purpose of controlling the silence of our nonstandard syntax extension .... Maybe just see the nonstandard syntax extension and raise it another one: expr_or_predicate : expr | predicate | "nonsilent" '(' predicate ')' ; or something like that. Regards, -Chap
On Jun 15, 2024, at 12:23, Chapman Flack <jcflack@acm.org> wrote: > I see. Yes, that documentation now says "predicate check expressions return > the single three-valued result of the predicate: true, false, or unknown". It has been there since jsonpath was introduced in v12[1]: > A path expression can be a Boolean predicate, although the SQL/JSON standard allows predicates only in filters. This isnecessary for implementation of the @@ operator. For example, the following jsonpath expression is valid in PostgreSQL: > > '$.track.segments[*].HR < 70' > (Aside: are all readers of the docs assumed to have learned the habit > of calling SQL null "unknown" when speaking of a boolean? They can flip > back to 8.6 Boolean Type and see 'a third state, “unknown”, which is > represented by the SQL null value'. But would it save them some page > flipping to add " (represented by SQL null)" to the sentence here?) In 9.16.2[2] it says: > The unknown value plays the same role as SQL NULL and can be tested for with the is unknown predicate. > As Unknown is typically what the predicates return within a filter (where > errors get trapped) when an error has occurred, the existing docs seem to > suggest they behave the same way in a "predicate check expression", so a > change to that behavior now would be a change to what we've documented. It’s reasonable to ask, then, whether `starts with` and `like_regex` are correct and the others shouldn’t throw errors inpredicate check expressions, yes. I don’t know the answer, but would like it to be consistent. > Can't really overload jsonb_path_query's 'silent' parameter for that, > because it is already false by default. If predicate check expressions > were nonsilent by default, the existing 'silent' parameter would be a > perfect way to silence them. I think that’s how it should be; I prefer that it raises errors by default but you can silence them: david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', silent => false); ERROR: jsonpath item method .integer() can only be applied to a string or numeric value david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', silent => true); jsonb_path_query ------------------ (0 rows) I suggest that the same behavior be adopted for `like_regex` and `starts with`. > No appetite to add yet another optional boolean parameter to > jsonb_path_query for the sole purpose of controlling the silence of > our nonstandard syntax extension .... You don’t need it IMO, the existing silent parameter already does it existing error-raising operators. Best, David [1]: https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH [2]: https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-PATH
On Jun 16, 2024, at 11:52, David E. Wheeler <david@justatheory.com> wrote: > I think that’s how it should be; I prefer that it raises errors by default but you can silence them: > > david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', silent => false); > ERROR: jsonpath item method .integer() can only be applied to a string or numeric value > > david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', silent => true); > jsonb_path_query > ------------------ > (0 rows) > > I suggest that the same behavior be adopted for `like_regex` and `starts with`. Okay, I think I’ve figured this out, and the key is that I am, once again, comparing predicate path queries to SQL standardqueries. If I update the first example to use a comparison I no longer get an error: david=# select jsonb_path_query('{"x": "hi"}', '$.integer() == 1'); jsonb_path_query ------------------ null So I think that’s the key: There’s not a difference between the behavior of `like_regex` and `starts with` vs other predicateexpressions. This dichotomy continues to annoy. I would very much like some way to have jsonb_path_query() raise an error (or even a warning!)if passed a predate expression, and for jsonb_path_match() to raise an error or warning if its path is not a predicateexpression. Because I keep confusing TF out of myself. Best, David
On 06/17/24 18:14, David E. Wheeler wrote: > So I think that’s the key: There’s not a difference between the behavior of > `like_regex` and `starts with` vs other predicate expressions. The current implementation seems to have made each of our <JSON path predicate>s responsible for swallowing its own errors, which is one perfectly cromulent way to satisfy the SQL standard behavior saying all errors within a <JSON filter expression> should be swallowed. The standard says nothing on how they should behave outside of a <JSON filter expression>, because as far as the standard's concerned, they can't appear there. Ours currently behave the same way, and swallow their errors. It would have been possible to write them in such a way as to raise errors, but not when inside a <JSON filter expression>, and that would also satisfy the standard, but it would also give us the errors you would like from our nonstandard "predicate check expressions". And then you could easily use silent => true if you wanted them silent. I'd be leery of changing that, though, as we've already documented that a "predicate check expression" returns true, false, or unknown, so having it throw by default seems like a change of documented behavior. The current situation can't make much use of 'silent', since it's already false by default; you can't make it any falser to make predicate-check errors show up. Would it be a thinkable thought to change the 'silent' default to null? That could have the same effect as false for SQL standard expressions, and the same effect seen now for "predicate check expressions", and you could pass it explicitly false if you wanted errors from the predicate checks. If that's no good, I don't see an obvious solution other than adding another nonstandard construct to what's nonstandard already, and allowing something like nonsilent(predicate check expression). Regards, -Chap
On Jun 17, 2024, at 6:44 PM, Chapman Flack <jcflack@acm.org> wrote: > The current implementation seems to have made each of our > <JSON path predicate>s responsible for swallowing its own errors, which > is one perfectly cromulent way to satisfy the SQL standard behavior saying > all errors within a <JSON filter expression> should be swallowed. Naw, executePredicate does it for all of them, as for the left operand here[1]. > The standard says nothing on how they should behave outside of a > <JSON filter expression>, because as far as the standard's concerned, > they can't appear there. > > Ours currently behave the same way, and swallow their errors. Yes, and they’re handled consistently, at least. > It would have been possible to write them in such a way as to raise errors, > but not when inside a <JSON filter expression>, and that would also satisfy > the standard, but it would also give us the errors you would like from our > nonstandard "predicate check expressions". And then you could easily use > silent => true if you wanted them silent. I’m okay without the errors, as long as the behaviors are consistent. I mean it might be cool to have a way to get them,but the consistency I thought I saw was the bit that seemed like a bug. > I'd be leery of changing that, though, as we've already documented that > a "predicate check expression" returns true, false, or unknown, so having > it throw by default seems like a change of documented behavior. Right, same for using jsonb_path_match(). > The current situation can't make much use of 'silent', since it's already > false by default; you can't make it any falser to make predicate-check > errors show up. EXTREAMLY FALSE! 😂 > Would it be a thinkable thought to change the 'silent' default to null? > That could have the same effect as false for SQL standard expressions, and > the same effect seen now for "predicate check expressions", and you could > pass it explicitly false if you wanted errors from the predicate checks. Thaat seems like it’d be confusing TBH. > If that's no good, I don't see an obvious solution other than adding > another nonstandard construct to what's nonstandard already, and allowing > something like nonsilent(predicate check expression). The only options I can think of are a GUC to turn on SUPER STRICT mode or something (yuck, action at a distance) or introducenew functions with the new behavior. I advocate for neither (at this point). Best, David [1]: https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059
On 06/17/24 19:17, David E. Wheeler wrote: > [1]: https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059 Huh, I just saw something peculiar, skimming through the code: https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L1385 We allow .boolean() applied to a jbvBool, a jbvString (those are the only two possibilities allowed by the standard), or to a jbvNumeric (!), but only if it can be serialized and then parsed as an int4, otherwise we say ERRCODE_NON_NUMERIC_SQL_JSON_ITEM, or if it survived all that we call it true if it isn't zero. I wonder what that alternative is doing there. It also looks like the expected errcode (in the standard, if the item was not boolean or string) would be 2202V "non-boolean SQL/JSON item" ... which isn't in our errcodes.txt. Regards, -Chap
On 18.06.24 04:17, Chapman Flack wrote: > On 06/17/24 19:17, David E. Wheeler wrote: >> [1]: https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059 > > Huh, I just saw something peculiar, skimming through the code: > > https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L1385 > > We allow .boolean() applied to a jbvBool, a jbvString (those are the only > two possibilities allowed by the standard), or to a jbvNumeric (!), but > only if it can be serialized and then parsed as an int4, otherwise we say > ERRCODE_NON_NUMERIC_SQL_JSON_ITEM, or if it survived all that we call it > true if it isn't zero. > > I wonder what that alternative is doing there. Are you saying we shouldn't allow .boolean() to be called on a JSON number? I would concur that that's what the spec says.
On 06/18/24 08:30, Peter Eisentraut wrote: > Are you saying we shouldn't allow .boolean() to be called on a JSON number? > > I would concur that that's what the spec says. Or, if we want to extend the spec and allow .boolean() on a JSON number, should it just check that the number is nonzero or zero, rather than checking that it can be serialized then deserialized as an int4 and otherwise complaining that it isn't a number? Which error code to use seems to be a separate issue. Is it possible that more codes like 2202V non-boolean SQL/JSON item were added in a later spec than we developed the code from? I have not read through all of the code to see in how many other places the error code doesn't match the spec. Regards, -Chap