Discussion: pgsql: Implement jsonpath .datetime() method


pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
Implement jsonpath .datetime() method

This commit implements the jsonpath .datetime() method as specified in the
SQL/JSON standard.  There are no-argument and single-argument versions of
this method.  The no-argument version selects the first of the ISO datetime
formats matching the input string.  The single-argument version accepts a
template string as its argument.

In addition to the .datetime() method itself, this commit also implements
the ability to compare the resulting date and time values.  There is some
difficulty because the existing jsonb_path_*() functions are immutable, while
comparison of timezoned and non-timezoned types involves the current timezone.
First, the current timezone can be changed within a session.  Moreover,
timezones themselves are not immutable and can be updated.  This is why we let
the existing immutable functions throw errors on such non-immutable
comparisons.  At the same time, this commit provides jsonb_path_*_tz()
functions, which are stable and support operations involving timezones.  As new
functions are added to the system catalog, catversion is bumped.
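The immutable-vs-stable split described above can be sketched in C (a self-contained mimic with hypothetical names, not the actual PostgreSQL code):

```c
#include <stdbool.h>

/*
 * Sketch of the policy: a comparison between a timezone-aware and a
 * timezone-naive value needs the session timezone, so the immutable
 * entry points must refuse it, while the stable *_tz() entry points
 * are allowed to perform it.  Names here are illustrative only.
 */
typedef enum { VAL_TIMESTAMP, VAL_TIMESTAMPTZ } ValKind;

static bool
needs_timezone(ValKind a, ValKind b)
{
    return a != b;              /* mixed tz/non-tz comparison */
}

/* Returns true if the comparison may proceed; false means "would raise". */
static bool
comparison_allowed(ValKind a, ValKind b, bool tz_aware_caller)
{
    if (needs_timezone(a, b) && !tz_aware_caller)
        return false;           /* immutable caller: an error is thrown */
    return true;
}
```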

Support of the .datetime() method was the only blocker preventing T832 from
being marked as supported.  sql_features.txt is updated correspondingly.

Extracted from original patch by Nikita Glukhov, Teodor Sigaev, Oleg Bartunov.
Heavily revised by me.  Comments were adjusted by Liudmila Mantrova.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Alexander Korotkov, Nikita Glukhov, Teodor Sigaev, Oleg Bartunov, Liudmila Mantrova
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/bffe1bd68457e43925c362d8728ce3b25bdf1c94

Modified Files
--------------
doc/src/sgml/func.sgml                       | 117 +++++-
src/backend/catalog/sql_features.txt         |   2 +-
src/backend/catalog/system_views.sql         |  40 +++
src/backend/utils/adt/jsonpath.c             |  24 +-
src/backend/utils/adt/jsonpath_exec.c        | 470 ++++++++++++++++++++++--
src/backend/utils/adt/jsonpath_gram.y        |  14 +
src/backend/utils/adt/jsonpath_scan.l        |   1 +
src/backend/utils/errcodes.txt               |   1 +
src/include/catalog/catversion.h             |   2 +-
src/include/catalog/pg_proc.dat              |  22 ++
src/include/utils/jsonpath.h                 |   1 +
src/test/regress/expected/jsonb_jsonpath.out | 520 +++++++++++++++++++++++++++
src/test/regress/expected/jsonpath.out       |  12 +
src/test/regress/sql/jsonb_jsonpath.sql      | 172 +++++++++
src/test/regress/sql/jsonpath.sql            |   2 +
15 files changed, 1355 insertions(+), 45 deletions(-)


Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Wed, Sep 25, 2019 at 10:52 PM Alexander Korotkov
<akorotkov@postgresql.org> wrote:
> Implement jsonpath .datetime() method

I've noticed the buildfarm is unhappy with my commits.

1. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-09-25%2020%3A40%3A11
2. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-09-25%2020%3A38%3A21

I'm investigating the issue.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Thu, Sep 26, 2019 at 12:05 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>
> On Wed, Sep 25, 2019 at 10:52 PM Alexander Korotkov
> <akorotkov@postgresql.org> wrote:
> > Implement jsonpath .datetime() method
>
> I've noticed buildfarm is unhappy with my commits.
>
> 1. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-09-25%2020%3A40%3A11
> 2. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-09-25%2020%3A38%3A21

It appears to be an issue that occurs when USE_FLOAT8_BYVAL is undefined.
I managed to reproduce it locally.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: pgsql: Implement jsonpath .datetime() method

From
Tom Lane
Date:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> I've noticed buildfarm is unhappy with my commits.

The proximate problem seems to be that compareItems() is insufficiently
careful to ensure that both values are non-null before passing them
off to datatype-specific code.  The code accidentally fails to crash
on 64-bit machines, but it's still giving garbage answers, I think.

            regards, tom lane



Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Thu, Sep 26, 2019 at 2:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > I've noticed buildfarm is unhappy with my commits.
>
> The proximate problem seems to be that compareItems() is insufficiently
> careful to ensure that both values are non-null before passing them
> off to datatype-specific code.  The code accidentally fails to crash
> on 64-bit machines, but it's still giving garbage answers, I think.

I've found that the compareItems() code didn't apply the appropriate casts
to/from Datum.  Fixed in 7881bb14f4.  This makes the tests pass on my local
32-bit machine.  I'll keep an eye on the buildfarm.

It seems that the caller (executePredicate()) doesn't pass NULL values to
compareItems(); it does so only for predicates that lack a right argument.
But a binary predicate with a missing argument should be detected during
syntactic analysis.
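The class of bug being fixed can be illustrated with a self-contained mimic (simplified, hypothetical types, not the real postgres.h definitions): on a build where Datum is pointer-sized and int64 values are passed by reference, casting a Datum directly to the integer type reads the pointer, not the value, so the conversion functions must be used:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Mimic of a 32-bit build without USE_FLOAT8_BYVAL: Datum is pointer-sized,
 * so an int64 timestamp must be passed by reference.  Casting the Datum
 * itself to int64 would yield the pointer's numeric value, i.e. garbage.
 */
typedef uintptr_t Datum;        /* pointer-sized, as on a 32-bit build */
typedef int64_t Timestamp;

/* pass-by-reference conversion: box the value and hand out its address */
static Datum
TimestampGetDatum(Timestamp ts)
{
    Timestamp  *p = malloc(sizeof(Timestamp));

    *p = ts;
    return (Datum) p;
}

/* the matching unboxing conversion: dereference, never cast directly */
static Timestamp
DatumGetTimestamp(Datum d)
{
    return *(Timestamp *) d;
}
```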

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pgsql: Implement jsonpath .datetime() method

From
Tom Lane
Date:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> On Thu, Sep 26, 2019 at 2:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The proximate problem seems to be that compareItems() is insufficiently
>> careful to ensure that both values are non-null before passing them
>> off to datatype-specific code.  The code accidentally fails to crash
>> on 64-bit machines, but it's still giving garbage answers, I think.

> I've found compareItems() code to not apply appropriate cast to/from
> Datum.  Fixed in 7881bb14f4.  This makes test pass on my local 32-bit
> machine.  I'll keep look on buildfarm.

Hm.  dromedary seems not to crash either with that fix, but I'm not
sure why not, because when I was running the previous tree by hand,
the stack trace showed pretty clearly that we were getting to
timestamp_cmp with one null and one non-null argument.  So I don't
believe your argument that that's impossible, and even if it is,
I do not think it's sane for compareItems to depend on that ---
especially when one of its code paths *does* check for nulls.

I do not have a very good opinion about the quality of this code
upon my first glance at it.  Just looking at compareDatetime:

* The code is schizophrenic about whether it's allowed to pass a
null have_error pointer or not.  It is not very sensible to have
some code doing
                    if (have_error && *have_error)
                        return 0;
when other code branches will dump core for null have_error.
Given that if this test actually was doing anything, what it
would be doing is failing to detect error conditions, I think
the only sensible design is to insist that have_error mustn't be
null, in which case these checks for null pointer should be removed,
because (a) they waste code and cycles and (b) they mislead the
reader as to what the API of compareDatetime actually is.

* At least some of the code paths will malfunction if the caller
didn't initialize *have_error to false.  If that is an intended API
requirement, it is not acceptable for the function header comment
not to say so.  (For bonus points, it'd be nice if the header
comment agreed with the code as to the name of the variable.)
If this isn't an intended requirement, you need to fix the code,
probably by initializing "*have_error = false;" up at the top.

* It's a bit schizophrenic also that some of the switches
lack default:s (and rely on the if (!cmpfunc) below), while
the outer switch does have its own, completely redundant
default:.  I'd get rid of that default: and instead add
a comment explaining that the !cmpfunc test substitutes for
default branches.

* This is silly:

    if (*have_error)
        return 0;

    *have_error = false;

* Also, given that you have that "if (*have_error)" where you do,
the have_error tests inside the switch are useless redundancy.
You might as well just remove them completely and let the final
test handle falling out if a conversion failed.  Alternatively
you could drop the final test, because as the code stands right
now, it's visibly impossible to get there with *have_error true.

* More generally, it's completely unclear why some error conditions
are thrown as errors and others just result in returning *have_error.
In particular, it seems weird that some unsupported datatype combinations
cause hard errors while others do not.  Maybe that's fine, but if so,
the function header comment is falling down on the job by not explaining
the reasoning.

* OIDs are unsigned, so if you must print them, use %u not %d.

* The errhints don't follow project message style.

* The blank lines before "break"s aren't really per project
style either, IMO.  They certainly aren't doing anything to
improve readability, and they help limit how much code you
can see at once.
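The API shape these points converge on can be sketched as follows (a hypothetical simplified signature, not the actual compareDatetime()): the callee insists on a non-NULL have_error pointer and initializes it itself, so callers cannot observe a stale value and no code path branches on a NULL pointer:

```c
#include <stdbool.h>

/*
 * Sketch of the suggested contract: have_error must not be NULL, and the
 * callee sets it to false up front.  On conversion failure it is set to
 * true and the return value is meaningless.  Illustrative names only.
 */
static int
compare_datetime_sketch(long a, long b, bool *have_error)
{
    *have_error = false;        /* callee initializes; caller need not */

    /* stand-in for a failed datatype conversion */
    if (a < 0 || b < 0)
    {
        *have_error = true;
        return 0;               /* result is meaningless on error */
    }

    return (a > b) - (a < b);
}
```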

            regards, tom lane



Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Thu, Sep 26, 2019 at 2:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > On Thu, Sep 26, 2019 at 2:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The proximate problem seems to be that compareItems() is insufficiently
> >> careful to ensure that both values are non-null before passing them
> >> off to datatype-specific code.  The code accidentally fails to crash
> >> on 64-bit machines, but it's still giving garbage answers, I think.
>
> > I've found compareItems() code to not apply appropriate cast to/from
> > Datum.  Fixed in 7881bb14f4.  This makes test pass on my local 32-bit
> > machine.  I'll keep look on buildfarm.
>
> Hm.  dromedary seems not to crash either with that fix, but I'm not
> sure why not, because when I was running the previous tree by hand,
> the stack trace showed pretty clearly that we were getting to
> timestamp_cmp with one null and one non-null argument.  So I don't
> believe your argument that that's impossible, and even if it is,
> I do not think it's sane for compareItems to depend on that ---
> especially when one of its code paths *does* check for nulls.
>
> I do not have a very good opinion about the quality of this code
> upon my first glance at it.  Just looking at compareDatetime:
>
> * The code is schizophrenic about whether it's allowed to pass a
> null have_error pointer or not.  It is not very sensible to have
> some code doing
>                     if (have_error && *have_error)
>                         return 0;
> when other code branches will dump core for null have_error.
> Given that if this test actually was doing anything, what it
> would be doing is failing to detect error conditions, I think
> the only sensible design is to insist that have_error mustn't be
> null, in which case these checks for null pointer should be removed,
> because (a) they waste code and cycles and (b) they mislead the
> reader as to what the API of compareDatetime actually is.
>
> * At least some of the code paths will malfunction if the caller
> didn't initialize *have_error to false.  If that is an intended API
> requirement, it is not acceptable for the function header comment
> not to say so.  (For bonus points, it'd be nice if the header
> comment agreed with the code as to the name of the variable.)
> If this isn't an intended requirement, you need to fix the code,
> probably by initializing "*have_error = false;" up at the top.
>
> * It's a bit schizophrenic also that some of the switches
> lack default:s (and rely on the if (!cmpfunc) below), while
> the outer switch does have its own, completely redundant
> default:.  I'd get rid of that default: and instead add
> a comment explaining that the !cmpfunc test substitutes for
> default branches.
>
> * This is silly:
>
>         if (*have_error)
>                 return 0;
>
>         *have_error = false;
>
> * Also, given that you have that "if (*have_error)" where you do,
> the have_error tests inside the switch are useless redundancy.
> You might as well just remove them completely and let the final
> test handle falling out if a conversion failed.  Alternatively
> you could drop the final test, because as the code stands right
> now, it's visibly impossible to get there with *have_error true.
>
> * More generally, it's completely unclear why some error conditions
> are thrown as errors and others just result in returning *have_error.
> In particular, it seems weird that some unsupported datatype combinations
> cause hard errors while others do not.  Maybe that's fine, but if so,
> the function header comment is falling down on the job by not explaining
> the reasoning.
>
> * OIDs are unsigned, so if you must print them, use %u not %d.
>
> * The errhints don't follow project message style.
>
> * The blank lines before "break"s aren't really per project
> style either, IMO.  They certainly aren't doing anything to
> improve readability, and they help limit how much code you
> can see at once.

Thank you for the feedback.  Nikita and I will provide patches to fix the
issues you pointed out within the next couple of days.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pgsql: Implement jsonpath .datetime() method

From
Nikita Glukhov
Date:
On Thu, Sep 26, 2019 at 2:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > On Thu, Sep 26, 2019 at 2:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The proximate problem seems to be that compareItems() is insufficiently
> >> careful to ensure that both values are non-null before passing them
> >> off to datatype-specific code.  The code accidentally fails to crash
> >> on 64-bit machines, but it's still giving garbage answers, I think.
>
> > I've found compareItems() code to not apply appropriate cast to/from
> > Datum.  Fixed in 7881bb14f4.  This makes test pass on my local 32-bit
> > machine.  I'll keep look on buildfarm.
>
> Hm.  dromedary seems not to crash either with that fix, but I'm not
> sure why not, because when I was running the previous tree by hand,
> the stack trace showed pretty clearly that we were getting to
> timestamp_cmp with one null and one non-null argument.  So I don't
> believe your argument that that's impossible, and even if it is,
> I do not think it's sane for compareItems to depend on that ---
> especially when one of its code paths *does* check for nulls.
>
> I do not have a very good opinion about the quality of this code
> upon my first glance at it.  Just looking at compareDatetime:


The patch with compareDatetime() refactoring was posted in the original 
thread in pgsql-hackers [1].


> * The code is schizophrenic about whether it's allowed to pass a
> null have_error pointer or not.  It is not very sensible to have
> some code doing
>                     if (have_error && *have_error)
>                         return 0;
> when other code branches will dump core for null have_error.
> Given that if this test actually was doing anything, what it
> would be doing is failing to detect error conditions, I think
> the only sensible design is to insist that have_error mustn't be
> null, in which case these checks for null pointer should be removed,
> because (a) they waste code and cycles and (b) they mislead the
> reader as to what the API of compareDatetime actually is.
>
> * At least some of the code paths will malfunction if the caller
> didn't initialize *have_error to false.  If that is an intended API
> requirement, it is not acceptable for the function header comment
> not to say so.  (For bonus points, it'd be nice if the header
> comment agreed with the code as to the name of the variable.)
> If this isn't an intended requirement, you need to fix the code,
> probably by initializing "*have_error = false;" up at the top.
>
> * This is silly:
>
>         if (*have_error)
>                 return 0;
>
>         *have_error = false;
>
> * Also, given that you have that "if (*have_error)" where you do,
> the have_error tests inside the switch are useless redundancy.
> You might as well just remove them completely and let the final
> test handle falling out if a conversion failed.  Alternatively
> you could drop the final test, because as the code stands right
> now, it's visibly impossible to get there with *have_error true.

Yes, the oddities in have_error handling seem to have appeared during the
numerous reworks of the patch.  have_error is now required to be non-NULL,
and its handling was simplified in the patch.


> * It's a bit schizophrenic also that some of the switches
> lack default:s (and rely on the if (!cmpfunc) below), while
> the outer switch does have its own, completely redundant
> default:.  I'd get rid of that default: and instead add
> a comment explaining that the !cmpfunc test substitutes for
> default branches.

Default cases with elog()s were added to all switches.  Previously, the
default case in the outer switch was used to report an invalid type1, and
the !cmpfunc check was used to report an invalid type2.


> * OIDs are unsigned, so if you must print them, use %u not %d.

Fixed.

> * The errhints don't follow project message style.

Fixed, but I'm not sure about "*_tz()".  Maybe it's worth passing the current
jsonb_xxx function name to compareDatetime() through JsonPathExecContext?


> * The blank lines before "break"s aren't really per project
> style either, IMO.  They certainly aren't doing anything to
> improve readability, and they help limit how much code you
> can see at once.

Fixed.  If I recall correctly, these lines were added by pgindent.


> * More generally, it's completely unclear why some error conditions
> are thrown as errors and others just result in returning *have_error.
> In particular, it seems weird that some unsupported datatype combinations
> cause hard errors while others do not.  Maybe that's fine, but if so,
> the function header comment is falling down on the job by not explaining
> the reasoning.

All cast errors are caught by the jsonpath predicate.  Comparison of
incomparable datetime types (time[tz] against dated types) also returns
Unknown.  A hard error is thrown only when a datatype conversion requires the
current timezone, which is not available in the immutable family of
jsonb_xxx() functions.  This behavior is specific to our jsonpath
implementation, and I'm really not sure whether we should throw an error or
return Unknown in this case.
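A minimal sketch of that taxonomy (hypothetical function; the real logic lives in jsonpath_exec.c):

```c
#include <stdbool.h>

/* Outcomes a jsonpath comparison predicate can produce. */
typedef enum { RES_FALSE, RES_TRUE, RES_UNKNOWN, RES_ERROR } PredResult;

/*
 * Sketch of the error taxonomy described above: cast failures and
 * incomparable type pairs become Unknown; only a comparison that needs
 * the session timezone inside an immutable function is a hard error.
 * Illustrative names only, not the actual PostgreSQL code.
 */
static PredResult
classify_comparison(bool cast_failed, bool incomparable_types,
                    bool needs_timezone, bool tz_aware_caller,
                    bool cmp_true)
{
    if (cast_failed || incomparable_types)
        return RES_UNKNOWN;     /* caught by the jsonpath predicate */
    if (needs_timezone && !tz_aware_caller)
        return RES_ERROR;       /* immutable jsonb_path_*() would throw */
    return cmp_true ? RES_TRUE : RES_FALSE;
}
```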

Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Fri, Sep 27, 2019 at 6:58 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> On Thu, Sep 26, 2019 at 2:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > * More generally, it's completely unclear why some error conditions
> > are thrown as errors and others just result in returning *have_error.
> > In particular, it seems weird that some unsupported datatype combinations
> > cause hard errors while others do not.  Maybe that's fine, but if so,
> > the function header comment is falling down on the job by not explaining
> > the reasoning.
>
> All cast errors are caught by jsonpath predicate.  Comparison of the
> uncomparable datetime types (time[tz] to dated types) also returns Unknown.
> And only if datatype conversion requires current timezone, which is not
> available in immutable family of jsonb_xxx() functions, hard error is thrown.
> This behavior is specific only for our jsonpath implementation.  But I'm
> really not sure if we should throw an error or return Unknown in this case.

I'd like to share my further thoughts about errors.  I think we should
suppress the errors that are defined by the standard and that the user can
expect: a wrong date format causes an error, division by zero causes an
error, and so on.  Those errors are defined by the standard.

However, when an error is caused by a limitation of our implementation,
suppressing it doesn't look right to me.

For instance:

# select jsonb_path_query('"1000000-01-01"', '$.datetime() >
"2020-01-01 12:00:00".datetime()'::jsonpath);
 jsonb_path_query
------------------
 null
(1 row)

# select '1000000-01-01'::date > '2020-01-01 12:00:00'::timestamp;
ERROR:  date out of range for timestamp

So, jsonpath behaves as if 1000000 were not greater than 2020.  This looks
like plain falsehood, and the user can't expect it unless she is familiar
with our particular issues.  I'm now of the opinion that such errors
shouldn't be suppressed.  We can't suppress *every* error; taken to the
extreme, that reasoning would have us suppress OOM errors and return garbage,
which is obviously ridiculous.  Opinions?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pgsql: Implement jsonpath .datetime() method

From
Robert Haas
Date:
On Sun, Sep 29, 2019 at 10:30 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> So, jsonpath behaves like 1000000 is not greater than 2020.  This
> looks like plain false.  And user can't expect that unless she is
> familiar with our particular issues.  Now I got opinion  that such
> errors shouldn't be suppressed.  We can't suppress *every* error.  If
> trying to do this, we can come to an idea to suppress OOM error and
> return garbage then, which is obviously ridiculous.  Opinions?

I don't know enough about jsonpath to have a view on specifically
which errors ought to be suppressed, but I agree that it's probably
not all of them. In fact, I'd go so far as to say that thinking about
it in terms of error suppression is probably not the right approach in
the first place. Rather, you want to ask what behavior you're trying
to create.

For example, if I'm trying to write a function that takes a string as
input and returns JSON, where the result is formatted as a number if
possible or a string otherwise, I might want access at the C level to
the guts of numeric_in, with all parsing errors returned rather than
thrown. But it would be silly to suppress an out-of-memory condition,
because that doesn't help the caller. The caller wants to know whether
the thing can be parsed as a number or not, and that has nothing to do
with whether we're out of memory, so an out-of-memory error should
still be thrown.

In this case here, it seems to me that you should similarly start by
defining the behavior you're trying to create. Unless that's clearly
defined, deciding which errors to suppress may be difficult.
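Robert's distinction can be sketched with a soft-error parser (illustrative only; the numeric_in internals are not shown here): parse failures come back to the caller as a boolean, while a resource error such as out-of-memory would still be raised, because it says nothing about whether the input is a number:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <errno.h>

/*
 * Sketch of the "soft error" design: returns false when the string is
 * not a valid number, instead of throwing.  In PostgreSQL proper, an
 * allocation failure along the way would still be reported via
 * ereport(ERROR), not folded into this boolean.  Hypothetical API.
 */
static bool
parse_number_soft(const char *s, double *out)
{
    char       *end;

    errno = 0;
    *out = strtod(s, &end);

    /* valid only if something was consumed, nothing trails, no overflow */
    return end != s && *end == '\0' && errno == 0;
}
```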

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Mon, Sep 30, 2019 at 10:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Sep 29, 2019 at 10:30 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > So, jsonpath behaves like 1000000 is not greater than 2020.  This
> > looks like plain false.  And user can't expect that unless she is
> > familiar with our particular issues.  Now I got opinion  that such
> > errors shouldn't be suppressed.  We can't suppress *every* error.  If
> > trying to do this, we can come to an idea to suppress OOM error and
> > return garbage then, which is obviously ridiculous.  Opinions?
>
> I don't know enough about jsonpath to have a view on specifically
> which errors ought to be suppressed, but I agree that it's probably
> not all of them. In fact, I'd go so far as to say that thinking about
> it in terms of error suppression is probably not the right approach in
> the first place. Rather, you want to ask what behavior you're trying
> to create.
>
> For example, if I'm trying to write a function that takes a string as
> input and returns JSON, where the result is formatted as a number if
> possible or a string otherwise, I might want access at the C level to
> the guts of numeric_in, with all parsing errors returned rather than
> thrown. But it would be silly to suppress an out-of-memory condition,
> because that doesn't help the caller. The caller wants to know whether
> the thing can be parsed as a number or not, and that has nothing to do
> with whether we're out of memory, so an out-of-memory error should
> still be thrown.
>
> In this case here, it seems to me that you should similarly start by
> defining the behavior you're trying to create. Unless that's clearly
> defined, deciding which errors to suppress may be difficult.

Making C functions return errors rather than throw them is what we're
implementing in our patchsets.  In the big picture, the behavior we're trying
to create is that of the SQL:2016 standard, which defines error handling as
follows.

> The SQL operators JSON_VALUE, JSON_QUERY, JSON_TABLE, and JSON_EXISTS provide
> the following mechanisms to handle these errors:
> 1) The SQL/JSON path language traps any errors that occur during the evaluation
> of a <JSON filter expression>. Depending on the precise <JSON path predicate>
> contained in the <JSON filter expression>, the result may be Unknown, True, or
> False, depending on the outcome of non-error tests evaluated in the <JSON path
> predicate>.
> 2) The SQL/JSON path language has two modes, strict and lax, which govern
> structural errors, as follows:
>   a) In lax mode:
>     i) If an operation requires an SQL/JSON array but the operand is not an
>     SQL/JSON array, then the operand is first “wrapped” in an SQL/JSON array prior
>     to performing the operation.
>     ii) If an operation requires something other than an SQL/JSON array, but
>     the operand is an SQL/JSON array, then the operand is “unwrapped” by
>     converting its elements into an SQL/JSON sequence prior to performing the
>     operation.
>     iii) After applying the preceding resolutions to structural errors, if
>     there is still a structural error, the result is an empty SQL/JSON
>     sequence.
>   b) In strict mode, if the structural error occurs within a <JSON filter
>   expression>, then the error handling of <JSON filter expression> applies.
>   Otherwise, a structural error is an unhandled error.
> 3) Non-structural errors outside of a <JSON path predicate> are always
> unhandled errors, resulting in an exception condition returned from the path
> engine to the SQL/JSON query operator.
> 4) The SQL/JSON query operators provide an ON ERROR clause to specify the
> behavior in case of an input conversion error, an unhandled structural error,
> an unhandled non-structural error, or an output conversion error.

So, basically, the standard requires us to suppress any error happening in a
filter expression.  But as I wrote before, suppressing errors in datetime
comparison may lead to surprising results.  That happens only in rare corner
cases, but still, it forces an uneasy choice between consistent behavior and
standard behavior.

However, Nikita Glukhov had a good idea about that.  Instead of
thinking about whether we should suppress cast errors in datetime
comparison, we may just eliminate those errors.  If we know that
casting a date to timestamp overflows the upper bound of finite
timestamps, then we also know that this date is greater than any finite
timestamp.  So, we are still able to do a correct comparison.  I'm
going to implement this and post a patch.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On Mon, Sep 30, 2019 at 10:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Sep 29, 2019 at 10:30 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > So, jsonpath behaves like 1000000 is not greater than 2020.  This
> > looks like plain false.  And user can't expect that unless she is
> > familiar with our particular issues.  Now I got opinion  that such
> > errors shouldn't be suppressed.  We can't suppress *every* error.  If
> > trying to do this, we can come to an idea to suppress OOM error and
> > return garbage then, which is obviously ridiculous.  Opinions?
>
> I don't know enough about jsonpath to have a view on specifically
> which errors ought to be suppressed, but I agree that it's probably
> not all of them. In fact, I'd go so far as to say that thinking about
> it in terms of error suppression is probably not the right approach in
> the first place. Rather, you want to ask what behavior you're trying
> to create.
>
> For example, if I'm trying to write a function that takes a string as
> input and returns JSON, where the result is formatted as a number if
> possible or a string otherwise, I might want access at the C level to
> the guts of numeric_in, with all parsing errors returned rather than
> thrown. But it would be silly to suppress an out-of-memory condition,
> because that doesn't help the caller. The caller wants to know whether
> the thing can be parsed as a number or not, and that has nothing to do
> with whether we're out of memory, so an out-of-memory error should
> still be thrown.
>
> In this case here, it seems to me that you should similarly start by
> defining the behavior you're trying to create. Unless that's clearly
> defined, deciding which errors to suppress may be difficult.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



Re: pgsql: Implement jsonpath .datetime() method

From
Robert Haas
Date:
On Tue, Oct 1, 2019 at 1:41 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> So, basically standard requires us to suppress any error happening in
> filter expression.

Sounds like the standard is dumb, then. :-)

> But as I wrote before suppression of errors in
> datetime comparison may lead to surprising results.  That happens in
> rare corner cases, but still.  This makes uneasy choice between
> consistent behavior and standard behavior.

Yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Thu, Oct 3, 2019 at 4:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 1, 2019 at 1:41 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > So, basically standard requires us to suppress any error happening in
> > filter expression.
>
> Sounds like the standard is dumb, then. :-)
>
> > But as I wrote before suppression of errors in
> > datetime comparison may lead to surprising results.  That happens in
> > rare corner cases, but still.  This makes uneasy choice between
> > consistent behavior and standard behavior.
>
> Yeah.

The proposed patch eliminates this dilemma in this particular case.  It
provides correct cross-type comparison of datetime values even if one
of the values overflows during the cast.  In order to do this, I made
the cast functions report whether the lower or upper boundary was
overflowed.  We know that an overflowed value is lower (or higher) than
any valid value except infinity.

This patch also changes the way the timestamp to timestamptz cast works.
Previously, it did timestamp2tm() and then tm2timestamp().  Instead,
after timestamp2tm() it calculates the timezone offset and applies it
to the original timestamp value.  I hope this is correct.  If so,
besides making overflow handling easier, this refactoring saves some
CPU cycles.
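To illustrate the shape of that refactoring (a simplified sketch with
assumed names; the real code uses timestamp2tm() and the catalog-driven
timezone machinery, and the offset function here is a fixed-zone stub):

```c
#include <stdint.h>

typedef int64_t Timestamp;    /* local wall-clock time, microseconds */
typedef int64_t TimestampTz;  /* UTC time, microseconds */

/* Stand-in for the real offset lookup: returns the session zone's UTC
 * offset (seconds east of Greenwich) in effect at this local time.
 * A fixed +03:00 zone, purely for illustration. */
static int32_t
zone_offset_for_local(Timestamp local_ts)
{
    (void) local_ts;
    return 3 * 3600;
}

/* Refactored cast, sketched: decompose the timestamp only to find the
 * applicable offset, then shift the original 64-bit value directly
 * instead of rebuilding it field by field with tm2timestamp(). */
static TimestampTz
timestamp_to_timestamptz(Timestamp local_ts)
{
    int64_t offset_usecs = (int64_t) zone_offset_for_local(local_ts) * 1000000;

    return local_ts - offset_usecs;
}
```

Because the result is a single subtraction on the original value, a
range check on the shifted value is enough to detect overflow, which is
what makes the boundary reporting above cheap.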

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments

Re: pgsql: Implement jsonpath .datetime() method

From
Tom Lane
Date:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> This patch also changes the way timestamp to timestamptz cast works.
> Previously it did timestamp2tm() then tm2timestamp().  Instead, after
> timestamp2tm() it calculates timezone offset and applies it to
> original timestamp value.  I hope this is correct.

I'd wonder whether this gives the same answers near DST transitions,
where it's not real clear which offset applies.

Please *don't* wrap this sort of thing into an unrelated feature patch.

            regards, tom lane



Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Sun, Oct 13, 2019 at 5:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > This patch also changes the way timestamp to timestamptz cast works.
> > Previously it did timestamp2tm() then tm2timestamp().  Instead, after
> > timestamp2tm() it calculates timezone offset and applies it to
> > original timestamp value.  I hope this is correct.
>
> I'd wonder whether this gives the same answers near DST transitions,
> where it's not real clear which offset applies.

I will try this and share the results.

> Please *don't* wrap this sort of thing into an unrelated feature patch.

Sure, thank you for noticing.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pgsql: Implement jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Mon, Oct 14, 2019 at 5:36 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Sun, Oct 13, 2019 at 5:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > > This patch also changes the way timestamp to timestamptz cast works.
> > > Previously it did timestamp2tm() then tm2timestamp().  Instead, after
> > > timestamp2tm() it calculates timezone offset and applies it to
> > > original timestamp value.  I hope this is correct.
> >
> > I'd wonder whether this gives the same answers near DST transitions,
> > where it's not real clear which offset applies.
>
> I will try this and share the results.

I've separated the refactoring of the timestamp to timestamptz cast
into a separate patch.  The patchset is attached.

I've investigated the behavior near DST transitions in the Moscow
timezone.  Its last two DST transitions were on 2010-03-28 and
2010-10-31.  It behaves the same with and without the patch.  The tests
are below.

# set timezone = 'Europe/Moscow';

# select '2010-03-28 01:59:59'::timestamp::timestamptz;
      timestamptz
------------------------
 2010-03-28 01:59:59+03
(1 row)

# select '2010-03-28 02:00:00'::timestamp::timestamptz;
      timestamptz
------------------------
 2010-03-28 03:00:00+04
(1 row)

# select '2010-03-28 02:59:59'::timestamp::timestamptz;
      timestamptz
------------------------
 2010-03-28 03:59:59+04
(1 row)

# select '2010-03-28 03:00:00'::timestamp::timestamptz;
      timestamptz
------------------------
 2010-03-28 03:00:00+04
(1 row)

# select '2010-10-31 01:59:59'::timestamp::timestamptz;
      timestamptz
------------------------
 2010-10-31 01:59:59+04
(1 row)

# select '2010-10-31 02:00:00'::timestamp::timestamptz;
      timestamptz
------------------------
 2010-10-31 02:00:00+03
(1 row)

BTW, I've noticed how ridiculously the cast behaves for values in the
range [2010-03-28 02:00:00, 2010-03-28 03:00:00).  Now I think that a
timestamptz type which explicitly stores the timezone offset has some
merit.  At least, it would make it possible to preserve the same local
time value during casts.

I'm going to push these two patches if there are no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments
