Re: Define jsonpath functions as stable

From Jonathan S. Katz
Subject Re: Define jsonpath functions as stable
Date
Msg-id 1149945c-8a31-ec24-b454-3e410f8a70b6@postgresql.org
In response to Re: Define jsonpath functions as stable  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Define jsonpath functions as stable  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 9/16/19 11:20 AM, Tom Lane wrote:
> "Jonathan S. Katz" <jkatz@postgresql.org> writes:
>> It sounds like the easiest path to completion without potentially adding
>> futures headaches pushing back the release too far would be that, e.g.
>> these examples:
>
>>     $.** ? (@ like_regex "O(w|v)" pg flag "i")
>>     $.** ? (@ like_regex "O(w|v)" pg)
>
>> If it's using POSIX regexp, I would +1 using "posix" instead of "pg"
>
> I agree that we'd be better off to say "POSIX".  However, having just
> looked through the references Chapman provided, it seems to me that
> the regex language Henry Spencer's library provides is awful darn
> close to what XPath is asking for.  The main thing I see in the XML/XPath
> specs that we don't have is a bunch of character class escapes that are
> specifically tied to Unicode character properties.  We could possibly
> add code to implement those, but I'm not sure how it'd work in non-UTF8
> database encodings.

Maybe we could take a page from the pg_saslprep implementation. In some
cases where the string in question would be rejected under normal
SASLprep[1] rules (really stringprep[2]), PostgreSQL just lets the
string pass through to the next step without alteration.

What's implied here is that if the string is valid UTF-8, it goes
through SASLprep, but if not, it is just passed through.
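
As a rough illustration of that fallback, here is a Python sketch. It is
not the actual C code in PostgreSQL: the function name
prep_or_passthrough is made up for this example, and the normalization is
a simplified stand-in for full SASLprep (it skips the bidi checks and
most of the prohibited-character tables).

```python
import stringprep
import unicodedata


def prep_or_passthrough(raw: bytes) -> bytes:
    """Hypothetical helper: normalize valid UTF-8, otherwise return raw as-is."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: hand the bytes to the next step untouched.
        return raw

    mapped = []
    for ch in text:
        if stringprep.in_table_b1(ch):
            # "Commonly mapped to nothing" characters are dropped.
            continue
        # Non-ASCII spaces are mapped to a plain space.
        mapped.append(" " if stringprep.in_table_c12(ch) else ch)

    normalized = unicodedata.normalize("NFKC", "".join(mapped))

    # Simplification: only control characters are treated as prohibited here.
    # Where strict SASLprep would reject, fall back to the raw input instead.
    if any(stringprep.in_table_c21_c22(ch) for ch in normalized):
        return raw

    return normalized.encode("utf-8")
```

The point is only the shape of the control flow: every "reject" branch
returns the original bytes rather than raising an error.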

So perhaps the answer is that if we implement XQuery, the escapes for
Unicode character properties are honored only if the database encoding
is UTF-8, and ignored otherwise. We would have to document that said
escapes only work with UTF-8 encodings.

>  There may also be subtle differences in the behavior
> of character class escapes that we do have in common, such as "\s" for
> white space; but again I'm not sure that those are any different than
> what you get naturally from encoding or locale variations.
>
> I think we could possibly get away with not having any special marker
> on regexes, but just explaining in the documentation that "features
> so-and-so are not implemented".  Writing that text would require closer
> analysis than I've seen in this thread as to exactly what the differences
> are.

+1, and we would likely need some example strings too that highlight the
differences in how they are processed.

And again, if we end up updating the behavior in the future, it becomes
a part of our standard deprecation notice at the beginning of the
release notes, though one that could require a lot of explanation.

Jonathan

[1] https://tools.ietf.org/html/rfc4013
[2] https://www.ietf.org/rfc/rfc3454.txt

