Re: Define jsonpath functions as stable
From | Tom Lane |
---|---|
Subject | Re: Define jsonpath functions as stable |
Date | |
Msg-id | 31931.1568668225@sss.pgh.pa.us |
In response to | Re: Define jsonpath functions as stable ("Jonathan S. Katz" <jkatz@postgresql.org>) |
Responses | Re: Define jsonpath functions as stable ("Jonathan S. Katz" <jkatz@postgresql.org>); Re: Define jsonpath functions as stable (Chapman Flack <chap@anastigmatix.net>) |
List | pgsql-hackers |

"Jonathan S. Katz" <jkatz@postgresql.org> writes:
> On 9/16/19 11:20 AM, Tom Lane wrote:
>> I think we could possibly get away with not having any special marker
>> on regexes, but just explaining in the documentation that "features
>> so-and-so are not implemented".  Writing that text would require closer
>> analysis than I've seen in this thread as to exactly what the differences
>> are.

> +1, and likely would need some example strings too that highlight the
> difference in how they are processed.

I spent an hour digging through these specs.  I was initially troubled
by the fact that XML Schema regexps are implicitly anchored, ie must
match the whole string; that's a huge difference from POSIX.  However,
19075-6 says that jsonpath like_regex works the same as the LIKE_REGEX
predicate in SQL; and SQL:2011 "9.18 XQuery regular expression matching"
defines LIKE_REGEX to work exactly like XQuery's fn:matches function,
except for some weirdness around newline matching; and that spec
clearly says that fn:matches treats its pattern argument as NOT anchored.
So it looks like we end up in the same place as POSIX for this.

Otherwise, the pattern language differences I could find are all details
of character class expressions (bracket expressions, such as "[a-z0-9]")
and escapes that are character class shorthands:

* We don't have "character class subtraction".  I'd be pretty hesitant
to add that to our regexp language because it seems to change "-" into
a metacharacter, which would break an awful lot of regexps.  I might
be misunderstanding their syntax for it, because elsewhere that spec
explicitly claims that "-" is not a metacharacter.

* Character class elements can be #xNN (NN being hex digits), which seems
equivalent to POSIX \xNN as long as you're using UTF8 encoding.  Again,
the compatibility costs of allowing that don't seem attractive, since #
isn't a metacharacter today.

* Character class elements can be \p{UnicodeProperty} or
the complement \P{UnicodeProperty}, where there are a bunch of different
possible properties.  Perhaps we could add that someday; since there's no
reason to escape "p" or "P" today, this doesn't seem like it'd be a huge
compatibility hit.  But I'm content to document this as unimplemented
for now.

* XQuery adds character class shorthands \i (complement \I) for "initial
name characters" and \c (complement \C) for "NameChar".  Same as above;
maybe add someday, but no hurry.

* It looks like XQuery's \w class might allow more characters than our
interpretation does, and hence \W allows fewer.  But since \w devolves
to what libc thinks the "alnum" class is, it's at least possible that
some locales might do the same thing XQuery calls for.

* Likewise, any other discrepancies between the Unicode-centric character
class definitions in XQuery and what our stuff does are well within the
boundaries of locale variances.  So I don't feel too bad about that.

* The SQL-spec newline business mentioned above is a possible exception:
it appears to require that when '.' is allowed to match newlines, a
single '.' should match a '\r\n' Windows newline.  I think we can
document that and move on.

* The x flag in XQuery is defined as ignoring all whitespace in
the pattern except within character class expressions.  Spencer's
x flag does mostly that, but it thinks that "\ " means a literal space
whereas XQuery explicitly says that the space is ignored and the
backslash applies to the next non-space character.  (That's just
weird, in my book.)  Also, Spencer's x mode causes # to begin
a comment extending to EOL, which is a nice thing XQuery hasn't
got, and it says you can't put spaces within multi-character
symbols like "(?:", which presumably is allowed with XQuery's "x".

I feel a bit uncomfortable with these inconsistencies in x-flag
rules.  We could probably teach the regexp library to have an
alternate expanded mode that matches XQuery's rules, but that's
not a project to tackle for v12.  I tentatively recommend that
we remove the jsonpath "x" flag for the time being.

Also, I noted some things that seem to be flat out sloppiness
in the XQuery flag conversions:

* The newline-matching flags (m and s flags) can be mapped to
features of Spencer's library, but jsonpath_gram.y does so
incorrectly.

* XQuery says that the q flag overrides m, s, and x flags, which is
exactly the opposite of what our code does; besides which the code
is flag-order-sensitive which is just wrong.

These last two are simple to fix and we should just go do it.

Otherwise, I think we're okay with regarding Spencer's library
as being a sufficiently close approximation to LIKE_REGEX.
We need some documentation work though.

			regards, tom lane
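
The q-flag rule described in the message (q overrides m, s, and x, and the result must not depend on the order in which flags are written) can be illustrated with a small Python sketch. This is not the jsonpath_gram.y code and not any PostgreSQL API; the function name `effective_flags` and the use of a set are assumptions made purely for illustration.

```python
# Illustrative sketch only: a hypothetical helper, not PostgreSQL code.
# It models the XQuery rule that the q flag (treat the pattern as a
# literal string) overrides m, s, and x, independent of flag order.

XQUERY_FLAGS = frozenset("imsxq")


def effective_flags(flags: str) -> frozenset:
    """Return the set of flags that actually take effect."""
    seen = set()
    for ch in flags:
        if ch not in XQUERY_FLAGS:
            raise ValueError(f"unrecognized flag character: {ch!r}")
        seen.add(ch)
    if "q" in seen:
        # Collect all flags first, then apply the override once, so that
        # "qx" and "xq" are treated identically. The i flag is kept.
        seen -= {"m", "s", "x"}
    return frozenset(seen)
```

Collecting every flag before applying the override is what removes the order sensitivity: the decision is made on the whole set rather than while scanning the flag string character by character.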