On Fri, Mar 22, 2019 at 5:38 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
> On Thu, Mar 21, 2019 at 9:59 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Attaches patches improving jsonpath parser. First one introduces
> > cosmetic changes, while second gets rid of backtracking. I'm also
> > planning to add high-level comment for both grammar and lexer.
>
> The cosmetic changes look good to me. I just noticed a couple things
> about the comments.
>
> 0001:
>
> +/* Check if current scanstring value constitutes a keyword */
>
> 'is a keyword' is better. 'Constitutes' implies parts of a whole.
>
> + * Resize scanstring for appending of given length. Reinitilize if required.
>
> s/Reinitilize/Reinitialize/
>
> The first sentence is not entirely clear to me.
Thank you, fixed.
> 0002:
>
> These two rules are not strictly necessary:
>
> <xnq,xq,xvq,xsq>{unicode}+\\ {
> /* throw back the \\, and treat as unicode */
> yyless(yyleng - 1);
> parseUnicode(yytext, yyleng);
> }
>
> <xnq,xq,xvq,xsq>{hex_char}+\\ {
> /* throw back the \\, and treat as hex */
> yyless(yyleng - 1);
> parseHexChars(yytext, yyleng);
> }
>
> ...and only seem to be there because of how these are written:
>
> <xnq,xq,xvq,xsq>{unicode}+ { parseUnicode(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{hex_char}+ { parseHexChars(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{unicode}*{unicodefail} { yyerror(NULL, "Unicode
> sequence is invalid"); }
> <xnq,xq,xvq,xsq>{hex_char}*{hex_fail} { yyerror(NULL, "Hex character
> sequence is invalid"); }
>
> I don't understand the reasoning here -- is it a micro-optimization?
> The following is simpler, allow the rules I mentioned to be removed,
> and make check still passes. I would prefer it unless there is a
> performance penalty, in which case a comment to describe the
> additional complexity would be helpful.
>
> <xnq,xq,xvq,xsq>{unicode} { parseUnicode(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{hex_char} { parseHexChars(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); }
> <xnq,xq,xvq,xsq>{hex_fail} { yyerror(NULL, "Hex character sequence is
> invalid"); }
These rules are needed for unicode. Sequential escaped unicode
characters might be connected by hi surrogate value. See
jsonpath_encoding regression test in attached patch.
Regarding hex, I made it so for the sake of uniformity. But I changed
my mind and decided that simpler flex rules are more important. So,
now they are considered one-by-one.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company