On Thu, Mar 21, 2019 at 9:59 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Attaches patches improving jsonpath parser. First one introduces
> cosmetic changes, while second gets rid of backtracking. I'm also
> planning to add high-level comment for both grammar and lexer.
The cosmetic changes look good to me. I just noticed a couple things
about the comments.
0001:
+/* Check if current scanstring value constitutes a keyword */
'is a keyword' is better. 'Constitutes' implies parts of a whole.
+ * Resize scanstring for appending of given length. Reinitilize if required.
s/Reinitilize/Reinitialize/
The first sentence is not entirely clear to me.
0002:
These two rules are not strictly necessary:
<xnq,xq,xvq,xsq>{unicode}+\\ {
/* throw back the \\, and treat as unicode */
yyless(yyleng - 1);
parseUnicode(yytext, yyleng);
}
<xnq,xq,xvq,xsq>{hex_char}+\\ {
/* throw back the \\, and treat as hex */
yyless(yyleng - 1);
parseHexChars(yytext, yyleng);
}
...and only seem to be there because of how these are written:
<xnq,xq,xvq,xsq>{unicode}+ { parseUnicode(yytext, yyleng); }
<xnq,xq,xvq,xsq>{hex_char}+ { parseHexChars(yytext, yyleng); }
<xnq,xq,xvq,xsq>{unicode}*{unicodefail} { yyerror(NULL, "Unicode
sequence is invalid"); }
<xnq,xq,xvq,xsq>{hex_char}*{hex_fail} { yyerror(NULL, "Hex character
sequence is invalid"); }
I don't understand the reasoning here -- is it a micro-optimization?
The following is simpler, allow the rules I mentioned to be removed,
and make check still passes. I would prefer it unless there is a
performance penalty, in which case a comment to describe the
additional complexity would be helpful.
<xnq,xq,xvq,xsq>{unicode} { parseUnicode(yytext, yyleng); }
<xnq,xq,xvq,xsq>{hex_char} { parseHexChars(yytext, yyleng); }
<xnq,xq,xvq,xsq>{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); }
<xnq,xq,xvq,xsq>{hex_fail} { yyerror(NULL, "Hex character sequence is
invalid"); }
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services