Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level?

From Tom Lane
Subject Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level?
Msg-id 5348.951068504@sss.pgh.pa.us
In reply to Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level?  (Hannu Krosing <hannu@tm.ee>)
Responses Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level?  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Hannu Krosing <hannu@tm.ee> writes:
> Could you test with some other frontend (python, perl, tcl, C) ?

Yup, psql is untrustworthy as a means of testing the backend's comment
handling ;-).

I committed lexer changes on Friday evening that I believe fix all of
the backend's problems with \r versus \n.  The issue with unterminated
-- comments, which was Hannu's original complaint, was fixed a while ago;
but we still had problems with comments terminated with \r instead of
\n, as well as some non-SQL-compliant behavior for -- comments between
the segments of a multiline literal, etc etc.

While fixing this I realized that there are some fundamental
discrepancies between the way the backend recognizes comments and the
way that psql does.  These arise from the fact that the comment
introducer sequences /* and -- are also legal as parts of operator
names, and since the backend is based on lex which uses greedy longest-
available-match rules, you get things like this:

select *-- 123
ERROR:  Can't find left op '*--' for type 23

(Parsing '*--' as an operator name wins over parsing just '*' as an
operator name, so that '--' would be recognized on the next call.)
More subtly,

select /**/- 22
ERROR:  parser: parse error at or near ""

which is the backend's rather lame excuse for an "unterminated comment"
error.  What happens here is that the sequence /**/- is bit off as a
single lexer token, then tested in this order to see if it is
  (a) a complete "/* ... */" comment (nope),
  (b) the start of a comment, "/* anything" (yup), or
  (c) an operator (which would succeed if it got the chance).
 
There does not seem to be any way to persuade lex to stop at the "*/"
if it has a chance to recognize a longer token by applying the operator
rule.

Both of these problems are easily avoided by inserting some whitespace,
but I wonder whether we ought to try to fix them for real.  One way
that this could be done would be to alter the lexer rules so that
operators are lexed a single character at a time, which'd eliminate
lex's tendency to recognize a long operator name in place of a comment.
Then we'd need a post-pass to recombine adjacent operator characters into
a single token.  (This would forever prevent anyone from using operator
names that include '--' or '/*', but I'm not sure that's a bad thing.)
The post-pass would also be a mighty convenient place to fix the NOT NULL
problem that's giving us trouble in another thread: the post-pass would
need one-token lookahead anyway, so it could very easily convert NOT
followed by NULL into a single special token.
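
A minimal Python sketch of that two-pass idea follows, purely as an
illustration and under several assumptions: the token kinds, the operator
character set, and the names first_pass and post_pass are all made up for
the example, and quoted literals are not handled at all.  The first pass
emits operators one character at a time (so a comment introducer is always
seen first), and the post-pass glues touching operator characters back
together and folds NOT followed by NULL into a single token.

```python
import re

# Hypothetical first-pass rules; Python tries the alternatives left to
# right, so comments are recognized before any single operator character.
TOKEN = re.compile(
    r"(?P<comment>--[^\r\n]*|/\*.*?\*/)"
    r"|(?P<unterminated>/\*)"
    r"|(?P<op>[-+*/<>=~!@#%^&|`?])"
    r"|(?P<word>\w+)"
    r"|(?P<space>\s+)",
    re.DOTALL,
)

def first_pass(text):
    """Yield (kind, text, start); operators are single characters."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot lex at position {pos}")
        kind = m.lastgroup
        if kind == "unterminated":
            raise ValueError("unterminated /* comment")
        if kind not in ("space", "comment"):
            yield kind, m.group(), pos
        pos = m.end()

def post_pass(tokens):
    """Recombine adjacent operator characters and merge NOT NULL."""
    out = []
    prev_end = None
    for kind, text, start in tokens:
        if kind == "op" and out and out[-1][0] == "op" and prev_end == start:
            # Operator characters that touch form one operator name.
            out[-1] = ("op", out[-1][1] + text)
        elif (kind == "word" and text.upper() == "NULL"
              and out and out[-1][0] == "word" and out[-1][1].upper() == "NOT"):
            out[-1] = ("not_null", "NOT NULL")
        else:
            out.append((kind, text))
        prev_end = start + len(text)
    return out

print(post_pass(first_pass("select *-- 123")))   # '*' survives, comment dropped
print(post_pass(first_pass("select /**/- 22")))  # '-' survives, comment dropped
print(post_pass(first_pass("a <= b")))           # '<' and '=' rejoined as '<='
print(post_pass(first_pass("x NOT NULL")))       # NOT NULL becomes one token
```

The sketch looks back at the previous output token rather than ahead at the
next input token; for these two merges the effect is the same as one-token
lookahead.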

Meanwhile, psql is using some ad-hoc code to recognize comments,
rather than a lexer, and it thinks both of these sequences are indeed
comments.  I also find that it strips out the -- flavor of comment,
but sends the /* */ flavor on through, which is just plain inconsistent.
I suggest we change psql to not strip -- comments either.  The only
reason for psql to be in the comment-recognition business at all is
so that it can determine whether a semicolon is end-of-query or just
a character in a comment.
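
For illustration only, the job described here (find the semicolons that end
a query, and pass every comment through untouched) might look roughly like
the Python sketch below.  The function name split_statements is
hypothetical and this is not psql's code; it treats "--" and "/*" as
comment introducers wherever they appear outside a literal, handles only
single-quoted literals, and ignores backslash escapes.

```python
def split_statements(buffer):
    """Split on semicolons outside comments and single-quoted literals.

    Comments of both flavors are left in the statement text unchanged.
    """
    statements = []
    start = i = 0
    n = len(buffer)
    while i < n:
        two = buffer[i:i + 2]
        if two == "--":
            # A -- comment runs to the end of the line (\r or \n).
            while i < n and buffer[i] not in "\r\n":
                i += 1
        elif two == "/*":
            end = buffer.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif buffer[i] == "'":
            # A doubled '' simply closes and reopens the literal.
            end = buffer.find("'", i + 1)
            i = n if end == -1 else end + 1
        elif buffer[i] == ";":
            statements.append(buffer[start:i].strip())
            i += 1
            start = i
        else:
            i += 1
    rest = buffer[start:].strip()
    if rest:
        statements.append(rest)
    return statements

print(split_statements("select 1; select 2 -- not; here\n;"))
print(split_statements("select ';' /* ; */ ;"))
```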

Another thing I'd like to fix here is to get the backend to produce
a more useful error message than 'parse error at or near ""' when it's
presented with an unterminated comment or unterminated literal.
The flex manual recommends coding like
    <quote><<EOF>>   {
             error( "unterminated quote" );
             yyterminate();
             }

but <<EOF>> is a flex-ism not supported by regular lex.  We already
tell people they have to use flex (though I'm not sure that's *really*
necessary at present); do we want to set that requirement in stone?
Or does anyone know another way to get this effect?
        regards, tom lane

