Re: BUG #15273: Lexer bug with UESCAPE
| From | Andrew Gierth |
|---|---|
| Subject | Re: BUG #15273: Lexer bug with UESCAPE |
| Date | |
| Msg-id | 87bmbekq90.fsf@news-spur.riddles.org.uk |
| In response to | Re: BUG #15273: Lexer bug with UESCAPE (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: BUG #15273: Lexer bug with UESCAPE |
| List | pgsql-bugs |
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
 Tom> Also, I'm going to push back on the claim that allowing comments
 Tom> there is required by the SQL spec. The relevant rules in SQL:2011
 Tom> are
 Tom> <Unicode character string literal> ::=
 Tom>   [ <introducer> <character set specification> ]
 Tom>       U <ampersand> <quote> [ <Unicode representation>... ] <quote>
 Tom>       [ { <separator> <quote> [ <Unicode representation>... ] <quote> }... ]
 Tom>       <Unicode escape specifier>
 Tom> <Unicode escape specifier> ::=
 Tom>   [ UESCAPE <quote> <Unicode escape character> <quote> ]
 Tom> I do not see any principled way of arguing that these rules
 Tom> require comments to be allowed adjacent to UESCAPE without also
 Tom> claiming that they must be allowed between, say, the initial 'U'
 Tom> and the ampersand.
These are the rules that (as far as I can see) apply to that case:
5.2 <token> and <separator>
<separator> ::=
  { <comment> | <white space> }...
  7) Any <token> may be followed by a <separator>.
5.3 <literal>
  11) In a <Unicode character string literal>, there shall be no
      <separator> between the "U" and the <ampersand> nor between the
      <ampersand> and the <quote>.
 Tom> The only place these rules allow a <separator> is between segments
 Tom> of a multiline literal. It looks to me like an extension that we
 Tom> even allow whitespace around UESCAPE.
I think that that use of <separator> is only to indicate that a
<separator> there is _required_, rather than optional as it usually is
after tokens, and that the special rule about requiring newlines also
applies only to that specific use of <separator>.
If the whole <Unicode character string literal> were regarded as a
single token, so that rule 5.2.7 above did not apply around the
UESCAPE, then there would be no reason to write rule 5.3.11 forbidding
separators within the U&' part.
(In the case of X'...', there's rule 5.2.5, which as I see it would
prevent a space after the X, but that rule explicitly does not apply to
the U& cases.)
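
To make the placements concrete, here is a small sketch of the statements in question. The literal value and the comment text are made up for illustration; the annotations only restate the reading of the spec argued above, and whether a given server accepts each statement may depend on its version.

```sql
-- Baseline: only whitespace around UESCAPE. PostgreSQL accepts this
-- (with standard_conforming_strings on) and yields 'data'.
SELECT U&'d!0061t!0061' UESCAPE '!';

-- Comment between the string part and UESCAPE: the placement under
-- dispute. Under the reading above, rule 5.2.7 would permit it.
SELECT U&'d!0061t!0061' /* comment */ UESCAPE '!';

-- Comment between UESCAPE and the escape-character literal: the same
-- argument applies.
SELECT U&'d!0061t!0061' UESCAPE /* comment */ '!';

-- Separator inside the U&' prefix: explicitly forbidden by 5.3 rule 11,
-- so these are not valid Unicode string literals under either reading.
SELECT U &'d!0061t!0061' UESCAPE '!';
SELECT U& 'd!0061t!0061' UESCAPE '!';
```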
As a related issue, we don't allow comments within the <separator> that
splits a multiline literal, even though the spec certainly allows those
(arguably, since the spec defines that comments are equivalent to
newlines, "select 'foo' /**/ 'bar';" should be legal too).
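
As a sketch of the multiline-literal cases just mentioned (the strings are the ones from the example above; the annotations on the first two statements reflect documented PostgreSQL behaviour, the third is the case being argued):

```sql
-- Segments separated by whitespace containing a newline:
-- PostgreSQL concatenates them into 'foobar'.
SELECT 'foo'
       'bar';

-- Segments on the same line, whitespace only: PostgreSQL rejects this
-- as a syntax error.
SELECT 'foo' 'bar';

-- A comment as the separator: arguably legal per the spec, since a
-- comment is treated as equivalent to a newline, but not accepted by
-- PostgreSQL as discussed here.
SELECT 'foo' /**/ 'bar';
```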
I've put up a summary of all these at
https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard#Lexing_of_string_literals_and_comments
(under the assumption that the whole issue is filed under WONTFIX at
least for the time being)
-- 
Andrew (irc:RhodiumToad)
		