Thread: Lexical Structure - String Constants

Lexical Structure - String Constants

From
Sérgio Saquetim
Date:
Hi,

As a hobby project, I'm building a SQL lexer/parser in Java from scratch, compliant with PostgreSQL 9.3. While reading chapter 4, section 4.1 (http://www.postgresql.org/docs/9.3/interactive/sql-syntax-lexical.html), I noticed a few things I thought I should mention:

In section 4.1.2.1, the following text introduces us to SQL's bizarre multiline/multisegment split style: "Two string constants that are only separated by whitespace with at least one newline are concatenated and effectively treated as if the string had been written as one constant."

The text does not say whether comments are allowed between segments, so I ran a few tests in psql (PostgreSQL 9.3.4):

                                               version                                                
------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-16ubuntu6) 4.8.2, 64-bit
(1 row)

postgres=# SELECT 'a'
'b';
 ?column? 
----------
 ab
(1 row)

postgres=# SELECT 'a' --comment
'b';
 ?column? 
----------
 ab
(1 row)

So far everything worked, but I got a different result with C-style block comments:

postgres=# SELECT 'a' /*comment*/
'b';
ERROR:  syntax error at or near "'b'"
LINE 2: 'b';

So line-style comments (--) are accepted between segments, but C-style block comments (/* */) are not. Do you think this difference in behavior should be mentioned in the docs?
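
For anyone writing a similar lexer, the behavior observed above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not PostgreSQL's actual lexer rule: the class name StringContinuation, the method joinsSegments, and the allowBlockComments flag are hypothetical, and the sketch assumes that a newline inside a block comment does not count as the required newline.

```java
/** Sketch of the check applied to the text between two adjacent string constants. */
final class StringContinuation {

    /**
     * Returns true if 'gap' (the text between the closing quote of one segment
     * and the opening quote of the next) lets the two segments be concatenated.
     *
     * With allowBlockComments == false this mirrors the behavior observed in the
     * tests above: whitespace and "--" comments only, with at least one newline.
     * allowBlockComments == true is a hypothetical switch that also accepts
     * block comments between segments.
     */
    static boolean joinsSegments(String gap, boolean allowBlockComments) {
        boolean sawNewline = false;
        int i = 0;
        while (i < gap.length()) {
            char c = gap.charAt(i);
            if (c == '\n' || c == '\r') {
                sawNewline = true;
                i++;
            } else if (c == ' ' || c == '\t' || c == '\f') {
                i++;
            } else if (gap.startsWith("--", i)) {
                // Line comment: skip to the end of the line. The newline itself
                // is consumed (and counted) by the next loop iteration.
                while (i < gap.length() && gap.charAt(i) != '\n' && gap.charAt(i) != '\r') {
                    i++;
                }
            } else if (allowBlockComments && gap.startsWith("/*", i)) {
                // Assumption: nested block comments are not handled in this sketch.
                int end = gap.indexOf("*/", i + 2);
                if (end < 0) {
                    return false; // unterminated comment
                }
                i = end + 2;
            } else {
                return false; // anything else separates the two constants
            }
        }
        return sawNewline;
    }

    public static void main(String[] args) {
        System.out.println(joinsSegments("\n", false));              // 'a' <newline> 'b'
        System.out.println(joinsSegments(" --comment\n", false));    // 'a' --comment <newline> 'b'
        System.out.println(joinsSegments(" /*comment*/\n", false));  // 'a' /*comment*/ <newline> 'b'
    }
}
```

With the flag set to false, the three calls in main correspond to the three psql tests above: the first two gaps join the segments and the third does not.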

I've also noticed the following statement in section 4.1.2.6: "At least one digit must follow the exponent marker (e), if one is present."

As I understand it, the following statement should not be valid because the exponent marker is not followed by at least one digit, yet the expression is successfully evaluated:

postgres=# SELECT 10e;
 e  
----
 10
(1 row)

That said, I live in Brazil and English is not my first language so I may be mistaken, but I thought I should bring this to this list.

Regards,

Sérgio Saquetim


Re: Lexical Structure - String Constants

From
Tom Lane
Date:
Sérgio Saquetim <sergiosaquetim@gmail.com> writes:
> So line style comments (--) are accepted between segments but not C style
> block comments (/* */). Do you think this difference in behavior should be
> mentioned in the docs?

Hm, interesting.  It looks to me like modern versions of the SQL spec
require either -- or /* ... */ style comments to be allowed between
segments of a quoted literal.  This is pretty bad taste in language
design, if you ask me, but that's what it seems to say.  I think that
our current lexer rules date from before the SQL standard even had
/* ... */ style comments, which is why the lexer isn't taking it.

> I've also noticed the following statement in section 4.1.2.6: "At
> least one digit must follow the exponent marker (e), if one is present."

> As I understand it, the following statement should not be valid because
> the exponent marker is not followed by at least one digit, yet the
> expression is successfully evaluated:

> postgres=# SELECT 10e;
>  e
> ----
>  10
> (1 row)

"10e" is not a valid number, just like the manual says.  But "10" is a
valid number, and "e" is a valid column alias, so this is equivalent
to "SELECT 10 AS e".  There's no requirement for white space between
adjacent tokens, if the tokens couldn't validly be run together into
one token.
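
For a hand-written lexer, the rule described above might be sketched in Java like this. It is an illustration only, not the code PostgreSQL uses: the class name NumberScanner and the methods scanNumber and numberAndRest are hypothetical, and the sketch covers only plain decimal constants with an optional fraction and exponent.

```java
/** Sketch of numeric-constant scanning that gives back a dangling exponent marker. */
final class NumberScanner {

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /**
     * Returns the index just past the numeric constant that starts at 'pos',
     * or 'pos' itself if no numeric constant starts there.
     * An exponent marker is consumed only if at least one digit follows it
     * (after an optional sign); otherwise scanning stops before the 'e'.
     */
    static int scanNumber(String s, int pos) {
        int i = pos;
        int digits = 0;
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++;
            digits++;
        }
        if (i < s.length() && s.charAt(i) == '.') {
            int j = i + 1;
            while (j < s.length() && isDigit(s.charAt(j))) {
                j++;
                digits++;
            }
            if (digits == 0) {
                return pos; // a lone '.' is not a number
            }
            i = j;
        }
        if (digits == 0) {
            return pos;
        }
        if (i < s.length() && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
            int j = i + 1;
            if (j < s.length() && (s.charAt(j) == '+' || s.charAt(j) == '-')) {
                j++;
            }
            int firstExpDigit = j;
            while (j < s.length() && isDigit(s.charAt(j))) {
                j++;
            }
            if (j > firstExpDigit) {
                i = j; // valid exponent: keep it as part of the number
            }
            // Otherwise leave i before the 'e', so it starts the next token.
        }
        return i;
    }

    /** Splits the input into the leading numeric constant and whatever follows it. */
    static String[] numberAndRest(String s) {
        int end = scanNumber(s, 0);
        return new String[] { s.substring(0, end), s.substring(end) };
    }

    public static void main(String[] args) {
        String[] parts = numberAndRest("10e");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

For the input "10e", numberAndRest yields "10" and "e", so the "e" is left for the next token, which matches the reading "SELECT 10 AS e". For "10e5" the whole input is consumed as one number.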

            regards, tom lane


Re: Lexical Structure - String Constants

From
Sérgio Saquetim
Date:

> "10e" is not a valid number, just like the manual says.  But "10" is a
> valid number, and "e" is a valid column alias, so this is equivalent
> to "SELECT 10 AS e".  There's no requirement for white space between
> adjacent tokens, if the tokens couldn't validly be run together into
> one token.

Thanks Tom, 

I hadn't noticed that. I'll refactor my lexer to deal with it.

Regards,

Sérgio Saquetim