Обсуждение: ECPG gets embedded quotes wrong

Поиск
Список
Период
Сортировка

ECPG gets embedded quotes wrong

От
Tom Lane
Дата:
A recent user complaint [1] led me to investigate what ECPG does with
embedded quotes (that is, quotes-meant-to-be-data) in SQL identifiers
and strings.  AFAICS, it gets it wrong.  For example, if you write
the literal 'abc''def' in an EXEC SQL command, that will come out the
other end as 'abc'def', triggering a syntax error in the backend.
Likewise, "abc""def" is reduced to "abc"def" which is wrong syntax.

It looks to me like a sufficient fix is just to keep these quote
sequences as-is within a converted string, so that the attached
appears to fix it.  I added some documentation too, since there
doesn't seem to be anything there now explaining how it's supposed
to work.

I doubt this is safely back-patchable, since anybody who's working
around the existing misbehavior (as I see sql/dyntest.pgc is doing)
would not appreciate it changing under them in a minor release.
But I think we can fix it in v14.

            regards, tom lane

[1]
https://www.postgresql.org/message-id/flat/CA%2B4qtLct1L%3DgUordX4c_AdctJ%2BvZBsebYYLBk18LX8dLHthktg%40mail.gmail.com

diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml
index 6e3ca788f6..aa1499bcea 100644
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -63,11 +63,22 @@ EXEC SQL ...;
 </programlisting>
    These statements syntactically take the place of a C statement.
    Depending on the particular statement, they can appear at the
-   global level or within a function.  Embedded
+   global level or within a function.
+  </para>
+
+  <para>
+   Embedded
    <acronym>SQL</acronym> statements follow the case-sensitivity rules of
    normal <acronym>SQL</acronym> code, and not those of C. Also they allow nested
    C-style comments that are part of the SQL standard. The C part of the
    program, however, follows the C standard of not accepting nested comments.
+   Embedded <acronym>SQL</acronym> statements likewise use SQL rules, not
+   C rules, for parsing quoted strings and identifiers.
+   (See <xref linkend="sql-syntax-strings"/> and
+   <xref linkend="sql-syntax-identifiers"/> respectively.  Note that
+   ECPG assumes that <varname>standard_conforming_strings</varname>
+   is <literal>on</literal>.)
+   Of course, the C part of the program follows C quoting rules.
   </para>

   <para>
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 466bbac6a7..e98aa6c486 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -623,11 +623,8 @@ cppline            {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
                     }
                 }

-<xq,xe,xn,xus>{xqdouble}    { addlitchar('\''); }
-<xqc>{xqcquote}    {
-                    addlitchar('\\');
-                    addlitchar('\'');
-                }
+<xq,xe,xn,xus>{xqdouble}    { addlit(yytext, yyleng); }
+<xqc>{xqcquote}                { addlit(yytext, yyleng); }
 <xq,xqc,xn,xus>{xqinside}    { addlit(yytext, yyleng); }
 <xe>{xeinside}  {
                     addlit(yytext, yyleng);
@@ -736,7 +733,7 @@ cppline            {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
                     return UIDENT;
                 }
 <xd,xui>{xddouble}    {
-                    addlitchar('"');
+                    addlit(yytext, yyleng);
                 }
 <xd,xui>{xdinside}    {
                     addlit(yytext, yyleng);
diff --git a/src/interfaces/ecpg/test/expected/preproc-strings.c b/src/interfaces/ecpg/test/expected/preproc-strings.c
index e695007b13..1e50cd36c3 100644
--- a/src/interfaces/ecpg/test/expected/preproc-strings.c
+++ b/src/interfaces/ecpg/test/expected/preproc-strings.c
@@ -45,7 +45,7 @@ int main(void)
 #line 13 "strings.pgc"


-  { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "select 'abcdef' , N'abcdef' as foo , E'abc\\bdef' as \"foo\" ,
U&'d\\0061t\\0061'as U&\"foo\" , U&'d!+000061t!+000061' UESCAPE '!' , $foo$abc$def$foo$", ECPGt_EOIT,  
+  { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "select 'abc''d\\ef' , N'abc''d\\ef' as foo , E'abc''d\\\\ef' as
\"foo\"\"bar\", U&'d\\0061t\\0061' as U&\"foo\"\"bar\" , U&'d!+000061t!+000061' UESCAPE '!' , $foo$abc$def$foo$",
ECPGt_EOIT, 
     ECPGt_char,&(s1),(long)0,(long)1,(1)*sizeof(char),
     ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L,
     ECPGt_char,&(s2),(long)0,(long)1,(1)*sizeof(char),
diff --git a/src/interfaces/ecpg/test/expected/preproc-strings.stderr
b/src/interfaces/ecpg/test/expected/preproc-strings.stderr
index dbc9e5c0b8..4c3a8eee5a 100644
--- a/src/interfaces/ecpg/test/expected/preproc-strings.stderr
+++ b/src/interfaces/ecpg/test/expected/preproc-strings.stderr
@@ -8,7 +8,7 @@
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_process_output on line 13: OK: SET
 [NO_PID]: sqlca: code: 0, state: 00000
-[NO_PID]: ecpg_execute on line 15: query: select 'abcdef' , N'abcdef' as foo , E'abc\bdef' as "foo" , U&'d\0061t\0061'
asU&"foo" , U&'d!+000061t!+000061' UESCAPE '!' , $foo$abc$def$foo$; with 0 parameter(s) on connection ecpg1_regression 
+[NO_PID]: ecpg_execute on line 15: query: select 'abc''d\ef' , N'abc''d\ef' as foo , E'abc''d\\ef' as "foo""bar" ,
U&'d\0061t\0061'as U&"foo""bar" , U&'d!+000061t!+000061' UESCAPE '!' , $foo$abc$def$foo$; with 0 parameter(s) on
connectionecpg1_regression 
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_execute on line 15: using PQexec
 [NO_PID]: sqlca: code: 0, state: 00000
@@ -16,15 +16,15 @@
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_store_result on line 15: allocating memory for 1 tuples
 [NO_PID]: sqlca: code: 0, state: 00000
-[NO_PID]: ecpg_get_data on line 15: RESULT: abcdef offset: -1; array: no
+[NO_PID]: ecpg_get_data on line 15: RESULT: abc'd\ef offset: -1; array: no
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_store_result on line 15: allocating memory for 1 tuples
 [NO_PID]: sqlca: code: 0, state: 00000
-[NO_PID]: ecpg_get_data on line 15: RESULT: abcdef offset: -1; array: no
+[NO_PID]: ecpg_get_data on line 15: RESULT: abc'd\ef offset: -1; array: no
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_store_result on line 15: allocating memory for 1 tuples
 [NO_PID]: sqlca: code: 0, state: 00000
-[NO_PID]: ecpg_get_data on line 15: RESULT: abcdef offset: -1; array: no
+[NO_PID]: ecpg_get_data on line 15: RESULT: abc'd\ef offset: -1; array: no
 [NO_PID]: sqlca: code: 0, state: 00000
 [NO_PID]: ecpg_store_result on line 15: allocating memory for 1 tuples
 [NO_PID]: sqlca: code: 0, state: 00000
diff --git a/src/interfaces/ecpg/test/expected/preproc-strings.stdout
b/src/interfaces/ecpg/test/expected/preproc-strings.stdout
index 730d72dd64..1456b152d7 100644
--- a/src/interfaces/ecpg/test/expected/preproc-strings.stdout
+++ b/src/interfaces/ecpg/test/expected/preproc-strings.stdout
@@ -1 +1 @@
-abcdef abcdef abcdef data data abc$def
+abc'd\ef abc'd\ef abc'd\ef data data abc$def
diff --git a/src/interfaces/ecpg/test/preproc/strings.pgc b/src/interfaces/ecpg/test/preproc/strings.pgc
index f004ddf6dc..25157f136c 100644
--- a/src/interfaces/ecpg/test/preproc/strings.pgc
+++ b/src/interfaces/ecpg/test/preproc/strings.pgc
@@ -12,10 +12,10 @@ int main(void)

   exec sql set standard_conforming_strings to on;

-  exec sql select 'abcdef',
-                  N'abcdef' AS foo,
-                  E'abc\bdef' AS "foo",
-                  U&'d\0061t\0061' AS U&"foo",
+  exec sql select 'abc''d\ef',
+                  N'abc''d\ef' AS foo,
+                  E'abc''d\\ef' AS "foo""bar",
+                  U&'d\0061t\0061' AS U&"foo""bar",
                   U&'d!+000061t!+000061' uescape '!',
                   $foo$abc$def$foo$
                   into :s1, :s2, :s3, :s4, :s5, :s6;
diff --git a/src/interfaces/ecpg/test/sql/dyntest.pgc b/src/interfaces/ecpg/test/sql/dyntest.pgc
index 5f02fd5dd6..0222c89851 100644
--- a/src/interfaces/ecpg/test/sql/dyntest.pgc
+++ b/src/interfaces/ecpg/test/sql/dyntest.pgc
@@ -51,7 +51,7 @@ main ()
   exec sql create table dyntest (name char (14), d float8, i int,
                  bignumber int8, b boolean, comment text,
                  day date);
-  exec sql insert into dyntest values ('first entry', 14.7, 14, 123045607890, true, 'The world''''s most advanced open
sourcedatabase.', '1987-07-14'); 
+  exec sql insert into dyntest values ('first entry', 14.7, 14, 123045607890, true, 'The world''s most advanced open
sourcedatabase.', '1987-07-14'); 
   exec sql insert into dyntest values ('second entry', 1407.87, 1407, 987065403210, false, 'The elephant never
forgets.','1999-11-5'); 

   exec sql prepare MYQUERY from :QUERY;

Re: ECPG gets embedded quotes wrong

От
Tom Lane
Дата:
I wrote:
> It looks to me like a sufficient fix is just to keep these quote
> sequences as-is within a converted string, so that the attached
> appears to fix it.

Poking at this further, I noticed that there's a semi-related bug
that this patch changes the behavior for, without fixing it exactly.
That has to do with use of a string literal as "execstring" in ECPG's
PREPARE ... FROM and EXECUTE IMMEDIATE commands.  Right now, it
appears that there is simply no way to write a double quote as part
of the SQL command in this context.  The EXECUTE IMMEDIATE docs say
that such a literal is a "C string", so one would figure that \"
(backslash-double quote) is the way, but that just produces syntax
errors.  The reason is that ECPG's lexer is in SQL mode at this point
so it thinks the double-quoted string is a SQL quoted identifier, in
which backslash isn't special so the double quote terminates the
identifier.  Ooops.  Knowing this, you might try writing two double
quotes, but that doesn't work either, because the <xd>{xddouble}
lexer rule converts that to one double quote, and you end up with
an unterminated literal in the translated C code rather than in the
ECPG input.

My patch above modifies this to the extent that two double quotes
come out as two double quotes in the translated C code, but that
just results in nothing at all, since the C compiler sees adjacent
string literals, which the C standard commands it to concatenate.
Then you probably get a mysterious syntax error from the backend
because it thinks your intended-to-be SQL quoted identifier isn't
quoted.  However, this is the behavior a C programmer would expect
for adjacent double quotes in a literal, so maybe people wouldn't
see it as mysterious.

Anyway, what to do?

1. Nothing, except document that you can't put a double quote into
the C string literal in these commands.

2. Make two-double-quotes work to produce a data double quote,
which I think could be done fairly easily with some post-processing
in the execstring production.  However, this doesn't have much to
recommend it other than being easily implementable.  C programmers
would not think it's natural, and the fact that backslash sequences
other than \" would work as a C programmer expects doesn't help.

3. Find a way to lex the literal per C rules, as the EXECUTE IMMEDIATE
docs clearly imply we should.  (The PREPARE docs are silent on the
point AFAICS.)  Unfortunately, this seems darn near impossible unless
we want to make IMMEDIATE (more) reserved.  Since it's currently
unreserved, the grammar can't tell which flavor of EXEC SQL EXECUTE ...
it's dealing with until it looks ahead past the name-or-IMMEDIATE token,
so that it must lex the literal (if any) too soon.  I tried putting in a
mid-rule action to switch the lexer back to C mode but failed because of
that ambiguity.  Maybe we could make it work with a bunch of refactoring,
but it would be ugly and subtle code, in both the grammar and lexer.

On the whole I'm inclined to go with #1.  There's a reason why nobody has
complained about this in twenty years, which is that the syntaxes with
a string literal are completely useless.  There's no point in writing
EXEC SQL EXECUTE IMMEDIATE "SQL-statement" when you can just write
EXEC SQL SQL-statement, and similarly for PREPARE.  (The other variant
that takes the string from a C variable is useful, but that one doesn't
have any weird quoting problem.)  So I can't see expending the effort
for #3, and I don't feel like adding and documenting the wart of #2.

Thoughts?

            regards, tom lane



Re: ECPG gets embedded quotes wrong

От
Tom Lane
Дата:
I wrote:
> Poking at this further, I noticed that there's a semi-related bug
> that this patch changes the behavior for, without fixing it exactly.
> That has to do with use of a string literal as "execstring" in ECPG's
> PREPARE ... FROM and EXECUTE IMMEDIATE commands.  Right now, it
> appears that there is simply no way to write a double quote as part
> of the SQL command in this context.

In the other thread, 1250kv pointed out that you can use an octal
escape (\042) to get a quote mark.  That's pretty grotty, but it
does work in existing ECPG releases as well as with this patch.

So now I think the best answer for this part is just to document that
workaround.  Given the lack of complaints up to now, it's definitely not
worth the amount of trouble that'd be needed to have a cleaner solution.

            regards, tom lane