Обсуждение: BUG #17635: The behavior of quote_ident differs from what the doc specifies

Поиск

Список

Период

Сортировка

BUG #17635: The behavior of quote_ident differs from what the doc specifies

От

PG Bug reporting form

Дата:

11 октября 2022 г., 18:28:51

The following bug has been logged on the website:

Bug reference:      17635
Logged by:          Bryn Llewellyn
Email address:      bryn@yugabyte.com
PostgreSQL version: 14.5
Operating system:   macOS Big Sur Version 11.7.
Description:

SUMMARY
-------

The quote_ident doc here:

https://www.postgresql.org/docs/11/functions-string.html#FUNCTIONS-STRING-OTHER

says this:

«
Return the given string suitably quoted to be used as an identifier in an
SQL statement string. Quotes are added only if necessary... 
»

But tests show that the returned text is often double-quoted when this is
NOT necessary.

ENVIRONMENT
-----------

 select version(); →

 PostgreSQL 14.5 (Homebrew) on x86_64-apple-darwin20.6.0,
 compiled by Apple clang version 13.0.0 (clang-1300.0.29.30), 64-bit

I'm using a Mac Book with macOS Big Sur Version 11.7.

TESCASE
-------

Try this using psql. It works in the cluster that you get immediately
following a use of initdb. But you can run it from any starting point as
long as you connect as a role that allows creating a temporary function.

«
create function pg_temp.f()
  returns table(z text)
  language plpgsql
as $body$
declare
  j                   int    not null := 0;
  n                   text   not null := '';
  name_texts constant text[] not null :=
    array['农民',    'bønder',   'фермеры',   'किसानों',   'dog house'];

  /*
    Here, "expected" means what empirical tests have shown and not
    what the PG doc tells you to expect.
  */
  expected_identfier_texts constant text[] not null :=
    array['"农民"', '"bønder"', '"фермеры"', '"किसानों"', '"dog house"'];

  stmt constant text not null := 'deallocate ?';
begin
  deallocate all;
  for j in array_lower(name_texts, 1)..array_upper(name_texts, 1) loop
    n := name_texts[j];
    assert quote_ident(n) = expected_identfier_texts[j], 'Actual !=
expected';
    assert quote_ident(n) = format('%I', n),             'quote_ident() !=
format()';

    begin
      execute replace(stmt, '?', n);
    exception
      when invalid_sql_statement_name then
        z := 'Does not need double quotes: '||n; return next;
      when syntax_error then
        z := 'Needs double quotes:         '||n; return next;
    end;
  end loop;
end;
$body$;

\t on
\o output.txt
select pg_temp.f();
\o
\t off
»

Here is output.txt:

«
 Does not need double quotes: 农民
 Does not need double quotes: bønder
 Does not need double quotes: фермеры
 Does not need double quotes: किसानों
 Needs double quotes:         dog house
»


This shows that (double) quotes are (sometimes) added when they are NOT
necessary.

It looks like double quotes are ALWAYS added whenever the input name text
contains any non-ASCII(7) character -- irrespective of whether the input
breaks the usual rules for when it can be used "as is" in the role of an SQL
identifier text.

SIMPLE FIX: TREAT THIS AS JUST A DOC BUG
----------------------------------------

Use new wording like this:

«
The input is the text of an object name (like you see, for example, in
pg_class.relname). The output is the text of the SQL identifier, for use
within the text of an SQL statement or PL/pgSQL source code, to designate
the object with the given name. Sometimes, the text of the input name can be
used "as is" as the text of the corresponding SQL identifier. For example,
if a pg_class query shows relname (for a table) like this:

 some_table

then this SQL statement text is legal and correct:

 drop table some_table;

However, this SQL statement text is also legal and correct:

 drop table "some_table";

Here, the double quotes are neither necessary nor harmful. In contrast, if a
pg_class query shows relname like this:

 some table

then the correct SQL statement text requires the use of double-quoting:

 drop table "some_table";

The SQL identifier text that quote_ident returns is always a legal spelling
of an SQL identifier that will correctly designate the object that the input
names. Sometimes, the input text is returned "as is". But often, surrounding
double quotes are added when they are not needed. This happens whenever the
input text contains a non-ASCII(7) character. Here is a simple example:

 select quote_ident('bønder'); → "bønder"

However, the SQL statement text that uses the text of the name "as is" runs
without error and has the intended effect:

 create table bønder(n int);

Embedded quotes are properly doubled.

The rules that specify when double-quoting is essential are given in section
[4.1.1. Identifiers and Key
Words](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS).
»

The account of quote_ident would benefit from an x-ref to the account of the
format function to say that both functions have the same purpose (to form
the text of a legal and correct SQL identifier from the text of the name of
an object that is to be designated). And it should say that the behavior
w.r.t. over-zealous double-quoting is identical. Correspondingly, the
account of the format function would benefit from an x-ref to this account
of quote_ident.

PREFERRED FIX: TREAT THE DOC AS THE SPEC OF WHAT'S INTENDED AND FIX THE CODE
TO MATCH IT
----------------------------------------------------------------------------------------

The key part of the specified behavior is "Quotes are added ONLY if
necessary" (my emphasis). If the quote_ident behavior is changed to meet
this, then it would be trivial for a user to implement a language SQL
function thus:

  function is_exotic(name in text) returns boolean

Here, "exotic" denotes the kind of name text that requires double-quoting to
form the SQL identifier text that denotes it. (It would be useful to define
this term in the PG doc and to define, say, "simple" for the kind of name
text that does not require, but does allow, double-quoting to form the SQL
identifier text that denotes it.)

A very large body of anecdotal evidence shows that development shops outlaw
the use of exotic names. If quote_ident behaved as the current doc
specifies, then it woud be trivial to police this rule:

 select exists(
     select 1
     from pg_class
     where relname != quote_ident(relname)
   )::text as "exotic name(s) found";

(Here, pg_class is an example. Other catalog tables would have to be queried
too.)

As it happens, the current PG uses only ASCII(7) name texts and so this
query returns false. If a user creates an object thus (still using only
ASCII(7):

 create table "my table"(n int);

then this query:

 select relname as "exotic names"
 from pg_class
 where relname != quote_ident(relname)
 order by 1;

produces this result:

  exotic names 
 --------------
  my table

However, it's obviously prone to false negatives, as this shows:

 create table bønder(n int);

 select relname as "exotic names"
 from pg_class
 where relname != quote_ident(relname)
 order by 1;

This is the new result.

  exotic names 
 --------------
  bønder
  my table

But the name with the text bønder is NOT exotic.

Re: BUG #17635: The behavior of quote_ident differs from what the doc specifies

От

Tom Lane

Дата:

11 октября 2022 г., 19:51:33

PG Bug reporting form <noreply@postgresql.org> writes:
> The quote_ident doc here:

> https://www.postgresql.org/docs/11/functions-string.html#FUNCTIONS-STRING-OTHER

> says this:

> «
> Return the given string suitably quoted to be used as an identifier in an
> SQL statement string. Quotes are added only if necessary...
> »

> But tests show that the returned text is often double-quoted when this is
> NOT necessary.

As I said on the other thread, I think it's good from a future-proofing
standpoint that quote_ident is more conservative than strictly necessary.
One of the primary use-cases for it is preparing dumps that might get
loaded into future PG versions, or other RDBMSes altogether, that could
have different ideas of which characters are allowed in unquoted
identifiers.  (Most obviously, we might stop lumping all non-ASCII
characters together.)

So I don't want to change the code.  Perhaps we should remove "only"
from the quoted docs fragment, though.

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: BUG #17635: The behavior of quote_ident differs from what the doc specifies

BUG #17635: The behavior of quote_ident differs from what the doc specifies

Re: BUG #17635: The behavior of quote_ident differs from what the doc specifies