Обсуждение: BUG #17635: The behavior of quote_ident differs from what the doc specifies
BUG #17635: The behavior of quote_ident differs from what the doc specifies
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 17635 Logged by: Bryn Llewellyn Email address: bryn@yugabyte.com PostgreSQL version: 14.5 Operating system: macOS Big Sur Version 11.7. Description: SUMMARY ------- The quote_ident doc here: https://www.postgresql.org/docs/11/functions-string.html#FUNCTIONS-STRING-OTHER says this: « Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary... » But tests show that the returned text is often double-quoted when this is NOT necessary. ENVIRONMENT ----------- select version(); → PostgreSQL 14.5 (Homebrew) on x86_64-apple-darwin20.6.0, compiled by Apple clang version 13.0.0 (clang-1300.0.29.30), 64-bit I'm using a Mac Book with macOS Big Sur Version 11.7. TESCASE ------- Try this using psql. It works in the cluster that you get immediately following a use of initdb. But you can run it from any starting point as long as you connect as a role that allows creating a temporary function. « create function pg_temp.f() returns table(z text) language plpgsql as $body$ declare j int not null := 0; n text not null := ''; name_texts constant text[] not null := array['农民', 'bønder', 'фермеры', 'किसानों', 'dog house']; /* Here, "expected" means what empirical tests have shown and not what the PG doc tells you to expect. */ expected_identfier_texts constant text[] not null := array['"农民"', '"bønder"', '"фермеры"', '"किसानों"', '"dog house"']; stmt constant text not null := 'deallocate ?'; begin deallocate all; for j in array_lower(name_texts, 1)..array_upper(name_texts, 1) loop n := name_texts[j]; assert quote_ident(n) = expected_identfier_texts[j], 'Actual != expected'; assert quote_ident(n) = format('%I', n), 'quote_ident() != format()'; begin execute replace(stmt, '?', n); exception when invalid_sql_statement_name then z := 'Does not need double quotes: '||n; return next; when syntax_error then z := 'Needs double quotes: '||n; return next; end; end loop; end; $body$; \t on \o output.txt select pg_temp.f(); \o \t off » Here is output.txt: « Does not need double quotes: 农民 Does not need double quotes: bønder Does not need double quotes: фермеры Does not need double quotes: किसानों Needs double quotes: dog house » This shows that (double) quotes are (sometimes) added when they are NOT necessary. It looks like double quotes are ALWAYS added whenever the input name text contains any non-ASCII(7) character -- irrespective of whether the input breaks the usual rules for when it can be used "as is" in the role of an SQL identifier text. SIMPLE FIX: TREAT THIS AS JUST A DOC BUG ---------------------------------------- Use new wording like this: « The input is the text of an object name (like you see, for example, in pg_class.relname). The output is the text of the SQL identifier, for use within the text of an SQL statement or PL/pgSQL source code, to designate the object with the given name. Sometimes, the text of the input name can be used "as is" as the text of the corresponding SQL identifier. For example, if a pg_class query shows relname (for a table) like this: some_table then this SQL statement text is legal and correct: drop table some_table; However, this SQL statement text is also legal and correct: drop table "some_table"; Here, the double quotes are neither necessary nor harmful. In contrast, if a pg_class query shows relname like this: some table then the correct SQL statement text requires the use of double-quoting: drop table "some_table"; The SQL identifier text that quote_ident returns is always a legal spelling of an SQL identifier that will correctly designate the object that the input names. Sometimes, the input text is returned "as is". But often, surrounding double quotes are added when they are not needed. This happens whenever the input text contains a non-ASCII(7) character. Here is a simple example: select quote_ident('bønder'); → "bønder" However, the SQL statement text that uses the text of the name "as is" runs without error and has the intended effect: create table bønder(n int); Embedded quotes are properly doubled. The rules that specify when double-quoting is essential are given in section [4.1.1. Identifiers and Key Words](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS). » The account of quote_ident would benefit from an x-ref to the account of the format function to say that both functions have the same purpose (to form the text of a legal and correct SQL identifier from the text of the name of an object that is to be designated). And it should say that the behavior w.r.t. over-zealous double-quoting is identical. Correspondingly, the account of the format function would benefit from an x-ref to this account of quote_ident. PREFERRED FIX: TREAT THE DOC AS THE SPEC OF WHAT'S INTENDED AND FIX THE CODE TO MATCH IT ---------------------------------------------------------------------------------------- The key part of the specified behavior is "Quotes are added ONLY if necessary" (my emphasis). If the quote_ident behavior is changed to meet this, then it would be trivial for a user to implement a language SQL function thus: function is_exotic(name in text) returns boolean Here, "exotic" denotes the kind of name text that requires double-quoting to form the SQL identifier text that denotes it. (It would be useful to define this term in the PG doc and to define, say, "simple" for the kind of name text that does not require, but does allow, double-quoting to form the SQL identifier text that denotes it.) A very large body of anecdotal evidence shows that development shops outlaw the use of exotic names. If quote_ident behaved as the current doc specifies, then it woud be trivial to police this rule: select exists( select 1 from pg_class where relname != quote_ident(relname) )::text as "exotic name(s) found"; (Here, pg_class is an example. Other catalog tables would have to be queried too.) As it happens, the current PG uses only ASCII(7) name texts and so this query returns false. If a user creates an object thus (still using only ASCII(7): create table "my table"(n int); then this query: select relname as "exotic names" from pg_class where relname != quote_ident(relname) order by 1; produces this result: exotic names -------------- my table However, it's obviously prone to false negatives, as this shows: create table bønder(n int); select relname as "exotic names" from pg_class where relname != quote_ident(relname) order by 1; This is the new result. exotic names -------------- bønder my table But the name with the text bønder is NOT exotic.
PG Bug reporting form <noreply@postgresql.org> writes: > The quote_ident doc here: > https://www.postgresql.org/docs/11/functions-string.html#FUNCTIONS-STRING-OTHER > says this: > « > Return the given string suitably quoted to be used as an identifier in an > SQL statement string. Quotes are added only if necessary... > » > But tests show that the returned text is often double-quoted when this is > NOT necessary. As I said on the other thread, I think it's good from a future-proofing standpoint that quote_ident is more conservative than strictly necessary. One of the primary use-cases for it is preparing dumps that might get loaded into future PG versions, or other RDBMSes altogether, that could have different ideas of which characters are allowed in unquoted identifiers. (Most obviously, we might stop lumping all non-ASCII characters together.) So I don't want to change the code. Perhaps we should remove "only" from the quoted docs fragment, though. regards, tom lane