Re: Improving collation-dependent indexes in system catalogs

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Improving collation-dependent indexes in system catalogs
Дата
Msg-id 32248.1544994102@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Improving collation-dependent indexes in system catalogs  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I notice that some information_schema view columns end up with C
> collation after this patch, and others remain with default collation.
> Is that sensible?  (I think the only two cases where this might matter
> at all are information_schema.parameters.parameter_name,
> information_schema.routines.external_name and
> information_schema.foreign_servers.foreign_server_type.)

Yeah.  Looking closer at that, there are no collation-sensitive indexes
in information_schema (if there were, the existing opr_sanity test would
have caught 'em).  But there are collation-sensitive table columns, which
do have pg_statistic entries, and those entries are at least nominally
broken by copying them into a database with a different default collation.

We could think of two ways to deal with that.  One is to plaster
COLLATE "C" on each textual table column in the information_schema.
A more aggressive approach is to attach COLLATE "C" to each of the
domain types that information_schema defines, which fixes the table
columns a fortiori, and also causes all of the exposed information_schema
view columns to acquire database-independent collations.

I tried both ways, as in the attached patches below (each meant to be
applied on top of my patch upthread), and they both pass check-world.

A possible advantage of the second approach is that it could end up
allowing comparisons on information_schema view columns to be translated
to indexable comparisons on the underlying "name" columns, which would
be a pleasant outcome.  On the other hand, people might be annoyed by
the semantics change, if they'd previously been doing that with the
expectation of getting database-collation-based comparisons.

I'm not sure whether the SQL standard says anything that either patch
would be violating.  I see that it does say that these domains have
CHARACTER SET SQL_TEXT, and that the collation of that character set
is implementation-defined, so I think we could get away with changing
so far as spec compliance is concerned.

            regards, tom lane

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 6227a8f3..742e2b6 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -1576,13 +1576,13 @@ GRANT SELECT ON sequences TO PUBLIC;
  */

 CREATE TABLE sql_features (
-    feature_id          character_data,
-    feature_name        character_data,
-    sub_feature_id      character_data,
-    sub_feature_name    character_data,
-    is_supported        yes_or_no,
-    is_verified_by      character_data,
-    comments            character_data
+    feature_id          character_data COLLATE "C",
+    feature_name        character_data COLLATE "C",
+    sub_feature_id      character_data COLLATE "C",
+    sub_feature_name    character_data COLLATE "C",
+    is_supported        yes_or_no COLLATE "C",
+    is_verified_by      character_data COLLATE "C",
+    comments            character_data COLLATE "C"
 );

 -- Will be filled with external data by initdb.
@@ -1599,11 +1599,11 @@ GRANT SELECT ON sql_features TO PUBLIC;
 -- clause 9.1.

 CREATE TABLE sql_implementation_info (
-    implementation_info_id      character_data,
-    implementation_info_name    character_data,
+    implementation_info_id      character_data COLLATE "C",
+    implementation_info_name    character_data COLLATE "C",
     integer_value               cardinal_number,
-    character_value             character_data,
-    comments                    character_data
+    character_value             character_data COLLATE "C",
+    comments                    character_data COLLATE "C"
 );

 INSERT INTO sql_implementation_info VALUES ('10003', 'CATALOG NAME', NULL, 'Y', NULL);
@@ -1628,13 +1628,13 @@ GRANT SELECT ON sql_implementation_info TO PUBLIC;
  */

 CREATE TABLE sql_languages (
-    sql_language_source         character_data,
-    sql_language_year           character_data,
-    sql_language_conformance    character_data,
-    sql_language_integrity      character_data,
-    sql_language_implementation character_data,
-    sql_language_binding_style  character_data,
-    sql_language_programming_language character_data
+    sql_language_source         character_data COLLATE "C",
+    sql_language_year           character_data COLLATE "C",
+    sql_language_conformance    character_data COLLATE "C",
+    sql_language_integrity      character_data COLLATE "C",
+    sql_language_implementation character_data COLLATE "C",
+    sql_language_binding_style  character_data COLLATE "C",
+    sql_language_programming_language character_data COLLATE "C"
 );

 INSERT INTO sql_languages VALUES ('ISO 9075', '1999', 'CORE', NULL, NULL, 'DIRECT', NULL);
@@ -1651,11 +1651,11 @@ GRANT SELECT ON sql_languages TO PUBLIC;
  */

 CREATE TABLE sql_packages (
-    feature_id      character_data,
-    feature_name    character_data,
-    is_supported    yes_or_no,
-    is_verified_by  character_data,
-    comments        character_data
+    feature_id      character_data COLLATE "C",
+    feature_name    character_data COLLATE "C",
+    is_supported    yes_or_no COLLATE "C",
+    is_verified_by  character_data COLLATE "C",
+    comments        character_data COLLATE "C"
 );

 INSERT INTO sql_packages VALUES ('PKG000', 'Core', 'NO', NULL, '');
@@ -1678,11 +1678,11 @@ GRANT SELECT ON sql_packages TO PUBLIC;
  */

 CREATE TABLE sql_parts (
-    feature_id      character_data,
-    feature_name    character_data,
-    is_supported    yes_or_no,
-    is_verified_by  character_data,
-    comments        character_data
+    feature_id      character_data COLLATE "C",
+    feature_name    character_data COLLATE "C",
+    is_supported    yes_or_no COLLATE "C",
+    is_verified_by  character_data COLLATE "C",
+    comments        character_data COLLATE "C"
 );

 INSERT INTO sql_parts VALUES ('1', 'Framework (SQL/Framework)', 'NO', NULL, '');
@@ -1705,9 +1705,9 @@ INSERT INTO sql_parts VALUES ('14', 'XML-Related Specifications (SQL/XML)', 'YES

 CREATE TABLE sql_sizing (
     sizing_id       cardinal_number,
-    sizing_name     character_data,
+    sizing_name     character_data COLLATE "C",
     supported_value cardinal_number,
-    comments        character_data
+    comments        character_data COLLATE "C"
 );

 INSERT INTO sql_sizing VALUES (34,    'MAXIMUM CATALOG NAME LENGTH', 63, NULL);
@@ -1753,10 +1753,10 @@ GRANT SELECT ON sql_sizing TO PUBLIC;

 CREATE TABLE sql_sizing_profiles (
     sizing_id       cardinal_number,
-    sizing_name     character_data,
-    profile_id      character_data,
+    sizing_name     character_data COLLATE "C",
+    profile_id      character_data COLLATE "C",
     required_value  cardinal_number,
-    comments        character_data
+    comments        character_data COLLATE "C"
 );

 GRANT SELECT ON sql_sizing_profiles TO PUBLIC;
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 52cd7b9..6dca1b7 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2060,7 +2060,18 @@ ORDER BY 1;
 -- a representational error in pg_index, but simply wrong catalog design.
 -- It's bad because we expect to be able to clone template0 and assign the
 -- copy a different database collation.  It would especially not work for
--- shared catalogs.  Collation-sensitive indexes should have "C" collation.
+-- shared catalogs.
+SELECT relname, attname, attcollation
+FROM pg_class c, pg_attribute a
+WHERE c.oid = attrelid AND c.oid < 16384 AND
+    c.relkind != 'v' AND  -- we don't care about columns in views
+    attcollation != 0 AND
+    attcollation != (SELECT oid FROM pg_collation WHERE collname = 'C');
+ relname | attname | attcollation
+---------+---------+--------------
+(0 rows)
+
+-- Double-check that collation-sensitive indexes have "C" collation, too.
 SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
 FROM (SELECT indexrelid, indrelid,
              unnest(indclass) as iclass, unnest(indcollation) as icoll
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 5cf4df1..64eca7e 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -1333,7 +1333,16 @@ ORDER BY 1;
 -- a representational error in pg_index, but simply wrong catalog design.
 -- It's bad because we expect to be able to clone template0 and assign the
 -- copy a different database collation.  It would especially not work for
--- shared catalogs.  Collation-sensitive indexes should have "C" collation.
+-- shared catalogs.
+
+SELECT relname, attname, attcollation
+FROM pg_class c, pg_attribute a
+WHERE c.oid = attrelid AND c.oid < 16384 AND
+    c.relkind != 'v' AND  -- we don't care about columns in views
+    attcollation != 0 AND
+    attcollation != (SELECT oid FROM pg_collation WHERE collname = 'C');
+
+-- Double-check that collation-sensitive indexes have "C" collation, too.

 SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
 FROM (SELECT indexrelid, indrelid,
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 6227a8f3..0fbcfa8 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -208,7 +208,7 @@ CREATE DOMAIN cardinal_number AS integer
  * CHARACTER_DATA domain
  */

-CREATE DOMAIN character_data AS character varying;
+CREATE DOMAIN character_data AS character varying COLLATE "C";


 /*
@@ -216,7 +216,7 @@ CREATE DOMAIN character_data AS character varying;
  * SQL_IDENTIFIER domain
  */

-CREATE DOMAIN sql_identifier AS character varying;
+CREATE DOMAIN sql_identifier AS character varying COLLATE "C";


 /*
@@ -243,7 +243,7 @@ CREATE DOMAIN time_stamp AS timestamp(2) with time zone
  * YES_OR_NO domain
  */

-CREATE DOMAIN yes_or_no AS character varying(3)
+CREATE DOMAIN yes_or_no AS character varying(3) COLLATE "C"
     CONSTRAINT yes_or_no_check CHECK (value IN ('YES', 'NO'));


diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 52cd7b9..6dca1b7 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2060,7 +2060,18 @@ ORDER BY 1;
 -- a representational error in pg_index, but simply wrong catalog design.
 -- It's bad because we expect to be able to clone template0 and assign the
 -- copy a different database collation.  It would especially not work for
--- shared catalogs.  Collation-sensitive indexes should have "C" collation.
+-- shared catalogs.
+SELECT relname, attname, attcollation
+FROM pg_class c, pg_attribute a
+WHERE c.oid = attrelid AND c.oid < 16384 AND
+    c.relkind != 'v' AND  -- we don't care about columns in views
+    attcollation != 0 AND
+    attcollation != (SELECT oid FROM pg_collation WHERE collname = 'C');
+ relname | attname | attcollation
+---------+---------+--------------
+(0 rows)
+
+-- Double-check that collation-sensitive indexes have "C" collation, too.
 SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
 FROM (SELECT indexrelid, indrelid,
              unnest(indclass) as iclass, unnest(indcollation) as icoll
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 5cf4df1..64eca7e 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -1333,7 +1333,16 @@ ORDER BY 1;
 -- a representational error in pg_index, but simply wrong catalog design.
 -- It's bad because we expect to be able to clone template0 and assign the
 -- copy a different database collation.  It would especially not work for
--- shared catalogs.  Collation-sensitive indexes should have "C" collation.
+-- shared catalogs.
+
+SELECT relname, attname, attcollation
+FROM pg_class c, pg_attribute a
+WHERE c.oid = attrelid AND c.oid < 16384 AND
+    c.relkind != 'v' AND  -- we don't care about columns in views
+    attcollation != 0 AND
+    attcollation != (SELECT oid FROM pg_collation WHERE collname = 'C');
+
+-- Double-check that collation-sensitive indexes have "C" collation, too.

 SELECT indexrelid::regclass, indrelid::regclass, iclass, icoll
 FROM (SELECT indexrelid, indrelid,

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: Valgrind failures in Apply Launcher's bgworker_quickdie() exit