[PATCH] Run UTF8-dependent tests for citext [Re: daitch_mokotoff module]
From | Dag Lem |
---|---|
Subject | [PATCH] Run UTF8-dependent tests for citext [Re: daitch_mokotoff module] |
Date | |
Msg-id | ygezgoacs4e.fsf_-_@sid.nimrod.no |
In response to | Re: daitch_mokotoff module (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: [PATCH] Run UTF8-dependent tests for citext [Re: daitch_mokotoff module] (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Dag Lem <dag@nimrod.no> writes:
>
>> Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that
>> two other modules are using UTF8 characters in tests - citext and
>> unaccent.
>
> Yeah, neither of those have been upgraded to said best practice.
> (If you feel like doing the legwork to improve that situation,
> that'd be great.)
>

Please find attached a patch to run the previously commented-out
UTF8-dependent tests for citext, according to best practice. For now I
don't dare to touch the unaccent module, which seems to be UTF8-only
anyway.

Best regards

Dag Lem

diff --git a/contrib/citext/Makefile b/contrib/citext/Makefile
index a7de52928d..789932fe36 100644
--- a/contrib/citext/Makefile
+++ b/contrib/citext/Makefile
@@ -11,7 +11,7 @@ DATA = citext--1.4.sql \
 	citext--1.0--1.1.sql
 PGFILEDESC = "citext - case-insensitive character string data type"
 
-REGRESS = citext
+REGRESS = citext citext_utf8
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out
index 3bac0534fb..48b4de8993 100644
--- a/contrib/citext/expected/citext.out
+++ b/contrib/citext/expected/citext.out
@@ -48,29 +48,6 @@ SELECT 'a'::citext <> 'ab'::citext AS t;
  t
 (1 row)
 
--- Multibyte sanity tests. Uncomment to run.
--- SELECT 'À'::citext = 'À'::citext AS t;
--- SELECT 'À'::citext = 'à'::citext AS t;
--- SELECT 'À'::text = 'à'::text AS f; -- text wins.
--- SELECT 'À'::citext <> 'B'::citext AS t;
--- Test combining characters making up canonically equivalent strings.
--- SELECT 'Ä'::text <> 'Ä'::text AS t;
--- SELECT 'Ä'::citext <> 'Ä'::citext AS t;
--- Test the Turkish dotted I. The lowercase is a single byte while the
--- uppercase is multibyte. This is why the comparison code can't be optimized
--- to compare string lengths.
--- SELECT 'i'::citext = 'İ'::citext AS t;
--- Regression.
--- SELECT 'láska'::citext <> 'laská'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext = 'Ask Bjørn Hansen'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext = 'ASK BJØRN HANSEN'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext <> 'Ask Bjorn Hansen'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext <> 'ASK BJORN HANSEN'::citext AS t;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ask bjørn hansen'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ASK BJØRN HANSEN'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjorn Hansen'::citext) AS positive;
--- SELECT citext_cmp('Ask Bjorn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS negative;
 -- Test > and >=
 SELECT 'B'::citext > 'a'::citext AS t;
  t 
diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out
index 57fc863f7a..8ab4d4224e 100644
--- a/contrib/citext/expected/citext_1.out
+++ b/contrib/citext/expected/citext_1.out
@@ -48,29 +48,6 @@ SELECT 'a'::citext <> 'ab'::citext AS t;
  t
 (1 row)
 
--- Multibyte sanity tests. Uncomment to run.
--- SELECT 'À'::citext = 'À'::citext AS t;
--- SELECT 'À'::citext = 'à'::citext AS t;
--- SELECT 'À'::text = 'à'::text AS f; -- text wins.
--- SELECT 'À'::citext <> 'B'::citext AS t;
--- Test combining characters making up canonically equivalent strings.
--- SELECT 'Ä'::text <> 'Ä'::text AS t;
--- SELECT 'Ä'::citext <> 'Ä'::citext AS t;
--- Test the Turkish dotted I. The lowercase is a single byte while the
--- uppercase is multibyte. This is why the comparison code can't be optimized
--- to compare string lengths.
--- SELECT 'i'::citext = 'İ'::citext AS t;
--- Regression.
--- SELECT 'láska'::citext <> 'laská'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext = 'Ask Bjørn Hansen'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext = 'ASK BJØRN HANSEN'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext <> 'Ask Bjorn Hansen'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext <> 'ASK BJORN HANSEN'::citext AS t;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ask bjørn hansen'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ASK BJØRN HANSEN'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjorn Hansen'::citext) AS positive;
--- SELECT citext_cmp('Ask Bjorn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS negative;
 -- Test > and >=
 SELECT 'B'::citext > 'a'::citext AS t;
  t 
diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out
new file mode 100644
index 0000000000..1f4fa79aff
--- /dev/null
+++ b/contrib/citext/expected/citext_utf8.out
@@ -0,0 +1,119 @@
+/*
+ * This test must be run in a database with UTF-8 encoding,
+ * because other encodings don't support all the characters used.
+ */
+SELECT getdatabaseencoding() <> 'UTF8'
+       AS skip_test \gset
+\if :skip_test
+\quit
+\endif
+set client_encoding = utf8;
+-- CREATE EXTENSION IF NOT EXISTS citext;
+-- Multibyte sanity tests.
+SELECT 'À'::citext = 'À'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'À'::citext = 'à'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'À'::text = 'à'::text AS f; -- text wins.
+ f 
+---
+ f
+(1 row)
+
+SELECT 'À'::citext <> 'B'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+-- Test combining characters making up canonically equivalent strings.
+SELECT 'Ä'::text <> 'Ä'::text AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'Ä'::citext <> 'Ä'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+-- Test the Turkish dotted I. The lowercase is a single byte while the
+-- uppercase is multibyte. This is why the comparison code can't be optimized
+-- to compare string lengths.
+SELECT 'i'::citext = 'İ'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+-- Regression.
+SELECT 'láska'::citext <> 'laská'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'Ask Bjørn Hansen'::citext = 'Ask Bjørn Hansen'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'Ask Bjørn Hansen'::citext = 'ASK BJØRN HANSEN'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'Ask Bjørn Hansen'::citext <> 'Ask Bjorn Hansen'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'Ask Bjørn Hansen'::citext <> 'ASK BJORN HANSEN'::citext AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS zero;
+ zero 
+------
+    0
+(1 row)
+
+SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ask bjørn hansen'::citext) AS zero;
+ zero 
+------
+    0
+(1 row)
+
+SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ASK BJØRN HANSEN'::citext) AS zero;
+ zero 
+------
+    0
+(1 row)
+
+SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjorn Hansen'::citext) AS positive;
+ positive 
+----------
+       15
+(1 row)
+
+SELECT citext_cmp('Ask Bjorn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS negative;
+ negative 
+----------
+      -15
+(1 row)
+
diff --git a/contrib/citext/expected/citext_utf8_1.out b/contrib/citext/expected/citext_utf8_1.out
new file mode 100644
index 0000000000..37aead89c0
--- /dev/null
+++ b/contrib/citext/expected/citext_utf8_1.out
@@ -0,0 +1,8 @@
+/*
+ * This test must be run in a database with UTF-8 encoding,
+ * because other encodings don't support all the characters used.
+ */
+SELECT getdatabaseencoding() <> 'UTF8'
+       AS skip_test \gset
+\if :skip_test
+\quit
diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql
index 55fb1d11a6..bd62ab8047 100644
--- a/contrib/citext/sql/citext.sql
+++ b/contrib/citext/sql/citext.sql
@@ -19,34 +19,6 @@ SELECT 'a'::citext = 'b'::citext AS f;
 SELECT 'a'::citext = 'ab'::citext AS f;
 SELECT 'a'::citext <> 'ab'::citext AS t;
 
--- Multibyte sanity tests. Uncomment to run.
--- SELECT 'À'::citext = 'À'::citext AS t;
--- SELECT 'À'::citext = 'à'::citext AS t;
--- SELECT 'À'::text = 'à'::text AS f; -- text wins.
--- SELECT 'À'::citext <> 'B'::citext AS t;
-
--- Test combining characters making up canonically equivalent strings.
--- SELECT 'Ä'::text <> 'Ä'::text AS t;
--- SELECT 'Ä'::citext <> 'Ä'::citext AS t;
-
--- Test the Turkish dotted I. The lowercase is a single byte while the
--- uppercase is multibyte. This is why the comparison code can't be optimized
--- to compare string lengths.
--- SELECT 'i'::citext = 'İ'::citext AS t;
-
--- Regression.
--- SELECT 'láska'::citext <> 'laská'::citext AS t;
-
--- SELECT 'Ask Bjørn Hansen'::citext = 'Ask Bjørn Hansen'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext = 'ASK BJØRN HANSEN'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext <> 'Ask Bjorn Hansen'::citext AS t;
--- SELECT 'Ask Bjørn Hansen'::citext <> 'ASK BJORN HANSEN'::citext AS t;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ask bjørn hansen'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ASK BJØRN HANSEN'::citext) AS zero;
--- SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjorn Hansen'::citext) AS positive;
--- SELECT citext_cmp('Ask Bjorn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS negative;
-
 -- Test > and >=
 SELECT 'B'::citext > 'a'::citext AS t;
 SELECT 'b'::citext > 'A'::citext AS t;
diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql
new file mode 100644
index 0000000000..91822b85c2
--- /dev/null
+++ b/contrib/citext/sql/citext_utf8.sql
@@ -0,0 +1,42 @@
+/*
+ * This test must be run in a database with UTF-8 encoding,
+ * because other encodings don't support all the characters used.
+ */
+
+SELECT getdatabaseencoding() <> 'UTF8'
+       AS skip_test \gset
+\if :skip_test
+\quit
+\endif
+
+set client_encoding = utf8;
+
+-- CREATE EXTENSION IF NOT EXISTS citext;
+
+-- Multibyte sanity tests.
+SELECT 'À'::citext = 'À'::citext AS t;
+SELECT 'À'::citext = 'à'::citext AS t;
+SELECT 'À'::text = 'à'::text AS f; -- text wins.
+SELECT 'À'::citext <> 'B'::citext AS t;
+
+-- Test combining characters making up canonically equivalent strings.
+SELECT 'Ä'::text <> 'Ä'::text AS t;
+SELECT 'Ä'::citext <> 'Ä'::citext AS t;
+
+-- Test the Turkish dotted I. The lowercase is a single byte while the
+-- uppercase is multibyte. This is why the comparison code can't be optimized
+-- to compare string lengths.
+SELECT 'i'::citext = 'İ'::citext AS t;
+
+-- Regression.
+SELECT 'láska'::citext <> 'laská'::citext AS t;
+
+SELECT 'Ask Bjørn Hansen'::citext = 'Ask Bjørn Hansen'::citext AS t;
+SELECT 'Ask Bjørn Hansen'::citext = 'ASK BJØRN HANSEN'::citext AS t;
+SELECT 'Ask Bjørn Hansen'::citext <> 'Ask Bjorn Hansen'::citext AS t;
+SELECT 'Ask Bjørn Hansen'::citext <> 'ASK BJORN HANSEN'::citext AS t;
+SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS zero;
+SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ask bjørn hansen'::citext) AS zero;
+SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'ASK BJØRN HANSEN'::citext) AS zero;
+SELECT citext_cmp('Ask Bjørn Hansen'::citext, 'Ask Bjorn Hansen'::citext) AS positive;
+SELECT citext_cmp('Ask Bjorn Hansen'::citext, 'Ask Bjørn Hansen'::citext) AS negative;
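
Editorial illustration, not part of the patch: the test comments in citext_utf8.sql rely on two UTF-8 facts, namely that 'i' and 'İ' differ in encoded length, and that a precomposed 'Ä' and its combining-character form are distinct strings even though they are canonically equivalent. A minimal Python sketch, run outside PostgreSQL and using only the standard library, shows both. The variable names are hypothetical, and nothing here reflects how citext itself compares values.

```python
import unicodedata

# Turkish dotted capital I (U+0130) versus ASCII lowercase i.
lower_i = "i"
dotted_capital_i = "\u0130"

# Encoded lengths differ in UTF-8, so a length check cannot
# short-circuit a case-insensitive comparison.
lower_bytes = len(lower_i.encode("utf-8"))
upper_bytes = len(dotted_capital_i.encode("utf-8"))
print(lower_bytes, upper_bytes)

# Precomposed A-diaeresis (U+00C4) versus 'A' followed by the
# combining diaeresis (U+0308).
precomposed = "\u00c4"
decomposed = "A\u0308"

# The two strings are different code point sequences ...
print(precomposed != decomposed)
# ... but they become identical after NFC normalization.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

The second pair is why the tests expect `<>` to return true for the two 'Ä' spellings: the expected output implies the comparison does not treat canonically equivalent sequences as equal.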