Обсуждение: [COMMITTERS] pgsql: ICU support

Поиск
Список
Период
Сортировка

[COMMITTERS] pgsql: ICU support

От
Peter Eisentraut
Дата:
ICU support

Add a column collprovider to pg_collation that determines which library
provides the collation data.  The existing choices are default and libc,
and this adds an icu choice, which uses the ICU4C library.

The pg_locale_t type is changed to a union that contains the
provider-specific locale handles.  Users of locale information are
changed to look into that struct for the appropriate handle to use.

Also add a collversion column that records the version of the collation
when it is created, and check at run time whether it is still the same.
This detects potentially incompatible library upgrades that can corrupt
indexes and other structures.  This is currently only supported by
ICU-provided collations.

initdb initializes the default collation set as before from the `locale
-a` output but also adds all available ICU locales with a "-x-icu"
appended.

Currently, ICU-provided collations can only be explicitly named
collations.  The global database locales are still always libc-provided.

ICU support is enabled by configure --with-icu.

Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/eccfef81e1f73ee41f1d8bfe4fa4e80576945048

Modified Files
--------------
aclocal.m4                                       |    1 +
config/pkg.m4                                    |  275 ++++++
configure                                        |  313 ++++++
configure.in                                     |   35 +
doc/src/sgml/catalogs.sgml                       |   19 +
doc/src/sgml/charset.sgml                        |  177 +++-
doc/src/sgml/func.sgml                           |   17 +
doc/src/sgml/installation.sgml                   |   15 +
doc/src/sgml/mvcc.sgml                           |    3 +-
doc/src/sgml/ref/alter_collation.sgml            |   55 ++
doc/src/sgml/ref/create_collation.sgml           |   37 +-
src/Makefile.global.in                           |    4 +
src/backend/Makefile                             |    2 +-
src/backend/catalog/pg_collation.c               |   52 +-
src/backend/commands/collationcmds.c             |  288 +++++-
src/backend/common.mk                            |    2 +
src/backend/nodes/copyfuncs.c                    |   13 +
src/backend/nodes/equalfuncs.c                   |   11 +
src/backend/parser/gram.y                        |   18 +-
src/backend/regex/regc_pg_locale.c               |  110 ++-
src/backend/tcop/utility.c                       |    8 +
src/backend/utils/adt/formatting.c               |  453 +++++----
src/backend/utils/adt/like.c                     |   53 +-
src/backend/utils/adt/pg_locale.c                |  266 ++++-
src/backend/utils/adt/selfuncs.c                 |    8 +-
src/backend/utils/adt/varlena.c                  |  179 +++-
src/backend/utils/mb/encnames.c                  |   76 ++
src/bin/initdb/initdb.c                          |    3 +-
src/bin/pg_dump/pg_dump.c                        |   75 +-
src/bin/pg_dump/t/002_pg_dump.pl                 |    2 +-
src/bin/psql/describe.c                          |    7 +-
src/include/catalog/pg_collation.h               |   25 +-
src/include/catalog/pg_collation_fn.h            |    2 +
src/include/catalog/pg_proc.h                    |    3 +
src/include/commands/collationcmds.h             |    1 +
src/include/mb/pg_wchar.h                        |    6 +
src/include/nodes/nodes.h                        |    1 +
src/include/nodes/parsenodes.h                   |   11 +
src/include/pg_config.h.in                       |    6 +
src/include/utils/pg_locale.h                    |   32 +-
src/test/regress/GNUmakefile                     |    3 +
src/test/regress/expected/collate.icu.out        | 1126 ++++++++++++++++++++++
src/test/regress/expected/collate.linux.utf8.out |   80 +-
src/test/regress/sql/collate.icu.sql             |  433 +++++++++
src/test/regress/sql/collate.linux.utf8.sql      |   32 +
45 files changed, 3929 insertions(+), 409 deletions(-)


Re: [COMMITTERS] pgsql: ICU support

От
Peter Eisentraut
Дата:
On 3/23/17 15:33, Peter Eisentraut wrote:
> ICU support

Seeing some crashes on the build farm, investigating.  If someone can
reproduce locally and get a backtrace, let me know.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [COMMITTERS] pgsql: ICU support

От
Andrew Dunstan
Дата:

On 03/23/2017 04:07 PM, Peter Eisentraut wrote:
> On 3/23/17 15:33, Peter Eisentraut wrote:
>> ICU support
> Seeing some crashes on the build farm, investigating.  If someone can
> reproduce locally and get a backtrace, let me know.
>


The buildfarm actually does that :-)

See for example
<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2017-03-23%2019%3A37%3A22>
which has stuff like:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  Generic_Text_IC_like (collation=100, pat=0x10cf3f8, str=0x10cf388) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:197
197        if (pg_database_encoding_max_length() > 1 || locale->provider == COLLPROVIDER_ICU)
#0  Generic_Text_IC_like (collation=100, pat=0x10cf3f8, str=0x10cf388) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:197
#1  texticlike (fcinfo=<optimized out>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:394
#2  0x00000000005db832 in ExecMakeFunctionResultNoSets (fcache=0x106af48, econtext=0x106b828, isNull=0x7fff5884877c "")
at/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/executor/execQual.c:1866 
#3  0x00000000005dfe76 in ExecEvalExprSwitchContext (expression=expression@entry=0x106af48, econtext=<optimized out>,
isNull=isNull@entry=0x7fff5884877c"") at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/executor/execQual.c:4228
#4  0x0000000000671d29 in evaluate_expr (expr=<optimized out>, result_type=result_type@entry=16,
result_typmod=result_typmod@entry=-1,result_collation=result_collation@entry=0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:4680
#5  0x00000000006732ab in evaluate_function (context=0x7fff58848aa0, func_tuple=0x7fb799a56048, funcvariadic=0 '\000',
args=0x10cf8a0,input_collid=100, result_collid=0, result_typmod=-1, result_type=16, funcid=1633) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:4237
#6  simplify_function (funcid=1633, result_type=16, result_typmod=result_typmod@entry=-1, result_collid=0,
input_collid=100,args_p=args_p@entry=0x7fff588488f8, funcvariadic=0 '\000', process_args=1 '\001', allow_non_const=1
'\001',context=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3877 
#7  0x0000000000673b4a in eval_const_expressions_mutator (node=0x10cf410, context=0x7fff58848aa0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:2584
#8  0x0000000000619be7 in expression_tree_mutator (node=node@entry=0x10cf460, mutator=mutator@entry=0x673a10
<eval_const_expressions_mutator>,context=context@entry=0x7fff58848aa0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/nodes/nodeFuncs.c:3017
#9  0x0000000000673a50 in eval_const_expressions_mutator (node=0x10cf460, context=0x7fff58848aa0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3527
#10 0x0000000000619e4b in expression_tree_mutator (node=node@entry=0x10cf4b0, mutator=mutator@entry=0x673a10
<eval_const_expressions_mutator>,context=context@entry=0x7fff58848aa0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/nodes/nodeFuncs.c:2912
#11 0x0000000000673a50 in eval_const_expressions_mutator (node=0x10cf4b0, context=context@entry=0x7fff58848aa0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3527
#12 0x000000000067574f in eval_const_expressions (root=root@entry=0x10cf5d0, node=<optimized out>) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:2378
#13 0x0000000000660d95 in preprocess_expression (root=root@entry=0x10cf5d0, expr=<optimized out>, kind=kind@entry=1) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:880
#14 0x000000000066519d in subquery_planner (glob=glob@entry=0x10cf198, parse=parse@entry=0x10cec38,
parent_root=parent_root@entry=0x0,hasRecursion=hasRecursion@entry=0 '\000', tuple_fraction=tuple_fraction@entry=0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:593
#15 0x0000000000666321 in standard_planner (parse=0x10cec38, cursorOptions=256, boundParams=<optimized out>) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:306
#16 0x00000000006efeec in pg_plan_query (querytree=querytree@entry=0x10cec38, cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0)at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:798 
#17 0x00000000006effce in pg_plan_queries (querytrees=<optimized out>, cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0)at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:864 
#18 0x00000000006f0467 in exec_simple_query (query_string=0x10cdd68 "SELECT 'hawkeye' ILIKE 'h%' AS \"true\";") at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:1029
#19 0x00000000006f17cb in PostgresMain (argc=<optimized out>, argv=argv@entry=0x10784a0, dbname=0x1078298 "regression",
username=<optimizedout>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:4071 
#20 0x0000000000478ce0 in BackendRun (port=0x1073670) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4317
#21 BackendStartup (port=0x1073670) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:3989
#22 ServerLoop () at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1729
#23 0x00000000006907f7 in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x104c6e0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1337
#24 0x0000000000479e98 in main (argc=8, argv=0x104c6e0) at
/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:228



cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [COMMITTERS] pgsql: ICU support

От
Tom Lane
Дата:
The buildfarm's mostly happy, but "prion" still isn't, which
suggests there's someplace in the new code that references an
already-closed relcache entry.

            regards, tom lane


Re: [COMMITTERS] pgsql: ICU support

От
Tom Lane
Дата:
I wrote:
> The buildfarm's mostly happy, but "prion" still isn't, which
> suggests there's someplace in the new code that references an
> already-closed relcache entry.

Ah, scratch that, I was overthinking it.  The problem is pretty
clear from the error message:

+ ERROR:  collation "en-x-icu" for encoding "SQL_ASCII" does not exist

I can reproduce this with vanilla configure options if I'm running
with --with-icu and LANG=C.  In short, this regression test does not
work in C locale, and you need to find a way to disable it in that
environment.

            regards, tom lane


Re: [COMMITTERS] pgsql: ICU support

От
Andrew Dunstan
Дата:

On 03/23/2017 11:53 PM, Tom Lane wrote:
> I wrote:
>> The buildfarm's mostly happy, but "prion" still isn't, which
>> suggests there's someplace in the new code that references an
>> already-closed relcache entry.
> Ah, scratch that, I was overthinking it.  The problem is pretty
> clear from the error message:
>
> + ERROR:  collation "en-x-icu" for encoding "SQL_ASCII" does not exist
>
> I can reproduce this with vanilla configure options if I'm running
> with --with-icu and LANG=C.  In short, this regression test does not
> work in C locale, and you need to find a way to disable it in that
> environment.



Possibly something like this:

diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index a747fac..d3a68cb 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -126,8 +126,12 @@ tablespace-setup:

 REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS)
 ifeq ($(with_icu),yes)
+ifndef NO_LOCALE
+ifneq ($(LANG),C)
 override EXTRA_TESTS := collate.icu $(EXTRA_TESTS)
 endif
+endif
+endif

 check: all tablespace-setup
    $(pg_regress_check) $(REGRESS_OPTS)
--schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS)


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: ICU support

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 03/23/2017 11:53 PM, Tom Lane wrote:
>> + ERROR:  collation "en-x-icu" for encoding "SQL_ASCII" does not exist
>>
>> I can reproduce this with vanilla configure options if I'm running
>> with --with-icu and LANG=C.  In short, this regression test does not
>> work in C locale, and you need to find a way to disable it in that
>> environment.

> Possibly something like this:

>  REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS)
>  ifeq ($(with_icu),yes)
> +ifndef NO_LOCALE
> +ifneq ($(LANG),C)
>  override EXTRA_TESTS := collate.icu $(EXTRA_TESTS)
>  endif
> +endif
> +endif

That's better than nothing, certainly, but it doesn't catch all the ways
that C locale could be selected (LC_ALL, no valid setting for LANG, etc).

I suppose that what we really want to know is what encoding will be used
in the regression database.  I wonder whether the Makefile could find
that out more directly.  Although maybe it's time to bite the bullet
and teach pg_regress how to select whether or not to run the test ---
that would have the advantage of (probably) working on Windows, which
no amount of Makefile-hacking will do.

BTW, is there a reason that override isn't spelled more like

override EXTRA_TESTS += collate.icu

?  That seems more readable and idiomatic to me.

            regards, tom lane


Re: pgsql: ICU support

От
Andrew Dunstan
Дата:

On 03/24/2017 09:26 AM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 03/23/2017 11:53 PM, Tom Lane wrote:
>>> + ERROR:  collation "en-x-icu" for encoding "SQL_ASCII" does not exist
>>>
>>> I can reproduce this with vanilla configure options if I'm running
>>> with --with-icu and LANG=C.  In short, this regression test does not
>>> work in C locale, and you need to find a way to disable it in that
>>> environment.
>> Possibly something like this:
>>  REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS)
>>  ifeq ($(with_icu),yes)
>> +ifndef NO_LOCALE
>> +ifneq ($(LANG),C)
>>  override EXTRA_TESTS := collate.icu $(EXTRA_TESTS)
>>  endif
>> +endif
>> +endif
> That's better than nothing, certainly, but it doesn't catch all the ways
> that C locale could be selected (LC_ALL, no valid setting for LANG, etc).
>
> I suppose that what we really want to know is what encoding will be used
> in the regression database.  I wonder whether the Makefile could find
> that out more directly.  Although maybe it's time to bite the bullet
> and teach pg_regress how to select whether or not to run the test ---
> that would have the advantage of (probably) working on Windows, which
> no amount of Makefile-hacking will do.


All true, although Peter hasn't actually committed a change that would
enable the test in MSVC builds, so that's not a danger right now.

While the change could be made more robust, along with 3 lines of change
in the buildfarm code to make sure LANG is set in appropriate places it
is enough to turn prion green, and actually running the tests when
testing en_US.UTF8.

I agree we need a general mechanism for running tests conditionally.
Let's not wait for that, though :-)


>
> BTW, is there a reason that override isn't spelled more like
>
> override EXTRA_TESTS += collate.icu
>
> ?  That seems more readable and idiomatic to me.
>
>

+1.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: ICU support

От
David Rowley
Дата:
On 24 March 2017 at 06:33, Peter Eisentraut <peter_e@gmx.net> wrote:
> http://git.postgresql.org/pg/commitdiff/eccfef81e1f73ee41f1d8bfe4fa4e80576945048

[...]

> src/include/utils/pg_locale.h                    |   32 +-

I see this commit changed the definition of pg_locale_t

+struct pg_locale_t
+{
+   char    provider;
+   union
+   {
 #ifdef HAVE_LOCALE_T
-typedef locale_t pg_locale_t;
-#else
-typedef int pg_locale_t;
+       locale_t lt;
+#endif
+#ifdef USE_ICU
+       struct {
+           const char *locale;
+           UCollator *ucol;
+       } icu;
 #endif
+   } info;
+};

but forgot to update varstr_cmp() completely.

result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale);

should be:

result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale->info.lt);

Patch attached.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: pgsql: ICU support

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> I see this commit changed the definition of pg_locale_t
> but forgot to update varstr_cmp() completely.

It's pretty disturbing that we don't seem to have caught that
in testing.  Some work on code coverage seems indicated.

            regards, tom lane


Re: pgsql: ICU support

От
Peter Eisentraut
Дата:
On 3/23/17 23:53, Tom Lane wrote:
> Ah, scratch that, I was overthinking it.  The problem is pretty
> clear from the error message:
>
> + ERROR:  collation "en-x-icu" for encoding "SQL_ASCII" does not exist
>
> I can reproduce this with vanilla configure options if I'm running
> with --with-icu and LANG=C.  In short, this regression test does not
> work in C locale, and you need to find a way to disable it in that
> environment.

Done.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: ICU support

От
Peter Eisentraut
Дата:
On 3/24/17 20:32, David Rowley wrote:
> but forgot to update varstr_cmp() completely.
>
> result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale);
>
> should be:
>
> result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale->info.lt);
>
> Patch attached.

Fixed.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: ICU support

От
Peter Eisentraut
Дата:
On 3/24/17 22:28, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> I see this commit changed the definition of pg_locale_t
>> but forgot to update varstr_cmp() completely.
>
> It's pretty disturbing that we don't seem to have caught that
> in testing.

Or during compilation!?!

> Some work on code coverage seems indicated.

That would require a Windows variant of collate.linux.utf8.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services