Обсуждение: [COMMITTERS] pgsql: ICU support
ICU support Add a column collprovider to pg_collation that determines which library provides the collation data. The existing choices are default and libc, and this adds an icu choice, which uses the ICU4C library. The pg_locale_t type is changed to a union that contains the provider-specific locale handles. Users of locale information are changed to look into that struct for the appropriate handle to use. Also add a collversion column that records the version of the collation when it is created, and check at run time whether it is still the same. This detects potentially incompatible library upgrades that can corrupt indexes and other structures. This is currently only supported by ICU-provided collations. initdb initializes the default collation set as before from the `locale -a` output but also adds all available ICU locales with a "-x-icu" appended. Currently, ICU-provided collations can only be explicitly named collations. The global database locales are still always libc-provided. ICU support is enabled by configure --with-icu. Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/eccfef81e1f73ee41f1d8bfe4fa4e80576945048 Modified Files -------------- aclocal.m4 | 1 + config/pkg.m4 | 275 ++++++ configure | 313 ++++++ configure.in | 35 + doc/src/sgml/catalogs.sgml | 19 + doc/src/sgml/charset.sgml | 177 +++- doc/src/sgml/func.sgml | 17 + doc/src/sgml/installation.sgml | 15 + doc/src/sgml/mvcc.sgml | 3 +- doc/src/sgml/ref/alter_collation.sgml | 55 ++ doc/src/sgml/ref/create_collation.sgml | 37 +- src/Makefile.global.in | 4 + src/backend/Makefile | 2 +- src/backend/catalog/pg_collation.c | 52 +- src/backend/commands/collationcmds.c | 288 +++++- src/backend/common.mk | 2 + src/backend/nodes/copyfuncs.c | 13 + src/backend/nodes/equalfuncs.c | 11 + src/backend/parser/gram.y | 18 +- src/backend/regex/regc_pg_locale.c | 110 ++- src/backend/tcop/utility.c | 8 + src/backend/utils/adt/formatting.c | 453 +++++---- src/backend/utils/adt/like.c | 53 +- src/backend/utils/adt/pg_locale.c | 266 ++++- src/backend/utils/adt/selfuncs.c | 8 +- src/backend/utils/adt/varlena.c | 179 +++- src/backend/utils/mb/encnames.c | 76 ++ src/bin/initdb/initdb.c | 3 +- src/bin/pg_dump/pg_dump.c | 75 +- src/bin/pg_dump/t/002_pg_dump.pl | 2 +- src/bin/psql/describe.c | 7 +- src/include/catalog/pg_collation.h | 25 +- src/include/catalog/pg_collation_fn.h | 2 + src/include/catalog/pg_proc.h | 3 + src/include/commands/collationcmds.h | 1 + src/include/mb/pg_wchar.h | 6 + src/include/nodes/nodes.h | 1 + src/include/nodes/parsenodes.h | 11 + src/include/pg_config.h.in | 6 + src/include/utils/pg_locale.h | 32 +- src/test/regress/GNUmakefile | 3 + src/test/regress/expected/collate.icu.out | 1126 ++++++++++++++++++++++ src/test/regress/expected/collate.linux.utf8.out | 80 +- src/test/regress/sql/collate.icu.sql | 433 +++++++++ src/test/regress/sql/collate.linux.utf8.sql | 32 + 45 files changed, 3929 insertions(+), 409 deletions(-)
On 3/23/17 15:33, Peter Eisentraut wrote: > ICU support Seeing some crashes on the build farm, investigating. If someone can reproduce locally and get a backtrace, let me know. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 03/23/2017 04:07 PM, Peter Eisentraut wrote: > On 3/23/17 15:33, Peter Eisentraut wrote: >> ICU support > Seeing some crashes on the build farm, investigating. If someone can > reproduce locally and get a backtrace, let me know. > The buildfarm actually does that :-) See for example <https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2017-03-23%2019%3A37%3A22> which has stuff like: Program terminated with signal SIGSEGV, Segmentation fault. #0 Generic_Text_IC_like (collation=100, pat=0x10cf3f8, str=0x10cf388) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:197 197 if (pg_database_encoding_max_length() > 1 || locale->provider == COLLPROVIDER_ICU) #0 Generic_Text_IC_like (collation=100, pat=0x10cf3f8, str=0x10cf388) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:197 #1 texticlike (fcinfo=<optimized out>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/like.c:394 #2 0x00000000005db832 in ExecMakeFunctionResultNoSets (fcache=0x106af48, econtext=0x106b828, isNull=0x7fff5884877c "") at/home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/executor/execQual.c:1866 #3 0x00000000005dfe76 in ExecEvalExprSwitchContext (expression=expression@entry=0x106af48, econtext=<optimized out>, isNull=isNull@entry=0x7fff5884877c"") at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/executor/execQual.c:4228 #4 0x0000000000671d29 in evaluate_expr (expr=<optimized out>, result_type=result_type@entry=16, result_typmod=result_typmod@entry=-1,result_collation=result_collation@entry=0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:4680 #5 0x00000000006732ab in evaluate_function (context=0x7fff58848aa0, func_tuple=0x7fb799a56048, funcvariadic=0 '\000', args=0x10cf8a0,input_collid=100, result_collid=0, result_typmod=-1, result_type=16, funcid=1633) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:4237 #6 simplify_function (funcid=1633, result_type=16, result_typmod=result_typmod@entry=-1, result_collid=0, input_collid=100,args_p=args_p@entry=0x7fff588488f8, funcvariadic=0 '\000', process_args=1 '\001', allow_non_const=1 '\001',context=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3877 #7 0x0000000000673b4a in eval_const_expressions_mutator (node=0x10cf410, context=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:2584 #8 0x0000000000619be7 in expression_tree_mutator (node=node@entry=0x10cf460, mutator=mutator@entry=0x673a10 <eval_const_expressions_mutator>,context=context@entry=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/nodes/nodeFuncs.c:3017 #9 0x0000000000673a50 in eval_const_expressions_mutator (node=0x10cf460, context=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3527 #10 0x0000000000619e4b in expression_tree_mutator (node=node@entry=0x10cf4b0, mutator=mutator@entry=0x673a10 <eval_const_expressions_mutator>,context=context@entry=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/nodes/nodeFuncs.c:2912 #11 0x0000000000673a50 in eval_const_expressions_mutator (node=0x10cf4b0, context=context@entry=0x7fff58848aa0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3527 #12 0x000000000067574f in eval_const_expressions (root=root@entry=0x10cf5d0, node=<optimized out>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:2378 #13 0x0000000000660d95 in preprocess_expression (root=root@entry=0x10cf5d0, expr=<optimized out>, kind=kind@entry=1) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:880 #14 0x000000000066519d in subquery_planner (glob=glob@entry=0x10cf198, parse=parse@entry=0x10cec38, parent_root=parent_root@entry=0x0,hasRecursion=hasRecursion@entry=0 '\000', tuple_fraction=tuple_fraction@entry=0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:593 #15 0x0000000000666321 in standard_planner (parse=0x10cec38, cursorOptions=256, boundParams=<optimized out>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:306 #16 0x00000000006efeec in pg_plan_query (querytree=querytree@entry=0x10cec38, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0)at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:798 #17 0x00000000006effce in pg_plan_queries (querytrees=<optimized out>, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0)at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:864 #18 0x00000000006f0467 in exec_simple_query (query_string=0x10cdd68 "SELECT 'hawkeye' ILIKE 'h%' AS \"true\";") at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:1029 #19 0x00000000006f17cb in PostgresMain (argc=<optimized out>, argv=argv@entry=0x10784a0, dbname=0x1078298 "regression", username=<optimizedout>) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:4071 #20 0x0000000000478ce0 in BackendRun (port=0x1073670) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4317 #21 BackendStartup (port=0x1073670) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:3989 #22 ServerLoop () at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1729 #23 0x00000000006907f7 in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x104c6e0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1337 #24 0x0000000000479e98 in main (argc=8, argv=0x104c6e0) at /home/bf/bfr/root/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:228 cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
The buildfarm's mostly happy, but "prion" still isn't, which suggests there's someplace in the new code that references an already-closed relcache entry. regards, tom lane
I wrote: > The buildfarm's mostly happy, but "prion" still isn't, which > suggests there's someplace in the new code that references an > already-closed relcache entry. Ah, scratch that, I was overthinking it. The problem is pretty clear from the error message: + ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist I can reproduce this with vanilla configure options if I'm running with --with-icu and LANG=C. In short, this regression test does not work in C locale, and you need to find a way to disable it in that environment. regards, tom lane
On 03/23/2017 11:53 PM, Tom Lane wrote: > I wrote: >> The buildfarm's mostly happy, but "prion" still isn't, which >> suggests there's someplace in the new code that references an >> already-closed relcache entry. > Ah, scratch that, I was overthinking it. The problem is pretty > clear from the error message: > > + ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist > > I can reproduce this with vanilla configure options if I'm running > with --with-icu and LANG=C. In short, this regression test does not > work in C locale, and you need to find a way to disable it in that > environment. Possibly something like this: diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile index a747fac..d3a68cb 100644 --- a/src/test/regress/GNUmakefile +++ b/src/test/regress/GNUmakefile @@ -126,8 +126,12 @@ tablespace-setup: REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS) ifeq ($(with_icu),yes) +ifndef NO_LOCALE +ifneq ($(LANG),C) override EXTRA_TESTS := collate.icu $(EXTRA_TESTS) endif +endif +endif check: all tablespace-setup $(pg_regress_check) $(REGRESS_OPTS) --schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS) cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 03/23/2017 11:53 PM, Tom Lane wrote: >> + ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist >> >> I can reproduce this with vanilla configure options if I'm running >> with --with-icu and LANG=C. In short, this regression test does not >> work in C locale, and you need to find a way to disable it in that >> environment. > Possibly something like this: > REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS) > ifeq ($(with_icu),yes) > +ifndef NO_LOCALE > +ifneq ($(LANG),C) > override EXTRA_TESTS := collate.icu $(EXTRA_TESTS) > endif > +endif > +endif That's better than nothing, certainly, but it doesn't catch all the ways that C locale could be selected (LC_ALL, no valid setting for LANG, etc). I suppose that what we really want to know is what encoding will be used in the regression database. I wonder whether the Makefile could find that out more directly. Although maybe it's time to bite the bullet and teach pg_regress how to select whether or not to run the test --- that would have the advantage of (probably) working on Windows, which no amount of Makefile-hacking will do. BTW, is there a reason that override isn't spelled more like override EXTRA_TESTS += collate.icu ? That seems more readable and idiomatic to me. regards, tom lane
On 03/24/2017 09:26 AM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 03/23/2017 11:53 PM, Tom Lane wrote: >>> + ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist >>> >>> I can reproduce this with vanilla configure options if I'm running >>> with --with-icu and LANG=C. In short, this regression test does not >>> work in C locale, and you need to find a way to disable it in that >>> environment. >> Possibly something like this: >> REGRESS_OPTS = --dlpath=. $(EXTRA_REGRESS_OPTS) >> ifeq ($(with_icu),yes) >> +ifndef NO_LOCALE >> +ifneq ($(LANG),C) >> override EXTRA_TESTS := collate.icu $(EXTRA_TESTS) >> endif >> +endif >> +endif > That's better than nothing, certainly, but it doesn't catch all the ways > that C locale could be selected (LC_ALL, no valid setting for LANG, etc). > > I suppose that what we really want to know is what encoding will be used > in the regression database. I wonder whether the Makefile could find > that out more directly. Although maybe it's time to bite the bullet > and teach pg_regress how to select whether or not to run the test --- > that would have the advantage of (probably) working on Windows, which > no amount of Makefile-hacking will do. All true, although Peter hasn't actually committed a change that would enable the test in MSVC builds, so that's not a danger right now. While the change could be made more robust, along with 3 lines of change in the buildfarm code to make sure LANG is set in appropriate places it is enough to turn prion green, and actually running the tests when testing en_US.UTF8. I agree we need a general mechanism for running tests conditionally. Let's not wait for that, though :-) > > BTW, is there a reason that override isn't spelled more like > > override EXTRA_TESTS += collate.icu > > ? That seems more readable and idiomatic to me. > > +1. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 24 March 2017 at 06:33, Peter Eisentraut <peter_e@gmx.net> wrote: > http://git.postgresql.org/pg/commitdiff/eccfef81e1f73ee41f1d8bfe4fa4e80576945048 [...] > src/include/utils/pg_locale.h | 32 +- I see this commit changed the definition of pg_locale_t +struct pg_locale_t +{ + char provider; + union + { #ifdef HAVE_LOCALE_T -typedef locale_t pg_locale_t; -#else -typedef int pg_locale_t; + locale_t lt; +#endif +#ifdef USE_ICU + struct { + const char *locale; + UCollator *ucol; + } icu; #endif + } info; +}; but forgot to update varstr_cmp() completely. result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale); should be: result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale->info.lt); Patch attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
David Rowley <david.rowley@2ndquadrant.com> writes: > I see this commit changed the definition of pg_locale_t > but forgot to update varstr_cmp() completely. It's pretty disturbing that we don't seem to have caught that in testing. Some work on code coverage seems indicated. regards, tom lane
On 3/23/17 23:53, Tom Lane wrote: > Ah, scratch that, I was overthinking it. The problem is pretty > clear from the error message: > > + ERROR: collation "en-x-icu" for encoding "SQL_ASCII" does not exist > > I can reproduce this with vanilla configure options if I'm running > with --with-icu and LANG=C. In short, this regression test does not > work in C locale, and you need to find a way to disable it in that > environment. Done. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/24/17 20:32, David Rowley wrote: > but forgot to update varstr_cmp() completely. > > result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale); > > should be: > > result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, mylocale->info.lt); > > Patch attached. Fixed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/24/17 22:28, Tom Lane wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: >> I see this commit changed the definition of pg_locale_t >> but forgot to update varstr_cmp() completely. > > It's pretty disturbing that we don't seem to have caught that > in testing. Or during compilation!?! > Some work on code coverage seems indicated. That would require a Windows variant of collate.linux.utf8. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services