Re: [pgsql-packagers] Palle Girgensohn's ICU patch

Поиск
Список
Период
Сортировка
От Palle Girgensohn
Тема Re: [pgsql-packagers] Palle Girgensohn's ICU patch
Дата
Msg-id 0ECFF0FA-2D9C-46D4-BEF8-34C7A5215FDC@pingpong.net
обсуждение исходный текст
Ответ на Re: [pgsql-packagers] Palle Girgensohn's ICU patch  (Dave Page <dpage@pgadmin.org>)
Ответы Re: [pgsql-packagers] Palle Girgensohn's ICU patch  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-hackers
> 27 nov 2014 kl. 10:15 skrev Dave Page <dpage@pgadmin.org>:
>
>
>
> On Thu, Nov 27, 2014 at 9:09 AM, Jakob Egger <jakob@eggerapps.at> wrote:
> Am 26.11.2014 um 17:46 schrieb Geoff Montee <geoff.montee@gmail.com>:
> > This topic reminds me of a thread from a couple months ago:
> >
> > http://www.postgresql.org/message-id/F8268DB6-B50F-429F-8289-DA8FFA5F22BA@tripadvisor.com
> >
> > It sounds like adding ICU support to core may also allow for adding
> > collation versioning to indexes.
>
> Reading through this thread it becomes clear to me that adding support for ICU is more important than I thought, and
theonly problem is that no one has yet volunteered for it :) 
>
> I've started looking through the PostgreSQL source and Palle's patch to estimate what needs to be done.
>
> MINIMUM TODO
> ============
>
> * Add support for per-column collations in varstr_comp() in varlena.c. Currently the patch creates a single ICU
collatorfor the default collation and stores it in a static variable. We would need to change this to create collators
foreach collation and store them in a hash table similar to pg_newlocale_from_collation() / lookup_collation_cache() 
>
> * There's a new feature in trunk for faster sorting using SortSupport, so we would also need to also patch
bttextfastcmp_locale()in varlena.c 
>
> These two changes would allow using ICU for collation. This has two major advantages:
> 1) Systems with broken strcoll like OS X and FreeBSD can take advantage of ICU to offer proper text sorting
> 2) You can link with a specific version of ICU to avoid index corruption and duplicate keys caused by changing
implementationsof the glibc strcoll function 
>
>
> NEXT STEPS: Support for more collations
> =======================================
>
> ICU offers a lot more collations than the OS. For example, besides "de_CH" it also offers
"de_CH@collation=phonebook".Adding support for these is a bit more involved. 
>
> * initdb would need to be extended to also look for collations offered by ICU and add them to the pg_collation
catalog.
>
> * A special case for LC_COLLATE must be added to check_locale() in the backend, get_canonical_locale_name() in
pg_upgrade,check_locale_name() in initdb to support collations provided by ICU 
>
> * pg_perm_setlocale() must get a special case to handle ICU collations
>
> * the local handling code in pgperl must be modified (when using a ICU collation as default collation, we must decide
whatcollation to send to perl) 
>
> * convert_string_datum() in selfuncs.c could be patched to use ICU instead of strxfrm. However, as far as I
understand,this is not absolutely required as this is only used by the query planner and would in the worst case
preventsome optimisation in corner cases 
>
> These changes would probably have an even bigger impact, because then people would no longer be limited to the
collationssupported by the locales installed on their OS. 
>
> NEXT STEPS: Collation versioning in indices
> ===========================================
>
> Since ICU provides reliable versioning of collations, this would allow us to finally prevent index corruption caused
bychanging implementations of strcoll. I haven't looked at this in detail, but I assume that this would be a small
changewith potentially big impact. 
>
> Ideally, PostgreSQL would detect when the collation is a different version than the one used to create the index, and
stopusing the index until it is rebuilt. 
>
>
> I'll take a shot at the MINIMUM TODO as outlined above.
>
>
> We've already included ICU support in our Postgres Plus Advanced Server product. Before you spend too much time on
this,give me a few days to see if we can get that change contributed back. The people I need to speak to are OOO for
Thanksgivingat the moment though, so it may be a few days. 
>
> --


Hi,

Just poking this old thread again. What happened here, is anyone putting work into this area at the moment?

Palle



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: deparsing utility commands
Следующее
От: Michael Paquier
Дата:
Сообщение: RewindTest.pm missing --debug for remote mode in pg_rewind tests