Proof of concept COLLATE support with patch

Поиск
Список
Период
Сортировка
От Martijn van Oosterhout
Тема Proof of concept COLLATE support with patch
Дата
Msg-id 20050902130420.GA15466@svana.org
обсуждение исходный текст
Ответы Re: Proof of concept COLLATE support with patch  (Greg Stark <gsstark@mit.edu>)
Re: Proof of concept COLLATE support with patch  (Martijn van Oosterhout <kleptog@svana.org>)
Re: Proof of concept COLLATE support with patch  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-hackers
Supports any glibc platform and possibly Win32.

Adds: SELECT ... ORDER BY expr COLLATE 'locale' CREATE INDEX locale_index ON table(expr COLLATE 'locale') Index scan
usedwhen COLLATE order permits 

This is just a proof of concept patch. I didn't send it to -patches
because as Tom pointed out, there's no hope of it getting in due to
platform dependant behaviour.

This patch does not use setlocale and is completely orthoganal to any
locale support already in the backend.

As it turns out, meaningful locale support only needs a handful of
support functions to work. These are listed at the bottom. My patch
only uses the first two, but the third will be needed at some stage.
The use of the last one depends on how the backend ends up support
locales. Both glibc and wine32 have locale sensetive versions of many
functions including:

toupper_l, tolower_l, strfmon_l, strtoul_l, strtof_l, strftime_l, is*_l

A windows function list is at:
http://msdn2.microsoft.com/library/wyzd2bce(en-us,vs.80).aspx

Patch available here:
http://svana.org/kleptog/pgsql/collate1.patch

Implementation notes follow and table of functions is at the bottom.

I hope this helps whenever someone gets around to full COLLATE support.

Have a nice day,

Notes:  * It works by replacing (expr COLLATE 'locale') with          pg_strxfrm(expr, pg_findlocale(locale))    in the
parsetree.
    pg_findlocale returns an opaque pointer to the locale. It is    STRICT IMMUTABLE and is optimised away in the final
query.
    pg_strxfrm takes the string and the locale and returns a bytea.     bytea comparison uses memcmp so is safe from
otherlocale effects    in the backend. 
  * Use of COLLATE for an index will probably double the diskspace    required for that index due to the strxfrm.
  * I had to add the functions to pg_proc.h because CREATE FUNCTION    couldn't find them. So they have OIDs I made up.
Youmay need to    initdb, I'm not sure. 
    You can compile pg_xlocale.c as an shared object and load them    that way too if you want to avoid the initdb.
  * Internally they are defined as taking and returning "internal".    CREATE FUNCTION doesn't like that so specify
opaqueor oid    instead. The declarations are: 
 create function pg_findlocale(text) returns oid as 'pg_findlocale' language internal strict immutable; create function
pg_strxfrm(text,oid)returns bytea as 'pg_strxfrm' language internal strict immutable; 
  * The clause ORDER BY 1 COLLATE 'en_AU' breaks, it treats the 1 like    a constant. I couldn't quickly work out how
toreference the    columns the right way. Long term that code should be in the    sorting code anyway. 
  * The locale needs to be in quotes, otherwise the parser converts it    to lower-case. Locale names are
case-sensetiveon many systems. 
  * There is a text function strcoll_l for testing collation:
 create function pg_strcoll_l(text,text,text) returns int4 as 'pg_strcoll_l' language internal strict immutable;
  * Yes this is the easy way out, implementing the inheritence of the    COLLATE attribute will be much more invasive.
Thisgives most    people what they want though. 
  * Although these functions are documented on Windows, they are not    for glibc, so it is an unstable insterface.

Function Needed                 glibc                 Win32
---------------------------------------------------------------------
Function returing opaque        newlocale             _create_locale
pointer to locale data

strxfrm with locale parameter   strxfrm_l             _strxfrm_l

Method finding encoding for     nl_langinfo_l         ???
locale

strcoll with locale parameter   strcoll_l             _strcoll_l

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Hannu Krosing
Дата:
Сообщение: Question about explain of index scan
Следующее
От: "Merlin Moncure"
Дата:
Сообщение: Re: Call for 7.5 feature completion