Обсуждение: Collation in ORDER BY not lexicographical

Поиск
Список
Период
Сортировка

Collation in ORDER BY not lexicographical

От
Paul Gaspar
Дата:
Hi!

We have big problems with collation in ORDER BY, which happens in
binary order, not alphabetic (lexicographical), like:.

A
B
Z
a
z
Ä
Ö
ä
ö

This is what we have done:

1. initdb -U myuser -E UTF-8 --locale=de_DE.UTF-8 -D /mydata

2. pg_ctl -U myuser -D /mydata -l logfile start

3. createdb -U myuser mydb

4. create table mytable(name text)

5.  insert into mytable values('Adam'); ....

6. select * from mytable order by name

The result is:

"Adam"
"Berta"
"Mann"
"Zoo"
"Zoodirektor"
"adam"
"Äpfel"
"Öl"
"äpfel"

"locale -a" shows:

de_DE
de_DE.ISO8859-1
de_DE.ISO8859-15
de_DE.UTF-8

"psql -l" shows:

                                            List of databases
    Name       |     Owner      | Encoding |  Collation  |    Ctype
|         Access privileges
----------+-----------+----------+-------------+-------------
+-----------------------------------
  postgres  | myuser | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |
  template0 | myuser | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 | =c/myuser
: myuser =CTc/myuser
  template1 | myuser | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 | =c/myuser
                                                                    :
myuser =CTc/myuser
  mydb      | myuser | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |

PG is running on Mac OS X 10.5 and 10.6 Intel.


Any help is appreciated. Thanks very much in advance.

Paul

Re: Collation in ORDER BY not lexicographical

От
Scott Marlowe
Дата:
On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:
> Hi!
>
> We have big problems with collation in ORDER BY, which happens in binary
> order, not alphabetic (lexicographical), like:.
>
> A
> B
> Z
> a
> z
> Ä
> Ö
> ä
> ö
>

> PG is running on Mac OS X 10.5 and 10.6 Intel.

I seem to recall there were some problem with Mac locales at some
point being broken.  Could be you're running into that issue.

Re: Collation in ORDER BY not lexicographical

От
Maximilian Tyrtania
Дата:
am 29.09.2009 11:21 Uhr schrieb Scott Marlowe unter scott.marlowe@gmail.com:

> On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:
>> Hi!
>>
>> We have big problems with collation in ORDER BY, which happens in binary
>> order, not alphabetic (lexicographical), like:.
>>
>> A
>> B
>> Z
>> a
>> z
>> Ä
>> Ö
>> ä
>> ö
>>
>
>> PG is running on Mac OS X 10.5 and 10.6 Intel.
>
> I seem to recall there were some problem with Mac locales at some
> point being broken.  Could be you're running into that issue.

Yep, i ran into this as well. Here is my workaround: Create a function like
this:

CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert text)

RETURNS text AS
$BODY$
select
replace(replace(replace(replace(replace(replace($1,'Ä','A'),'Ö','O'),'Ü','U'
),'ä','a'),'ö','o'),'ü','u');

$BODY$

LANGUAGE 'sql' IMMUTABLE STRICT
  COST 100;

ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;

Then create an index like this:

create index idx_personen_nachname_orderByFriendly on personen
(f_getorderbyfriendlyversion(nachname))


Now you can do:

select * from personen order by f_getorderbyfriendlyversion(p.nachname)

Seems pretty fast.

Best,

Maximilian Tyrtania



Re: Collation in ORDER BY not lexicographical

От
Peter Eisentraut
Дата:
On Tue, 2009-09-29 at 03:21 -0600, Scott Marlowe wrote:
> On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com> wrote:
> > Hi!
> >
> > We have big problems with collation in ORDER BY, which happens in binary
> > order, not alphabetic (lexicographical), like:.
> >
> > A
> > B
> > Z
> > a
> > z
> > Ä
> > Ö
> > ä
> > ö
> >
>
> > PG is running on Mac OS X 10.5 and 10.6 Intel.
>
> I seem to recall there were some problem with Mac locales at some
> point being broken.  Could be you're running into that issue.

Yes, the UTF8 locales on BSD systems (Mac OS X, FreeBSD, etc.) are
dysfunctional.  Either switch to a non-UTF8 locale or a different
operating system.


Re: Collation in ORDER BY not lexicographical

От
Paul Gaspar
Дата:
Thank you all very much for your help.

Maximilian, we simplified your replacing code:

> replace(replace(replace(replace(replace(replace
> ($1,'Ä','A'),'Ö','O'),'Ü','U'
> ),'ä','a'),'ö','o'),'ü','u');



to this:

translate(upper($1),'ÄÖÜ','AOU')


Paul






Am 29.09.2009 um 14:36 schrieb Maximilian Tyrtania:

> am 29.09.2009 11:21 Uhr schrieb Scott Marlowe unter scott.marlowe@gmail.com
> :
>
>> On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist@revolversoft.com
>> > wrote:
>>> Hi!
>>>
>>> We have big problems with collation in ORDER BY, which happens in
>>> binary
>>> order, not alphabetic (lexicographical), like:.
>>
>>> PG is running on Mac OS X 10.5 and 10.6 Intel.
>>
>> I seem to recall there were some problem with Mac locales at some
>> point being broken.  Could be you're running into that issue.
>
> Yep, i ran into this as well. Here is my workaround: Create a
> function like
> this:
>
> CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert
> text)
>
> RETURNS text AS
> $BODY$
> select
> replace(replace(replace(replace(replace(replace
> ($1,'Ä','A'),'Ö','O'),'Ü','U'
> ),'ä','a'),'ö','o'),'ü','u');
>
> $BODY$
>
> LANGUAGE 'sql' IMMUTABLE STRICT
> COST 100;
>
> ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;
>
> Then create an index like this:
>
> create index idx_personen_nachname_orderByFriendly on personen
> (f_getorderbyfriendlyversion(nachname))
>
>
> Now you can do:
>
> select * from personen order by f_getorderbyfriendlyversion
> (p.nachname)
>
> Seems pretty fast.
>
> Best,
>
> Maximilian Tyrtania