Re: locale

From Dennis Bjorklund
Subject Re: locale
Date
Msg-id Pine.LNX.4.44.0404081729510.4551-100000@zigo.dhs.org
In reply to Re: locale  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: locale  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, 8 Apr 2004, Tom Lane wrote:

> No, the ordering *will* be the same as it was before, because strcoll()
> is still functioning the same.  You'd get the same answer from a sort
> operation since it depends on the same operators.
> 
> It interprets them according to LC_CTYPE, which does not change.

I'm afraid that I don't understand you yet, and would like to have
it explained in more detail if possible. I feel a bit stupid for not
understanding what you are stating, but I'm sure there are more people
than me who feel that way :-)

Maybe we can look at an example. Let us take some UTF-8 strings, correctly
ordered in Swedish:
 Åke
 Ära

Since these are UTF-8, they are encoded as
 c3 85 6b 65        (Åke)
 c3 84 72 61        (Ära)

and that is the order they have in the index.

Now this index is copied into a new database where
the encoding is Latin1. In that table we want to
look up the string that in Latin1 is represented as
  c3 84 72 61

So we look in the index and see that the first row in the index is
not the same. But now, when we compare these strings as Latin1 strings,
it's no longer the case that c3 84 72 61 > c3 85 6b 65. As Latin1 strings
we compare each character: c3 = c3, and then 84 < 85 (in Latin1, 84
and 85 are control characters). So we will not find this string
in the index, since we think it should have come before the first entry.

We might even insert a new copy of this string in another
position in the index.
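
The lookup failure described above can be reproduced outside the database.
What follows is only a minimal sketch in Python, not what the backend does:
it assumes that comparing Latin1 strings behaves like plain code point
comparison (a stand-in for whatever strcoll() would return in a Latin1
locale), and it models the index as a sorted list searched by bisection.
All variable names are made up for the illustration.

```python
import bisect

# Strings in correct Swedish order (Å sorts before Ä in Swedish).
swedish_sorted = ["Åke", "Ära"]

# The bytes that end up in the index when the encoding is UTF-8,
# kept in the order the UTF-8 database sorted them.
index_bytes = [s.encode("utf-8") for s in swedish_sorted]
hex_dump = [" ".join(f"{byte:02x}" for byte in raw) for raw in index_bytes]

# The same bytes, now interpreted as Latin1 (one byte = one character).
as_latin1 = [raw.decode("latin-1") for raw in index_bytes]

# ASSUMPTION: code point order stands in for the Latin1 comparison.
still_sorted = as_latin1 == sorted(as_latin1)

# Look up the string that is c3 84 72 61 in Latin1, by binary search.
probe = bytes.fromhex("c3 84 72 61").decode("latin-1")
pos = bisect.bisect_left(as_latin1, probe)
found = pos < len(as_latin1) and as_latin1[pos] == probe

print(hex_dump)
print("index still sorted:", still_sorted)
print("search position:", pos, "found:", found)
```

Under that assumption the list is no longer sorted, the search stops in
front of the first entry without a match, and an insert at that position
would add a second copy of a string that is already present.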

So, my questions are:

a) What have we gained by copying this table into the Latin1 database? It
looks broken to me. As far as I understand, we have to rebuild the index to
get something that works at least a little.
 

b) Maybe one should not just reindex but reencode. In some cases that works
and produces a good result, for example from Latin1 to UTF-8.
 

c) If we are going to reindex anyway, then why not do that and solve the
per-database locale as well? This is independent of a) and b); I still want
to understand the first two points even if we don't talk about per-database
locale.
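
To make point b) concrete, here is a small Python sketch of what reencoding
means at the byte level. The helper name reencode is hypothetical and this
is not PostgreSQL's conversion code; it only illustrates why Latin1 to UTF-8
can always be done, while the opposite direction can fail for characters
that Latin1 cannot represent.

```python
def reencode(raw: bytes, source: str, target: str) -> bytes:
    """Hypothetical helper: decode bytes from one encoding, encode in another."""
    return raw.decode(source).encode(target)

# Latin1 -> UTF-8: every Latin1 character exists in Unicode, so this works.
latin1_bytes = "Ära".encode("latin-1")                    # c4 72 61
utf8_bytes = reencode(latin1_bytes, "latin-1", "utf-8")   # c3 84 72 61

# UTF-8 -> Latin1: fails for characters outside Latin1, such as the euro sign.
try:
    reencode("€".encode("utf-8"), "utf-8", "latin-1")
    reverse_ok = True
except UnicodeEncodeError:
    reverse_ok = False

print(latin1_bytes.hex(), "->", utf8_bytes.hex())
print("UTF-8 to Latin1 worked for the euro sign:", reverse_ok)
```

After such a conversion the stored bytes change, so an index built on the
old bytes would still have to be rebuilt.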
 


-- 
/Dennis Björklund


