Re: Locale + encoding combinations

From Trevor Talbot
Subject Re: Locale + encoding combinations
Date
Msg-id 90bce5730710120603t1d10b20ld689ef41b201026b@mail.gmail.com
In reply to Re: Locale + encoding combinations  (Dave Page <dpage@postgresql.org>)
Replies Re: Locale + encoding combinations  (Dave Page <dpage@postgresql.org>)
Re: Locale + encoding combinations  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
On 10/12/07, Dave Page <dpage@postgresql.org> wrote:
> Tom Lane wrote
> > That still leaves us with the problem of how to tell whether a locale
> > spec is bad on Windows.  Judging by your example, Windows checks whether
> > the code page is present but not whether it is sane for the base locale.
> > What happens when there's a mismatch --- eg, what encoding do system
> > messages come out in?
>
> I'm not sure how to test that specifically, but it seems that accented
> characters simply fall back to their undecorated equivalents if the
> encoding is not appropriate, eg:
>
> Dave@SNAKE:~$ ./setlc French_France.1252
> Locale: French_France.1252
> The date is: sam. 01 of août  2007
> Dave@SNAKE:~$ ./setlc French_France.28597
> Locale: French_France.28597
> The date is: sam. 01 of aout  2007
>
> (the encodings used there are WIN1252 and ISO8859-7 (Greek)).
>
> I'm happy to test further if you can suggest how I can figure out the
> encoding actually output.

The encoding output is the one you specified.  Keep in mind that
underneath, Windows mostly works in Unicode, so all characters
exist and the locale rules specify their behavior there.  The encoding
is just the byte stream it has to force them all into after doing
whatever it does to them.  As you've seen, it uses some sort of
best-fit mapping whose details I don't know.  (By default, it will drop
accent marks and choose characters with similar shape where possible.)
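
To make the accent-dropping concrete, here is a rough Python sketch that reproduces the "août" -> "aout" result from Dave's test.  It is only an approximation: it strips combining marks via Unicode decomposition, which is an assumption on my part and not the actual Windows best-fit tables.  The function name approx_best_fit is made up for illustration.

```python
import unicodedata


def approx_best_fit(text: str, encoding: str) -> bytes:
    """Encode text, falling back to an unaccented base letter when needed.

    This is NOT the Windows best-fit algorithm, only an approximation of
    the accent-dropping behaviour seen in the test output above.
    """
    out = bytearray()
    for ch in text:
        try:
            # The character exists in the target encoding: use it as-is.
            out += ch.encode(encoding)
            continue
        except UnicodeEncodeError:
            pass
        # Decompose (e.g. U+00FB -> 'u' + U+0302) and drop combining marks.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        # Anything still unrepresentable becomes '?'.
        out += base.encode(encoding, errors="replace")
    return bytes(out)


if __name__ == "__main__":
    for enc in ("cp1252", "iso8859_7"):
        print(enc, approx_best_fit("août", enc))
```

With cp1252 the û survives as a single byte; with iso8859_7 (the Greek code page 28597 in the test) it has no slot, so the sketch falls back to a plain u.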

I think it's a bit more complex for input/transform cases where you
operate on the byte stream directly without intermediate conversion to
Unicode, which is why UTF-8 doesn't work as a codepage, but again I
don't have the details nearby.  I can try to do more digging if
needed.
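
As one possible illustration of why byte-oriented handling gets harder with a multibyte encoding (not a statement of why Windows rejects UTF-8 as a codepage, which I haven't verified), the Python sketch below cuts an encoded string at a byte offset the way a byte-wise routine might.  The helper name first_n_bytes is hypothetical.

```python
def first_n_bytes(text: str, encoding: str, n: int) -> str:
    """Encode text, keep the first n bytes, and decode what is left.

    Undecodable leftovers are shown as U+FFFD so the damage is visible.
    """
    return text.encode(encoding)[:n].decode(encoding, errors="replace")


if __name__ == "__main__":
    word = "août"
    # In cp1252 every character here is one byte, so a byte cut is safe.
    print(len(word.encode("cp1252")), first_n_bytes(word, "cp1252", 3))
    # In UTF-8 the û takes two bytes, so the same cut splits a character.
    print(len(word.encode("utf-8")), first_n_bytes(word, "utf-8", 3))
```

In a single-byte code page the cut lands on a character boundary; in UTF-8 the same offset falls inside the two-byte û and leaves an incomplete sequence behind.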

