Re: character encoding in StartupMessage

From: Martijn van Oosterhout
Subject: Re: character encoding in StartupMessage
Msg-id: 20060228164527.GF535@svana.org
In response to: Re: character encoding in StartupMessage (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Tue, Feb 28, 2006 at 11:19:02AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> >>> This may be the only solution. Converting everything to UTF-8 has
> >>> issues because some encodings are not roundtrip-safe
>
> >> Is this still true?
>
> > I believe so. If you use the ICU Converter Explorer [1] to examine some of
> > the encodings we support, they have "Contains ambiguous aliases? TRUE".
>
> Which ones, and are they client-only encodings?  If all our server-side
> encodings are round-trip safe then I think there's no big issue.
>
> In any case I don't think there's a huge problem if we say that database
> and user names had better be chosen from the round-trip-safe subset.

This is what it says here [1]:

  There are only 19 encodings currently used worldwide as legitimate
  POSIX multi-byte locale encodings:

    UTF-8, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-5, ISO-8859-6,
    ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-15,
    EUC-JP, EUC-KR, GB2312 (= EUC-CN), KOI8-R, KOI8-U, VISCII,
    WINDOWS-1251, WINDOWS-1256

  Each of these is fully roundtrip compatible to ISO 10646, therefore
  all these locales can be represented nicely in wchar_t as the
  equivalent UCS values. The above names and the corresponding defining
  documents are listed in the IANA charset registry.

Some of these have multiple definitions according to ICU, meaning that
different platforms have implemented them differently in the past
(EUC-JP falls into this category), but presumably the IANA charset
registry has proper definitions.
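
A rough way to spot-check the round-trip claim is to decode every candidate byte sequence to Unicode and encode it back. The sketch below does that with Python's built-in codecs. The helper names (round_trip_failures, single_bytes, euc_two_byte) are made up for this illustration, and Python's tables are only one implementation, not the ICU or IANA definitions. VISCII is left out because Python's standard library has no codec for it, and UTF-8 is omitted as trivial. Only two-byte EUC sequences are covered.

```python
from itertools import product


def round_trip_failures(encoding, candidates):
    """Return byte sequences that decode but do not re-encode to themselves."""
    failures = []
    for raw in candidates:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not a valid sequence in this encoding, nothing to check
        try:
            back = text.encode(encoding)
        except UnicodeEncodeError:
            back = None
        if back != raw:
            failures.append(raw)
    return failures


def single_bytes():
    """All 256 one-byte sequences."""
    return [bytes([b]) for b in range(256)]


def euc_two_byte():
    """Two-byte EUC sequences: both bytes in 0xA1..0xFE."""
    return [bytes(pair) for pair in product(range(0xA1, 0xFF), repeat=2)]


# Names as given in the quoted list; Python resolves them to its own codecs.
SINGLE_BYTE = [
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-5", "ISO-8859-6",
    "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-13", "ISO-8859-15",
    "KOI8-R", "KOI8-U", "WINDOWS-1251", "WINDOWS-1256",
]
EUC = ["EUC-JP", "EUC-KR", "GB2312"]

if __name__ == "__main__":
    for name in SINGLE_BYTE:
        print(name, len(round_trip_failures(name, single_bytes())))
    for name in EUC:
        print(name, len(round_trip_failures(name, euc_two_byte())))
```

A count of zero only says that this particular implementation's table is round-trip safe over the sequences tested; it does not settle which of several historical definitions of an encoding a given platform uses.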

Of the remaining encodings we support, Big5 is OK, although win-950,
the Windows version, has changed over time. GBK has the same problem:
win-936 has also changed over time. I don't think we should concern
ourselves with bugs in the Windows encodings.
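
To get a feel for how far a Windows variant drifts from its base encoding, one can decode the same byte sequences under both and list the disagreements. This is a minimal sketch using Python's "big5" and "cp950" codecs as stand-ins for Big5 and win-950. The helper names are hypothetical, and Python's tables reflect one snapshot of each encoding, not its history. The same comparison is not informative for GBK in Python, since Python treats cp936 as an alias of gbk.

```python
from itertools import chain, product


def big5_candidates():
    """All two-byte sequences in the usual Big5 lead/trail byte ranges."""
    leads = range(0x81, 0xFF)
    trails = chain(range(0x40, 0x7F), range(0xA1, 0xFF))
    return [bytes(pair) for pair in product(leads, trails)]


def decode_or_none(raw, encoding):
    """Decode raw bytes, returning None if the sequence is not valid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def decoding_differences(enc_a, enc_b, candidates):
    """Return (bytes, result_a, result_b) wherever the two codecs disagree."""
    diffs = []
    for raw in candidates:
        a = decode_or_none(raw, enc_a)
        b = decode_or_none(raw, enc_b)
        if a != b:
            diffs.append((raw, a, b))
    return diffs


if __name__ == "__main__":
    diffs = decoding_differences("big5", "cp950", big5_candidates())
    print(len(diffs), "byte sequences decode differently")
    for raw, a, b in diffs[:10]:
        print(raw.hex(), repr(a), repr(b))
```

Sequences that one codec rejects and the other accepts are reported with None on the rejecting side, so vendor extensions show up alongside remapped characters.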

IOW, I think we are mostly safe.

[1] http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
