Re: UTF8 national character data type support WIP patch and list of open issues.

From Tom Lane
Subject Re: UTF8 national character data type support WIP patch and list of open issues.
Date
Msg-id 592.1379524680@sss.pgh.pa.us
In response to Re: UTF8 national character data type support WIP patch and list of open issues.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: UTF8 national character data type support WIP patch and list of open issues.  ("MauMau" <maumau307@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Sep 16, 2013 at 8:49 AM, MauMau <maumau307@gmail.com> wrote:
>> 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always
>> contain Unicode data.
>> ...
>> 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
>> Fixed-width encoding may allow faster string manipulation as described in
>> Oracle's manual.  But I'm not sure about this, because UTF-16 is not a real
>> fixed-width encoding due to supplementary characters.

> It seems to me that these two points here are the real core of your
> proposal.  The rest is just syntactic sugar.

> Let me start with the second one: I don't think there's likely to be
> any benefit in using UTF-16 as the internal encoding.  In fact, I
> think it's likely to make things quite a bit more complicated, because
> we have a lot of code that assumes that server encodings have certain
> properties that UTF-16 doesn't - specifically, that any byte with the
> high-bit clear represents the corresponding ASCII character.

Another point to keep in mind is that UTF16 is not really any easier
to deal with than UTF8, unless you write code that fails to support
characters outside the basic multilingual plane.  Which is a restriction
I don't believe we'd accept.  But without that restriction, you're still
forced to deal with variable-width characters; and there's nothing very
nice about the way that's done in UTF16.  So on the whole I think it
makes more sense to use UTF8 for this.
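
To illustrate the two encoding points above, here is a minimal Python sketch (not PostgreSQL code; the helper names encoded_widths and ascii_range_bytes are made up for this example). It counts how many UTF-8 bytes and UTF-16 code units a single character needs, and lists which encoded bytes have the high bit clear.

```python
def encoded_widths(ch):
    """Return (UTF-8 byte count, UTF-16 code unit count) for one character."""
    utf8_bytes = len(ch.encode("utf-8"))
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids the BOM prefix.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


def ascii_range_bytes(ch, encoding):
    """Return the encoded bytes of ch that have the high bit clear."""
    return bytes(b for b in ch.encode(encoding) if b < 0x80)


# Hypothetical sample characters: one ASCII, one BMP, one outside the BMP.
samples = {
    "LATIN CAPITAL A (U+0041)": "\u0041",
    "CJK ideograph (U+6F22)": "\u6f22",
    "emoji (U+1F600)": "\U0001F600",
}

for name, ch in samples.items():
    print(name, encoded_widths(ch))

# High-bit-clear bytes in each encoding of the non-ASCII character U+6F22.
print(ascii_range_bytes("\u6f22", "utf-16-le"))
print(ascii_range_bytes("\u6f22", "utf-8"))
```

U+1F600 lies outside the basic multilingual plane and takes two UTF-16 code units (a surrogate pair), so UTF-16 is variable-width as soon as such characters are allowed. U+6F22 encodes in UTF-16LE as the bytes 0x22 0x6F, which look like the ASCII characters `"` and `o`, whereas its UTF-8 encoding contains no high-bit-clear bytes at all.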

I share Robert's misgivings about difficulties in dealing with characters
that are not representable in the database's principal encoding.  Still,
you probably won't find out about many of those until you try it.
        regards, tom lane


