Re: UTF16 surrogate pairs in UTF8 encoding

From Peter Eisentraut
Subject Re: UTF16 surrogate pairs in UTF8 encoding
Date
Msg-id 1283942118.18999.1.camel@fsopti579.F-Secure.com
In reply to Re: UTF16 surrogate pairs in UTF8 encoding  (Marko Kreen <markokr@gmail.com>)
Responses Re: UTF16 surrogate pairs in UTF8 encoding  (Marko Kreen <markokr@gmail.com>)
List pgsql-hackers
On Wed, 2010-09-08 at 10:18 +0300, Marko Kreen wrote:
> On 9/7/10, Peter Eisentraut <peter_e@gmx.net> wrote:
> > On Sun, 2010-08-22 at 15:15 -0400, Tom Lane wrote:
> >  > > We combine the surrogate pair components to a single code point and
> >  > > encode that in UTF-8.  We don't encode the components separately; that
> >  > > would be wrong.
> >  >
> >  > Oh, OK.  Should the docs make that a bit clearer?
> >
> >
> > Done.
> 
> This is confusing:
> 
>  (When surrogate
>  pairs are used when the server encoding is <literal>UTF8</>, they
>  are first combined into a single code point that is then encoded
>  in UTF-8.)
> 
> So something else happens if encoding is not UTF8?

Then you can't specify surrogate pairs because they are outside of the
ASCII range, per constraint mentioned earlier in the paragraph.
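
To illustrate that constraint, here is a rough sketch in Python. It is
not the server's code; the function names and the server_is_utf8 flag
are made up for the example. The point is only that both halves of a
surrogate pair lie in U+D800..U+DFFF, well above 0x7F:

```python
# Hypothetical sketch of the rule described above; not PostgreSQL source.

def is_surrogate(cp: int) -> bool:
    """True for UTF-16 surrogate code units, U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def escape_allowed(cp: int, server_is_utf8: bool) -> bool:
    """Assumed rule: without a UTF8 server encoding, only code points
    in the ASCII range (0..0x7F) may be specified numerically."""
    return server_is_utf8 or cp <= 0x7F


# Every surrogate is above 0x7F, so none passes the non-UTF8 check.
assert all(not escape_allowed(cp, server_is_utf8=False)
           for cp in range(0xD800, 0xE000))
```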

> I think this part can be simply removed, it does not add anything.
> 
> Or say that surrogate pairs are only allowed in UTF8 encoding.
> Reason is that you cannot encode 0..7F codepoints with them,
> and only those are allowed to be given numerically.  But this is
> already mentioned before.

Well, Tom wanted an additional explanation.  I personally agree with
you; this is not the place to explain encoding and Unicode internals,
when really the code only does what it's supposed to.
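
For anyone who wants the behaviour spelled out, the following is a
minimal sketch in Python of "combine the pair, then encode". It is not
the actual backend code, and combine_surrogates is a name invented for
this example; the arithmetic is the standard UTF-16 decoding formula.

```python
# Illustrative sketch only; the function name is hypothetical.

def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low surrogate")
    # 10 payload bits from each half, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# D834 DD1E is the surrogate pair for U+1D11E (MUSICAL SYMBOL G CLEF).
code_point = combine_surrogates(0xD834, 0xDD1E)

# The combined code point is encoded as one 4-byte UTF-8 sequence,
# rather than each half being encoded on its own.
utf8_bytes = chr(code_point).encode("utf-8")  # b'\xf0\x9d\x84\x9e'
```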


