Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?

From Vick Khera
Subject Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?
Msg-id CALd+dcfA2-p2CquiokLPxQKWzFP-ggtQ7uqcab3ozYsdajkGAQ@mail.gmail.com
In reply to Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?
List pgsql-general

On Wed, Dec 21, 2016 at 2:56 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> A PostgreSQL database with encoding=UTF8 just accepts the whole
> range of Unicode, regardless that a character is defined for the
> code or not.

Interesting... when I converted my application and database to UTF-8 encoding, I discovered that Postgres is picky about UTF-8. Specifically, it rejects the UTF-8 byte sequence 0xed 0xa0 0x8d, which maps to Unicode code point 0xd80d. This looks like a proper character but is in fact not a defined character code point (it falls in the surrogate range, 0xd800 to 0xdfff).
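To make the byte-to-code-point mapping concrete, here is a minimal Python sketch that decodes a three-byte UTF-8 sequence by hand and checks whether the result lands in the surrogate range. The function names `decode_three_byte_utf8` and `is_surrogate` are made up for this illustration; this is not how Postgres itself validates input.

```python
def decode_three_byte_utf8(seq: bytes) -> int:
    """Decode one 3-byte UTF-8 sequence by hand (shape check only)."""
    if len(seq) != 3:
        raise ValueError("expected exactly three bytes")
    b0, b1, b2 = seq
    # A 3-byte sequence has the bit pattern 1110xxxx 10xxxxxx 10xxxxxx.
    if b0 & 0xF0 != 0xE0 or b1 & 0xC0 != 0x80 or b2 & 0xC0 != 0x80:
        raise ValueError("not a 3-byte UTF-8 pattern")
    # Concatenate the payload bits: 4 + 6 + 6 = 16 bits.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def is_surrogate(code_point: int) -> bool:
    """True for U+D800..U+DFFF, which are reserved for UTF-16 surrogates."""
    return 0xD800 <= code_point <= 0xDFFF


cp = decode_three_byte_utf8(b"\xed\xa0\x8d")
print(f"U+{cp:04X}", is_surrogate(cp))  # prints "U+D80D True"
```

The bytes follow the three-byte pattern correctly, which is why the sequence looks plausible at first glance; it is the resulting code point that UTF-8 does not allow.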

Given the above unicode table:

insert into unicode(id, string) values(1, E'\xed\xa0\x8d');
ERROR:  invalid byte sequence for encoding "UTF8": 0xed 0xa0 0x8d

So I think that when you present an actual string of UTF-8 encoded bytes, Postgres does refuse unknown characters. However, as you observe, inserting the Unicode code point directly does not produce an error:

insert into unicode(id, string) values(1, U&'\d80d');
INSERT 0 1
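The same split between "bytes are validated" and "a code point can be named directly" has a rough analogue in Python, which may help when hunting for such sequences on the client side before loading data. This is only a sketch of the Python behaviour, not a description of Postgres internals; `strict_ok` and `find_surrogates` are hypothetical helper names.

```python
raw = b"\xed\xa0\x8d"  # the byte sequence from the failing insert


def strict_ok(data: bytes) -> bool:
    """True if the bytes pass Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_surrogates(data: bytes) -> list:
    """Character offsets of surrogate code points hidden in the bytes.

    Uses the 'surrogatepass' error handler, which lets encoded surrogates
    through; any other malformed bytes still raise UnicodeDecodeError.
    """
    text = data.decode("utf-8", errors="surrogatepass")
    return [i for i, ch in enumerate(text) if 0xD800 <= ord(ch) <= 0xDFFF]


print(strict_ok(raw))                            # False: bytes are rejected
print(find_surrogates(b"Ho Chi Minh " + raw))    # [12]

# Naming the code point directly works, but it cannot be encoded as UTF-8.
lone = chr(0xD80D)
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    print("U+D80D cannot be encoded as strict UTF-8")
```

Running something like `find_surrogates` over exported data before a conversion would flag the offending rows up front rather than at insert time.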

I discovered this when that specific byte sequence was found in my database during the conversion. I have no idea what my customer entered in the form to make that sequence, but it was part of the Vietnamese spelling of Ho Chi Minh City as best I could figure.
