Re: chr() is still too loose about UTF8 code points

From: Heikki Linnakangas
Subject: Re: chr() is still too loose about UTF8 code points
Msg-id: 5376404B.8090106@vmware.com
In reply to: chr() is still too loose about UTF8 code points (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: chr() is still too loose about UTF8 code points
Re: chr() is still too loose about UTF8 code points
List: pgsql-hackers
On 05/16/2014 06:05 PM, Tom Lane wrote:
> Quite some time ago, we made the chr() function accept Unicode code points
> up to U+1FFFFF, which is the largest value that will fit in a 4-byte UTF8
> string.  It was pointed out to me though that RFC3629 restricted the
> original definition of UTF8 to only allow code points up to U+10FFFF (for
> compatibility with UTF16).  While that might not be something we feel we
> need to follow exactly, pg_utf8_islegal implements the checking algorithm
> specified by RFC3629, and will therefore reject points above U+10FFFF.
>
> This means you can use chr() to create values that will be rejected on
> dump and reload:
>
> u8=# create table tt (f1 text);
> CREATE TABLE
> u8=# insert into tt values(chr('x001fffff'::bit(32)::int));
> INSERT 0 1
> u8=# select * from tt;
>   f1
> ----
>
> (1 row)
>
> u8=# \copy tt to 'junk'
> COPY 1
> u8=# \copy tt from 'junk'
> ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xf7 0xbf 0xbf 0xbf
> CONTEXT:  COPY tt, line 1
> LOCATION:  report_invalid_encoding, wchar.c:2011
>
> I think this probably means we need to change chr() to reject code points
> above 10ffff.  Should we back-patch that, or just do it in HEAD?
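
For illustration, here is a minimal C sketch of the RFC 3629 (section 4) syntax check for a single character whose byte length is already known. The function name utf8_char_is_legal_rfc3629 is hypothetical, and this is not PostgreSQL's pg_utf8_islegal; it only applies the same RFC rules. Under those rules a 4-byte character must start with F0..F4, and an F4 lead byte must be followed by 80..8F, which caps the range at U+10FFFF. The sequence 0xf7 0xbf 0xbf 0xbf from the COPY error above therefore fails at its first byte.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if s[0..len-1] is exactly one
 * well-formed UTF-8 character under the RFC 3629 section 4 syntax.
 * The caller supplies len (1..4). Illustrative only.
 */
bool utf8_char_is_legal_rfc3629(const unsigned char *s, size_t len)
{
    unsigned char lo = 0x80;    /* allowed range for the second byte */
    unsigned char hi = 0xBF;
    size_t i;

    switch (len)
    {
        case 1:
            return s[0] <= 0x7F;
        case 2:
            if (s[0] < 0xC2 || s[0] > 0xDF)     /* C0, C1 would be overlong */
                return false;
            break;
        case 3:
            if (s[0] == 0xE0)
                lo = 0xA0;                      /* reject overlong forms */
            else if (s[0] == 0xED)
                hi = 0x9F;                      /* reject UTF-16 surrogates */
            else if (s[0] < 0xE1 || s[0] > 0xEF)
                return false;
            break;
        case 4:
            if (s[0] == 0xF0)
                lo = 0x90;                      /* reject overlong forms */
            else if (s[0] == 0xF4)
                hi = 0x8F;                      /* nothing above U+10FFFF */
            else if (s[0] < 0xF1 || s[0] > 0xF3)
                return false;                   /* only F0..F4 start a 4-byte char */
            break;
        default:
            return false;
    }

    if (s[1] < lo || s[1] > hi)
        return false;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)         /* continuation bytes */
            return false;
    return true;
}
```

Because the check looks only at the bytes, a loader that validates this way refuses everything the current chr() produces for U+110000..U+1FFFFF (lead bytes F4 followed by 90..BF, or F5..F7).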

+1 for back-patching. A value that cannot be restored is bad, and I 
can't imagine any legitimate use case for producing a Unicode character 
larger than U+10FFFF with chr(x), when the rest of the system doesn't 
handle it. Fully supporting such values might be useful, but that's a 
different story.

- Heikki
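
A minimal C sketch of the proposed change, assuming a hypothetical helper utf8_encode_checked() that returns 0 where chr() would presumably raise an error instead. This is not PostgreSQL's chr() code. Without the new range test, the usual 4-byte formula maps U+1FFFFF to 0xF0 | (0x1FFFFF >> 18) = 0xF7 followed by three 0xBF bytes, which is exactly the sequence COPY rejects above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode code point cp into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is above U+10FFFF,
 * the limit RFC 3629 imposes. Illustrative only.
 */
size_t utf8_encode_checked(uint32_t cp, unsigned char *out)
{
    if (cp > 0x10FFFF)
        return 0;       /* the proposed check: never emit F4 90.. or F5..F7 leads */

    if (cp <= 0x7F)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7FF)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF)
    {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}
```

This sketch adds only the upper bound discussed in this thread. RFC 3629 also forbids the surrogate range U+D800..U+DFFF (encoded as ED A0..BF xx), which the code above still encodes, so a complete fix may need a second test.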