Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 05/16/2014 06:05 PM, Tom Lane wrote:
>> I think this probably means we need to change chr() to reject code points
>> above U+10FFFF.  Should we back-patch that, or just do it in HEAD?
> +1 for back-patching. A value that cannot be restored is bad, and I
> can't imagine any legitimate use case for producing a Unicode character
> larger than U+10FFFF with chr(x), when the rest of the system doesn't
> handle it. Fully supporting such values might be useful, but that's a
> different story.

Well, AFAICT "the rest of the system" does handle any code point up to
U+1FFFFF.  It's only pg_utf8_islegal that's being picky.  So another
possible answer is to weaken the check in pg_utf8_islegal.  However,
that could create interoperability concerns with other software, and
as you say the use-case for larger values seems pretty thin.

Actually, after re-reading the spec there's more to it than this:
chr() will allow creating UTF8 sequences that correspond to the
surrogate-pair codes, which are expressly disallowed in UTF8 by
the RFCs.  Maybe we should apply pg_utf8_islegal to the result
string rather than duplicating its checks?

BTW, there are various places that have comments or ifdef'd-out code
anticipating possible future support of 5- or 6-byte UTF8 sequences,
which were specified in RFC 2279 but then rescinded by RFC 3629.
I guess as a matter of cleanup we should think about removing that
stuff.

regards, tom lane