Re: Bug in UTF8-Validation Code?

From Mark Dilger
Subject Re: Bug in UTF8-Validation Code?
Date
Msg-id 46117E27.9030703@markdilger.com
In reply to Re: Bug in UTF8-Validation Code?  (Mark Dilger <pgsql@markdilger.com>)
List pgsql-hackers
Mark Dilger wrote:
> Tom Lane wrote:
>> Mark Dilger <pgsql@markdilger.com> writes:
>>>> pgsql=# select chr(14989485);
>>>> chr
>>>> -----
>>>> 中
>>>> (1 row)
>>
>> Is there a principled rationale for this particular behavior as
>> opposed to any other?
>>
>> In particular, in UTF8 land I'd have expected the argument of chr()
>> to be interpreted as a Unicode code point, not as actual UTF8 bytes
>> with a randomly-chosen endianness.
>>
>> Not sure what to do in other multibyte encodings.
> 
> "Not sure what to do in other multibyte encodings" was pretty much my 
> rationale for this particular behavior.  I standardized on network byte 
> order because there are only two endiannesses to choose from, and the 
> other seems to be a more surprising choice.
> 
> I looked around on the web for a standard for how to convert an integer 
> into a valid multibyte character and didn't find anything.  Andrew, 
> Supernews has said upthread that chr() is clearly wrong and needs to be 
> fixed. If so, we need some clear definition what "fixed" means.
> 
> Any suggestions?
> 
> mark
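
To make the two readings of chr(14989485) concrete, here is a minimal sketch in Python rather than in the backend's C. The helper name chr_from_utf8_bytes is hypothetical and is not taken from any patch in this thread; it only models "treat the integer as packed UTF8 bytes in a given byte order". Python's built-in chr() is used as a stand-in for the "argument is a Unicode code point" reading.

```python
def chr_from_utf8_bytes(n: int, byteorder: str = "big") -> str:
    """Hypothetical helper: treat n as packed UTF-8 bytes and decode them."""
    length = max(1, (n.bit_length() + 7) // 8)  # minimal number of bytes
    return n.to_bytes(length, byteorder).decode("utf-8")

packed = 14989485                  # the value from the psql example
assert packed == 0xE4B8AD          # i.e. the bytes E4 B8 AD

# Network (big-endian) byte order: E4 B8 AD is the UTF-8 encoding of U+4E2D.
zhong = chr_from_utf8_bytes(packed, "big")
assert zhong == "\u4e2d"
assert ord(zhong) == 20013

# Little-endian order gives AD B8 E4, which is not valid UTF-8.
try:
    chr_from_utf8_bytes(packed, "little")
    little_endian_ok = True
except UnicodeDecodeError:
    little_endian_ok = False

# Code-point reading: 14989485 is above U+10FFFF, so it names no character;
# the same character is reached with the code point 20013 instead.
try:
    chr(packed)
    fits_code_point = True
except ValueError:
    fits_code_point = False
assert chr(20013) == zhong
```

Under these assumptions only the big-endian packing yields a character for this input, and the code-point reading needs a different integer (20013) for the same character.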

Another issue to consider when thinking about the correct definition of chr() is 
the identity ascii(chr(X)) = X.  This gets weird if X is greater than 255.  If 
nothing else, the name "ascii" is no longer appropriate.
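
A rough Python sketch of that round trip, under both candidate definitions. The names chr_packed and ascii_packed are hypothetical stand-ins for a chr()/ascii() pair that packs UTF8 bytes in network byte order; Python's built-in chr() and ord() stand in for a code-point-based pair. Neither is meant as the actual backend behavior.

```python
def chr_packed(n: int) -> str:
    """Hypothetical chr(): n holds the UTF-8 bytes in network byte order."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big").decode("utf-8")

def ascii_packed(s: str) -> int:
    """Hypothetical inverse: pack the UTF-8 bytes of the first character."""
    return int.from_bytes(s[0].encode("utf-8"), "big")

# Packed-bytes definition: the identity holds, but X is far above 255.
for x in (65, 14989485):
    assert ascii_packed(chr_packed(x)) == x

# Code-point definition: the identity also holds, with a different X
# (20013 rather than 14989485) for the same character.
for x in (65, 20013):
    assert ord(chr(x)) == x

assert chr_packed(14989485) == chr(20013)
```

Either way the value returned for a non-ASCII character is well outside the 0..255 range, which is the naming problem with "ascii".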

mark

