Re: The "char" type versus non-ASCII characters

From	Andrew Dunstan
Subject	Re: The "char" type versus non-ASCII characters
Date	December 3, 2021, 22:35:03
Msg-id	c44b31d4-044a-0e45-1a98-995517b47df7@dunslane.net
In reply to	The "char" type versus non-ASCII characters (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: The "char" type versus non-ASCII characters (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On 12/3/21 14:12, Tom Lane wrote:
> [ breaking off a different new thread ]
>
> Chapman Flack <chap@anastigmatix.net> writes:
>> Then there's "char". It's category S, but does not apply the server
>> encoding. You could call it an 8-bit int type, but it's typically used
>> as a character, making it well-defined for ASCII values and not so
>> for others, just like SQL_ASCII encoding. You could as well say that
>> the "char" type has a defined encoding of SQL_ASCII at all times,
>> regardless of the database encoding.
> This reminds me of something I've been intending to bring up, which
> is that the "char" type is not very encoding-safe.  charout() for
> example just regurgitates the single byte as-is.  I think we deemed
> that okay the last time anyone thought about it, but that was when
> single-byte encodings were the mainstream usage for non-ASCII data.
> If you're using UTF8 or another multi-byte server encoding, it's
> quite easy to get an invalidly-encoded string this way, which at
> minimum is going to break dump/restore scenarios.
>
> I can think of at least three ways we might address this:
>
> * Forbid all non-ASCII values for type "char".  This results in
> simple and portable semantics, but it might break usages that
> work okay today.
>
> * Allow such values only in single-byte server encodings.  This
> is a bit messy, but it wouldn't break any cases that are not
> problematic already.
>
> * Continue to allow non-ASCII values, but change charin/charout,
> char_text, etc so that the external representation is encoding-safe
> (perhaps make it an octal or decimal number).
>
> Either of the first two ways would have to contemplate what to do
> with disallowed values that snuck into the DB via pg_upgrade.
> That leads me to think that the third way might be the most
> preferable, even though it's not terribly backward-compatible.
>


I don't like #2. Is #3 going to change the external representation only
for non-ASCII values? If so, that seems OK.  Changing it for ASCII
values seems ugly. #1 is the simplest to implement and to understand,
and I suspect it would break very little in practice, but others might
disagree with that assessment.
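
To make the variant of #3 that only touches non-ASCII values concrete, here is a minimal sketch. It assumes a backslash-plus-three-octal-digits form for bytes with the high bit set; the thread only says "perhaps an octal or decimal number", so that exact format is an assumption. The names char_to_external and external_to_char are hypothetical helpers for illustration, not the real charin/charout.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for an output function: write the external form of
 * one "char" byte into buf (buflen must be at least 5).
 * ASCII bytes are emitted unchanged; bytes >= 0x80 become "\ooo" (octal),
 * so the result is pure ASCII and valid in any server encoding.
 * A zero byte yields an empty string.
 */
void
char_to_external(unsigned char c, char *buf, size_t buflen)
{
    if (c < 0x80)
    {
        buf[0] = (char) c;
        buf[1] = '\0';
    }
    else
        snprintf(buf, buflen, "\\%03o", (unsigned int) c);
}

/*
 * Hypothetical stand-in for an input function: parse the external form.
 * Accepts "\ooo" (first digit 0-3, so the value fits in one byte), a single
 * ASCII character, or an empty string. Returns 1 on success, 0 otherwise.
 */
int
external_to_char(const char *s, unsigned char *out)
{
    /* Escaped form: backslash followed by exactly three octal digits. */
    if (s[0] == '\\' &&
        s[1] >= '0' && s[1] <= '3' &&
        s[2] >= '0' && s[2] <= '7' &&
        s[3] >= '0' && s[3] <= '7' &&
        s[4] == '\0')
    {
        *out = (unsigned char) (((s[1] - '0') << 6) |
                                ((s[2] - '0') << 3) |
                                (s[3] - '0'));
        return 1;
    }

    /* Plain form: empty string or exactly one ASCII byte. */
    if ((unsigned char) s[0] < 0x80 && (s[0] == '\0' || s[1] == '\0'))
    {
        *out = (unsigned char) s[0];
        return 1;
    }

    /* Anything else, including raw high-bit bytes, is rejected here. */
    return 0;
}
```

In this sketch the external form of every ASCII value is unchanged, and a lone backslash still reads back as a backslash, because the escape is only recognized when exactly three octal digits follow it. Whether raw high-bit bytes should be rejected on input, as the sketch does, is a separate compatibility question.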


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
