Re: The "char" type versus non-ASCII characters

From: Nikolay Shaplov
Subject: Re: The "char" type versus non-ASCII characters
Msg-id: 4569234.kDpM2PENKo@thinkpad-pgpro
In reply to: The "char" type versus non-ASCII characters  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: The "char" type versus non-ASCII characters  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Friday, 3 December 2021, 22:12:10 MSK, Tom Lane wrote:
> which
> is that the "char" type is not very encoding-safe.  charout() for
> example just regurgitates the single byte as-is.  I think we deemed
> that okay the last time anyone thought about it, but that was when
> single-byte encodings were the mainstream usage for non-ASCII data.
> If you're using UTF8 or another multi-byte server encoding, it's
> quite easy to get an invalidly-encoded string this way, which at
> minimum is going to break dump/restore scenarios.
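
To make the quoted concern concrete, here is a minimal sketch in Python (not PostgreSQL code). The function charout() below is only a hypothetical stand-in that mimics the described behaviour of emitting the stored byte as-is; is_valid_in() is likewise an illustrative helper. It shows that a byte above 0x7F is a complete character in a single-byte encoding but not a valid string on its own in UTF-8.

```python
def charout(value: int) -> bytes:
    """Hypothetical stand-in for charout(): emit the stored byte unchanged."""
    if not 0 <= value <= 0xFF:
        raise ValueError('a "char" value holds exactly one byte')
    return bytes([value])


def is_valid_in(encoding: str, raw: bytes) -> bool:
    """Return True if raw decodes cleanly in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


ascii_out = charout(0x41)  # 'A', valid in every ASCII-compatible encoding
high_out = charout(0xE9)   # a full character in LATIN1, a lone lead byte in UTF-8

print(is_valid_in("utf-8", ascii_out))   # True
print(is_valid_in("latin-1", high_out))  # True
print(is_valid_in("utf-8", high_out))    # False
```

A dump that contains the last value as raw output would therefore not be valid UTF-8 text, which is the dump/restore breakage mentioned above.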

As I mentioned in another thread, I was very surprised, and also quite
confused, when I first saw the "char" type name.

This leads me to the idea that, while we are fixing the behaviour of "char",
we might also rename it to something more self-explanatory, such as
ascii_char.
Or, better, add ascii_char with the behaviour we need, switch the system
tables to it, and keep "char" with its old behaviour in "deprecated" status
in case somebody is still using it. That would give them time to move to
something more decent: ascii_char or char(1).

I've also talked to someone who knows Postgres history very well. He told me
that "char" has existed at least since Postgres version 3.1, which also had
"char16", and that "char2", "char4" and "char8" were added in v.4. Later they
were all removed, and only "char" remains.

Also, "char" has nothing in common with the SQL standard, and it actually
looks very unnatural. Maybe it is time to get rid of it too, if we are
changing this part of the code...

> I can think of at least three ways we might address this:
>
> * Forbid all non-ASCII values for type "char".  This results in
> simple and portable semantics, but it might break usages that
> work okay today.
>
> * Allow such values only in single-byte server encodings.  This
> is a bit messy, but it wouldn't break any cases that are not
> problematic already.
>
> * Continue to allow non-ASCII values, but change charin/charout,
> char_text, etc so that the external representation is encoding-safe
> (perhaps make it an octal or decimal number).
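
The three options above can be sketched roughly as follows, in Python rather than PostgreSQL C. All names here are hypothetical, the set of single-byte encodings is only an illustrative subset, and the backslash-octal form used for the third option is an assumption: the quoted text says only "perhaps ... an octal or decimal number".

```python
# Illustrative subset of single-byte server encodings (assumption, not a full list).
SINGLE_BYTE_ENCODINGS = {"LATIN1", "KOI8R", "WIN1251"}


def check_ascii_only(value: int) -> bool:
    """Option 1: accept only 7-bit ASCII values."""
    return 0 <= value <= 0x7F


def check_single_byte_encoding(value: int, server_encoding: str) -> bool:
    """Option 2: allow high bytes only when the server encoding is single-byte."""
    if 0 <= value <= 0x7F:
        return True
    return 0x80 <= value <= 0xFF and server_encoding in SINGLE_BYTE_ENCODINGS


def encode_safe_output(value: int) -> str:
    """Option 3: keep ASCII as-is, print high bytes as backslash plus three octal digits."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value does not fit in one byte")
    if value <= 0x7F:
        return chr(value)
    return "\\{:03o}".format(value)


print(check_ascii_only(0xE9))                      # False
print(check_single_byte_encoding(0xE9, "LATIN1"))  # True
print(check_single_byte_encoding(0xE9, "UTF8"))    # False
print(encode_safe_output(0xE9))                    # \351
```

Only the third variant keeps every stored byte value representable while producing output that is pure ASCII, and therefore valid in any ASCII-compatible server encoding.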

My proposal above would give us a solid option #1 for ascii_char, with "char"
deprecated and removed later on.

--
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su
