Re: Pre-proposal: unicode normalized text

From	Nico Williams
Subject	Re: Pre-proposal: unicode normalized text
Date	2 November 2023 22:38:47
Msg-id	ZUQk99GFe787WC/z@ubby21
In reply to	Re: Pre-proposal: unicode normalized text (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On Fri, Oct 06, 2023 at 02:37:06PM -0400, Robert Haas wrote:
> > Sure, because TEXT in PG doesn't have codeset+encoding as part of it --
> > it's whatever the database's encoding is.  Collation can and should be a
> > property of a column, since for Unicode it wouldn't be reasonable to
> > make that part of the type.  But codeset+encoding should really be a
> > property of the type if PG were to support more than one.  IMO.
> 
> No, what I mean is, you can't just be like "oh, the varlena will be
> different in memory than on disk" as if that were no big deal.

It would have to be the same in memory as on disk, indeed, but you might
need new types in C as well for that.

> I agree that, as an alternative to encoding being a column property,
> it could instead be completely a type property, meaning that if you
> want to store, say, LATIN1 text in your UTF-8 database, you first
> create a latin1text data type and then use it, rather than, as in the
> model I proposed, creating a text column and then applying a setting
> like ENCODING latin1 to it. I think that there might be some problems

Yes, that was the idea.

> with that model, but it could also have some benefits. [...]

The main question, I think, is whether or not you want PG to do automatic
codeset conversions (ugly and problematic), e.g., when using text functions.

Automatic codeset conversions are problematic because a) they can be
lossy (so what do you do when they are?) and b) automatic type
conversions can be surprising.
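To make point a) concrete, here is a small illustration in Python, whose codecs merely stand in for a database's conversion routines (this is not PG code, and `to_latin1` is a hypothetical helper). Converting Unicode text to LATIN1 either fails outright or silently substitutes characters, so any automatic conversion layer has to pick a policy for the lossy case.

```python
def to_latin1(text: str, policy: str = "strict") -> bytes:
    """Encode text as LATIN1 under an explicit error policy.

    'strict' raises on characters LATIN1 cannot represent;
    'replace' substitutes '?', silently losing information.
    """
    return text.encode("latin-1", errors=policy)


sample = "naïve € price"

# 'ï' exists in LATIN1, but the euro sign does not.
try:
    to_latin1(sample)
except UnicodeEncodeError as exc:
    print("strict conversion failed at position", exc.start)

lossy = to_latin1(sample, "replace")
print(lossy)
# The round trip does not give back the original text.
print(lossy.decode("latin-1") == sample)
```

Neither policy is obviously right for a server to apply behind the user's back, which is the crux of the objection to implicit conversions.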

Ultimately the client would have to do its own codeset conversions, if
it wants them, or treat text in codesets other than its local one as
blobs and leave it for a higher app layer to deal with.
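The "codeset as a property of the type, no automatic conversions" model can be sketched in Python as a value type tagged with its codeset. `Latin1Text` is a hypothetical name echoing the latin1text example above; it is an illustration of the idea, not a proposed PG API. Operations within one codeset work directly, mixing with other text is rejected rather than converted, and decoding is an explicit step the client chooses to take (or not, treating the value as an opaque blob).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latin1Text:
    """Hypothetical value type: raw bytes known to be LATIN1-encoded.

    It defines no implicit conversion to str; callers must decode it
    explicitly or pass it along as an opaque blob.
    """
    raw: bytes

    def __add__(self, other: "Latin1Text") -> "Latin1Text":
        # Concatenating two values in the same codeset needs no conversion.
        if not isinstance(other, Latin1Text):
            return NotImplemented
        return Latin1Text(self.raw + other.raw)

    def decode(self) -> str:
        # Explicit, client-side conversion; LATIN1 -> Unicode is lossless.
        return self.raw.decode("latin-1")


a = Latin1Text("caf\u00e9".encode("latin-1"))
b = Latin1Text(b" au lait")
joined = a + b
print(joined.decode())

try:
    joined + "!"  # mixing with str raises TypeError instead of converting
except TypeError as exc:
    print("no implicit conversion:", exc)
```

The design choice here is that a type mismatch surfaces as an error at the point of use, so any conversion (and any loss it implies) is visible in the caller's code.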

I wouldn't want to propose automatic codeset conversions.  If you wanted
that, then you might as well declare that everything has to be UTF-8 and
say no to any other codesets.

> But, even if we were all convinced that this kind of feature was good
> to add, I think it would almost certainly be wrong to invent new
> varlena features along the way.

Yes.

Nico
--