Re: UTF8 national character data type support WIP patch and list of open issues.

From: Valentine Gogichashvili
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date:
Msg-id CAP93muULVtyd-HQd=h2VOWzaPUrf2Z9efqXDJvmV0Xx3Auj16Q@mail.gmail.com
In reply to: Re: UTF8 national character data type support WIP patch and list of open issues.  ("MauMau" <maumau307@gmail.com>)
Responses: Re: UTF8 national character data type support WIP patch and list of open issues.  ("MauMau" <maumau307@gmail.com>)
List: pgsql-hackers
Hi, 


> > That may be what's important to you, but it's not what's important to
> > me.
>
> National character types support may be important to some potential users of PostgreSQL and the popularity of PostgreSQL, not me.  That's why national character support is listed in the PostgreSQL TODO wiki.  We might be losing potential users just because their selection criteria include national character support.


The whole NCHAR type appeared as a hack for systems that did not have Unicode support from the beginning. It would not be needed if all text had been magically stored in Unicode or UTF from the start, and the idea of a character were the same as the idea of a rune, not a byte.

PostgreSQL already has very powerful facilities for storing text in many encodings. So maybe it makes sense to add ENCODING as another column property, the same way COLLATION was added?

That would make it possible to have a database that talks to clients in UTF8 but stores text and varchar data in whatever encoding is most appropriate for the situation.
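The idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not PostgreSQL code; the helper names `store` and `fetch` are hypothetical stand-ins for the server's storage and output paths.

```python
# Sketch of a per-column ENCODING: the session talks UTF-8 (Python str),
# while each column stores raw bytes in its own charset.

def store(value: str, column_encoding: str) -> bytes:
    """Encode a client-supplied string into the column's storage encoding."""
    return value.encode(column_encoding)

def fetch(stored: bytes, column_encoding: str) -> str:
    """Decode stored bytes back into the common in-memory representation."""
    return stored.decode(column_encoding)

# Round trip through a hypothetical SJIS-encoded column:
assert fetch(store("日本語", "shift_jis"), "shift_jis") == "日本語"

# The caveat discussed below: a value with no representation in the
# column's charset cannot be stored there at all.
try:
    store("日本語", "latin-1")
except UnicodeEncodeError:
    print("no equivalent in LATIN1")
```

The same asymmetry is why a UTF8 session encoding is the natural common ground: every column encoding converts losslessly into it, but not the other way around.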

It would make it impossible (or at least complicated) for the database to have a non-UTF8 default encoding (though I wonder who would need that in this case), as conversions from broader charsets into the default database encoding would not always be possible.

One could define an additional DATABASE property like LC_ENCODING that would act as the default for a column's ENCODING property, the same way LC_COLLATE acts as the default for a column's COLLATE property.

Text operations should work automatically, since in memory all strings would be converted to the database encoding.
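A minimal sketch of why the in-memory conversion step matters (encodings chosen purely for illustration): once both values are decoded to the common representation, ordinary text operations apply, even though the stored bytes differ.

```python
# The same word stored under two different hypothetical column encodings:
a = "grüße".encode("latin-1")   # column declared ENCODING LATIN1
b = "grüße".encode("utf-8")     # column declared ENCODING UTF8

# As raw stored bytes they are not equal:
assert a != b

# Decoded into the common in-memory form, comparison works as expected:
assert a.decode("latin-1") == b.decode("utf-8")
```

This is the sense in which text operators "work automatically": they never see the per-column byte representation, only the decoded strings.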

This approach would also open the possibility of implementing custom ENCODINGs for column data storage, such as snappy compression, or even BSON, gobs, or protobufs for much more compact storage of typed data.
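In that view, a "custom encoding" is just any reversible byte transformation. A sketch using zlib from the standard library (as a stand-in for snappy, which is named above but has no stdlib binding):

```python
import zlib

def encode_for_storage(text: str) -> bytes:
    """Hypothetical custom column encoding: UTF-8 plus compression."""
    return zlib.compress(text.encode("utf-8"))

def decode_from_storage(stored: bytes) -> str:
    """Inverse transformation back to the in-memory string."""
    return zlib.decompress(stored).decode("utf-8")

value = "a long repetitive value " * 50
stored = encode_for_storage(value)
assert decode_from_storage(stored) == value
print(len(value.encode("utf-8")), "->", len(stored))
```

As long as the pair of functions is lossless, the rest of the system can stay unaware of how the column's bytes are actually laid out.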

Regards, 

-- Valentine Gogichashvili
