Re: UTF8 national character data type support WIP patch and list of open issues.

From: Valentine Gogichashvili
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Msg-id: CAP93muVF=baHDtRs1JBPS3A85j6jRTjEUDMBjr=Voa-xFym4qg@mail.gmail.com
In reply to: Re: UTF8 national character data type support WIP patch and list of open issues.  ("MauMau" <maumau307@gmail.com>)
List: pgsql-hackers

>> PostgreSQL has very powerful facilities for storing any kind of
>> encoding. So maybe it makes sense to add the ENCODING as another column
>> property, the same way a COLLATION was added?
>
> Some other people in this community suggested that.  And the SQL standard suggests the same -- specifying a character encoding for each column: CHAR(n) CHARACTER SET ch.
>
>> Text operations should work automatically, as in memory all strings will be
>> converted to the database encoding.
>>
>> This approach will also open a possibility to implement custom ENCODINGs
>> for the column data storage, like snappy compression or even BSON, gobs or
>> protobufs for much more compact type storage.
>
> Thanks for your idea that sounds interesting, although I don't understand that well.


The idea is very simple: 

CREATE DATABASE utf8_database ENCODING 'utf8';

\c utf8_database

CREATE TABLE a(
  id serial,
  ascii_data text ENCODING 'ascii', -- will use ascii_to_utf8 to read and utf8_to_ascii to write
  koi8_data text ENCODING 'koi8_r', -- will use koi8_r_to_utf8 to read and utf8_to_koi8_r to write
  json_data json ENCODING 'bson' -- will use bson_to_json to read and json_to_bson to write
);
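
The read and write conversions named in the comments above can be approximated today by hand, which may make the proposal easier to picture. The following is only a sketch, not part of any patch: the table a_emulated and its column are hypothetical, the database and client encodings are assumed to be UTF8, and the only built-in functions used are convert_to() and convert_from(). An ENCODING column property would, in effect, apply these two calls automatically.

```sql
-- Sketch only: emulate a per-column encoding with existing functions.
-- The table and column names are hypothetical.
CREATE TABLE a_emulated (
    id        serial PRIMARY KEY,
    koi8_data bytea              -- raw bytes, stored as KOI8-R
);

-- Write path: database encoding (UTF8) -> KOI8-R bytes.
INSERT INTO a_emulated (koi8_data)
VALUES (convert_to('Привет', 'KOI8R'));

-- Read path: KOI8-R bytes -> text in the database encoding.
SELECT id,
       convert_from(koi8_data, 'KOI8R') AS koi8_text,
       octet_length(koi8_data)          AS stored_bytes
FROM a_emulated;
```

With this emulation the conversion has to be written out in every query, and the column is no longer typed as text; the proposed syntax would hide both details behind the column definition.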

One problem with bson_to_json here is that it will probably not be possible to write JSON in koi8_r, for example. But that case is not even being considered in these discussions at the moment.

If the ENCODING machinery received not only the encoding name but also the type OID, it should be possible to write encoders for TYPEs and arrays of TYPEs. (I have had to do this using casts to bytea and protobuf, to minimize the storage size of an array of types when writing a lot of data, which could afterwards be unpacked directly in the DB as normal database types.)

I hope I made my point a little bit clearer.

Regards, 

Valentine Gogichashvili
