Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered?

From: Tatsuo Ishii
Subject: Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered?
Msg-id: 20201006.121142.2002518154310370203.t-ishii@sraoss.co.jp
In reply to: Re: 回复: May "PostgreSQL server side GB18030 character set support" reconsidered?  (Tatsuo Ishii <ishii@sraoss.co.jp>)
List: pgsql-general

> But as he already admitted, actually GB18030 is 4 byte encoding, rather
> than 2 bytes. So maybe we could find a way to map original GB18030 to
> ASCII-safe GB18030 using 4 bytes.

Here is an idea (in-byte represents GB18030, out-byte represents
internal server encoding):

if (in-byte1 is 0x00-0x7f)    /* ASCII */
   out-byte1 = in-byte1

else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x40-0x7f)    /* 2 bytes GB18030 */
   out-byte1 = in-byte1
   out-byte2 = 0x80
   out-byte3 = in-byte2 + 0x80 (should be 0xc0-0xff)
   out-byte4 = 0x80

else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x80-0xfe)    /* 2 bytes GB18030 */
   out-byte1 = in-byte1
   out-byte2 = 0x80
   out-byte3 = 0x80
   out-byte4 = in-byte2 (should be 0x80-0xfe)

else if (in-byte1 is 0x81-0xfe && in-byte2 is 0x30-0x39)    /* 4 bytes GB18030 */
   out-byte1 = in-byte1
   out-byte2 = in-byte2 + 0x80 (should be 0xb0-0xb9)
   out-byte3 = in-byte3
   out-byte4 = in-byte4 + 0x80 (should be 0xb0-0xb9)
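
The mapping above could be sketched in C roughly as follows. This is only
an illustration of the idea, not PostgreSQL code: the function name
gb18030_to_internal and its signature are made up for this example, it
treats 0x00-0x7f as the single-byte range, and it follows the byte ranges
given in this message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): convert the GB18030
 * character starting at in[0], with len bytes available, into the proposed
 * internal form.  Writes 1 or 4 bytes to out, stores that count in *outlen,
 * and returns the number of input bytes consumed, or 0 if the input is
 * invalid or truncated.
 */
int
gb18030_to_internal(const unsigned char *in, size_t len,
                    unsigned char *out, size_t *outlen)
{
    assert(in != NULL && out != NULL && outlen != NULL);

    if (len < 1)
        return 0;

    if (in[0] <= 0x7f)          /* ASCII passes through unchanged */
    {
        out[0] = in[0];
        *outlen = 1;
        return 1;
    }

    if (in[0] < 0x81 || in[0] > 0xfe || len < 2)
        return 0;               /* not a valid lead byte, or truncated */

    /*
     * 2-byte GB18030, low trail byte.  The range follows the message;
     * GB18030 itself does not use 0x7f as a trail byte.
     */
    if (in[1] >= 0x40 && in[1] <= 0x7f)
    {
        out[0] = in[0];
        out[1] = 0x80;
        out[2] = (unsigned char) (in[1] + 0x80);    /* 0xc0-0xff */
        out[3] = 0x80;
        *outlen = 4;
        return 2;
    }

    /* 2-byte GB18030, high trail byte */
    if (in[1] >= 0x80 && in[1] <= 0xfe)
    {
        out[0] = in[0];
        out[1] = 0x80;
        out[2] = 0x80;
        out[3] = in[1];                             /* 0x80-0xfe */
        *outlen = 4;
        return 2;
    }

    /* 4-byte GB18030: bytes 2 and 4 are 0x30-0x39, byte 3 is 0x81-0xfe */
    if (in[1] >= 0x30 && in[1] <= 0x39)
    {
        if (len < 4 || in[2] < 0x81 || in[2] > 0xfe ||
            in[3] < 0x30 || in[3] > 0x39)
            return 0;
        out[0] = in[0];
        out[1] = (unsigned char) (in[1] + 0x80);    /* 0xb0-0xb9 */
        out[2] = in[2];
        out[3] = (unsigned char) (in[3] + 0x80);    /* 0xb0-0xb9 */
        *outlen = 4;
        return 4;
    }

    return 0;                   /* second byte outside all ranges */
}
```

For example, the 2-byte input 0x81 0x40 comes out as 0x81 0x80 0xc0 0x80,
and the 4-byte input 0x81 0x30 0x81 0x30 comes out as 0x81 0xb0 0x81 0xb0.
Every byte of a non-ASCII output character has the high bit set, and the
three multi-byte cases can be told apart by out-byte2 and out-byte3, so
the conversion should be reversible.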

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


