Re: Patch: add conversion from pg_wchar to multibyte


From	Alexander Korotkov
Subject	Re: Patch: add conversion from pg_wchar to multibyte
Date	May 22, 2012, 10:48:51
Msg-id	CAPpHfduQEZUV89CnDJcjnPrdDmB810O4_xLc71GbEA42Yi=40Q@mail.gmail.com
In reply to	Re: Patch: add conversion from pg_wchar to multibyte (Tatsuo Ishii <ishii@postgresql.org>)
Responses	Re: Patch: add conversion from pg_wchar to multibyte
List	pgsql-hackers


On Tue, May 22, 2012 at 11:50 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:

I think it's possible. The first characters are defined like this:

#define IS_LCPRV1(c) ((unsigned char)(c) == 0x9a || (unsigned char)(c) == 0x9b)
#define IS_LCPRV2(c) ((unsigned char)(c) == 0x9c || (unsigned char)(c) == 0x9d)

It seems IS_LCPRV1 is not used in any of the encodings PostgreSQL
supports at this point, which means there is no chance that existing
databases contain LCPRV1 characters. So you could safely ignore it.

IS_LCPRV2 is only used for the Chinese encodings (EUC_TW and BIG5),
in backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c,
where it is fixed to 0x9d. So you can always restore the value to 0x9d.
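To illustrate the "always restore 0x9d" suggestion, here is a minimal sketch of writing an LCPRV2 character back out as multibyte. It assumes a 4-byte layout (prefix, charset id, two data bytes) by analogy with the 3-byte LCPRV1 case quoted below. The function name emit_lcprv2 and the constant LCPRV2_PREFIX are hypothetical and not part of the patch or of PostgreSQL.

```c
#include <stddef.h>

/* Macro copied from the message above. */
#define IS_LCPRV2(c) ((unsigned char)(c) == 0x9c || (unsigned char)(c) == 0x9d)

/* Hypothetical name: the prefix value the message says is always used. */
#define LCPRV2_PREFIX 0x9d

/*
 * Write one LCPRV2 character to "to" and return the number of bytes written.
 * Assumed layout: prefix byte, charset id, then two data bytes.
 * The prefix is not stored anywhere; it is simply regenerated as 0x9d.
 */
static size_t
emit_lcprv2(unsigned char charset_id, unsigned char b1, unsigned char b2,
            unsigned char *to)
{
    to[0] = LCPRV2_PREFIX;
    to[1] = charset_id;
    to[2] = b1;
    to[3] = b2;
    return 4;
}
```

The caller still has to decide that a given character belongs to a private two-byte charset; this sketch only covers regenerating the prefix.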

> Also in this part of code we're shifting first byte by 16 bits:
>
> if (IS_LC1(*from) && len >= 2)
> {
>     *to = *from++ << 16;
>     *to |= *from++;
>     len -= 2;
> }
> else if (IS_LCPRV1(*from) && len >= 3)
> {
>     from++;
>     *to = *from++ << 16;
>     *to |= *from++;
>     len -= 3;
> }
>
> Why don't we shift it by 8 bits?

Because we want the first byte in the LC1 case to be placed in the
second byte of the wchar, i.e.:

0th byte: always 0
1st byte: leading byte (the first byte of the multibyte)
2nd byte: always 0
3rd byte: the second byte of the multibyte

Note that we always assume that the 1st byte (called the "leading byte",
LB for short) represents the id of the character set (from 0x81 to
0xff) in MULE INTERNAL encoding. For the mapping between LB and
charsets, see pg_wchar.h.
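The byte layout described above can be sketched as a pack/unpack pair. This is only an illustration, assuming a 32-bit wide character; the type wchar_sketch and the functions lc1_pack and lc1_unpack are hypothetical names, not PostgreSQL API.

```c
#include <stdint.h>

/* Stand-in for pg_wchar; a 32-bit unsigned type is assumed here. */
typedef uint32_t wchar_sketch;

/*
 * Pack a 2-byte LC1 sequence the way the quoted code does:
 * the leading byte goes to bits 16-23, the second byte to bits 0-7.
 * Bits 24-31 and 8-15 stay zero.
 */
static wchar_sketch
lc1_pack(const unsigned char *from)
{
    wchar_sketch w = (wchar_sketch) from[0] << 16;

    w |= from[1];
    return w;
}

/* Inverse direction: recover the two multibyte bytes from the wide char. */
static void
lc1_unpack(wchar_sketch w, unsigned char *to)
{
    to[0] = (unsigned char) ((w >> 16) & 0xff);   /* leading byte (LB) */
    to[1] = (unsigned char) (w & 0xff);           /* second byte */
}
```

For example, the bytes 0x81 0xe9 pack to 0x008100e9, and unpacking that value gives the same two bytes back. Shifting by 8 instead would put the leading byte in bits 8-15, which is not the layout listed above.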

Thanks for your comments. They clarify a lot.

But I still don't understand how we can distinguish IS_LCPRV2 from IS_LC2. Isn't it possible for them to produce the same pg_wchar?
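To make the question concrete, here is a small sketch of the possible collision. It assumes the LCPRV2 path drops its prefix byte the same way the quoted LCPRV1 code does, and that the remaining three bytes are packed as charset id, then two data bytes. The names wchar_sketch, pack3, mule_to_wchar_sketch and charset_id_of are hypothetical, and the byte values used with them are arbitrary.

```c
#include <stdint.h>

#define IS_LCPRV2(c) ((unsigned char)(c) == 0x9c || (unsigned char)(c) == 0x9d)

/* Stand-in for pg_wchar; a 32-bit unsigned type is assumed here. */
typedef uint32_t wchar_sketch;

/* Assumed 3-byte packing: charset id in bits 16-23, then two data bytes. */
static wchar_sketch
pack3(const unsigned char *p)
{
    return ((wchar_sketch) p[0] << 16) | ((wchar_sketch) p[1] << 8) | p[2];
}

/* Convert one character that is either LC2 (3 bytes) or LCPRV2 (4 bytes). */
static wchar_sketch
mule_to_wchar_sketch(const unsigned char *from)
{
    if (IS_LCPRV2(from[0]))
        return pack3(from + 1);     /* prefix byte is dropped */
    return pack3(from);             /* treated as LC2 */
}

/* The charset id is the only part of the result that could tell them apart. */
static unsigned int
charset_id_of(wchar_sketch w)
{
    return (w >> 16) & 0xff;
}
```

Under these assumptions, the LC2 bytes 0x92 0xb0 0xa1 and the LCPRV2 bytes 0x9d 0x92 0xb0 0xa1 both give 0x0092b0a1, while 0x9d 0xf7 0xb0 0xa1 gives 0x00f7b0a1. So the two forms can only be told apart if the charset ids used after the prefix never coincide with the LC2 leading bytes, which is the point that needs checking against pg_wchar.h.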

------
With best regards,
Alexander Korotkov.
