Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

From Tatsuo Ishii
Subject Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8
Date
Msg-id 20201030.130626.465466054321175014.t-ishii@sraoss.co.jp
In response to MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
> Hi All,
> 
> Today while working on some other task related to database encoding, I
> noticed that the MINUS SIGN (with byte sequence a1-dd) in EUC-JP is
> mapped to FULLWIDTH HYPHEN-MINUS (with byte sequence ef-bc-8d) in
> UTF-8. See below:
> 
> postgres=# select convert('\xa1dd', 'euc_jp', 'utf8');
>  convert
> ----------
>  \xefbc8d
> (1 row)
> 
> Isn't this a bug? Shouldn't this have been converted to the MINUS SIGN
> (with byte sequence e2-88-92) in UTF-8 instead of FULLWIDTH
> HYPHEN-MINUS?

Yeah. Originally EUC_JP 0xa1dd was converted to UTF8 0xe28892. At some
point, someone changed the mapping, which is why you now get 0xefbc8d.
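
For reference, the two UTF-8 byte sequences in question can be identified
independently of PostgreSQL. The following is a minimal Python sketch using
only the standard library; the helper name describe is hypothetical and
exists only for this illustration.

```python
import unicodedata


def describe(raw: bytes) -> str:
    """Decode one UTF-8 encoded character and report its code point and name."""
    ch = raw.decode("utf-8")
    return f"{raw.hex()} -> U+{ord(ch):04X} {unicodedata.name(ch)}"


# The two UTF-8 byte sequences discussed in this thread.
for seq in ("e28892", "efbc8d"):
    print(describe(bytes.fromhex(seq)))
# e28892 -> U+2212 MINUS SIGN
# efbc8d -> U+FF0D FULLWIDTH HYPHEN-MINUS
```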

> When the MINUS SIGN (with byte sequence e2-88-92) in UTF-8 is
> converted to EUC-JP, the convert function fails with an error saying:
> "character with byte sequence 0xe2 0x88 0x92 in encoding UTF8 has no
> equivalent in encoding EUC_JP". See below:
> 
> postgres=# select convert('\xe28892', 'utf-8', 'euc_jp');
> ERROR:  character with byte sequence 0xe2 0x88 0x92 in encoding "UTF8"
> has no equivalent in encoding "EUC_JP"

Again, originally UTF8 0xe28892 was converted to EUC_JP 0xa1dd. At
some point, someone changed the mapping.

> However, when the same MINUS SIGN in UTF-8 is converted to SJIS
> encoding, the convert function returns the correct result. See below:
> 
> postgres=# select convert('\xe28892', 'utf-8', 'sjis');
>  convert
> ---------
>  \x817c
> (1 row)
> 
> Please note that the byte sequence (81-7c) in SJIS represents MINUS
> SIGN in SJIS which means the MINUS SIGN in UTF8 got converted to the
> MINUS SIGN in SJIS and that is what we expect. Isn't it?

Agreed.
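
As a cross-check that needs no conversion table: EUC-JP 0xa1dd and SJIS
0x817c are two encodings of the same JIS X 0208 position (row 1, cell 61).
The sketch below applies the usual arithmetic relation between the two
encodings. The function names are hypothetical, and the sketch makes no
claim about how PostgreSQL implements its own conversions.

```python
from typing import Tuple


def euc_jp_to_jis(euc: bytes) -> Tuple[int, int]:
    """Clear the high bit of a two-byte EUC-JP (JIS X 0208) sequence."""
    if len(euc) != 2 or not all(0xA1 <= b <= 0xFE for b in euc):
        raise ValueError("expected a two-byte JIS X 0208 sequence in EUC-JP")
    return euc[0] - 0x80, euc[1] - 0x80


def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Arithmetic conversion of a JIS X 0208 byte pair to Shift_JIS."""
    # Two JIS rows share one Shift_JIS lead byte.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 < 0x5F else 0xB0)
    if j1 % 2:
        # Odd row: trail bytes start at 0x40 and skip 0x7F.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even row: trail bytes start at 0x9F.
        s2 = j2 + 0x7E
    return bytes([s1, s2])


j1, j2 = euc_jp_to_jis(bytes.fromhex("a1dd"))
print(f"row {j1 - 0x20}, cell {j2 - 0x20}")  # row 1, cell 61
print(jis_to_sjis(j1, j2).hex())             # 817c
```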

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


