Discussion: CP949 for EUC-KR?

CP949 for EUC-KR?

From
Takahiro Itagaki
Date:
I heard pg_get_encoding_from_locale() failed in kor locale.

    WARNING:  could not determine encoding for locale "kor": codeset is "CP949"

I found the following description in the web:
    CP949 is EUC-KR, extended with UHC (Unified Hangul Code).
    http://www.opensource.apple.com/source/libiconv/libiconv-13.2/libiconv/lib/cp949.h

but we define CP51949 for EUC-KR in chklocale.c.
    {PG_EUC_KR, "CP51949"},        /* or 20949 ? */

Which is the compatible codeset with our PG_EUC_KR encoding?
949, 51949, or 20949? Should we add (or replace) CP949 for EUC-KR?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
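
For readers wondering where a codeset string like "CP949" comes from, here is a minimal sketch. It assumes that the Windows runtime reports the "kor" locale under a full name such as "Korean_Korea.949" and that the codeset is formed by prefixing the part after the last dot with "CP". The helper name codeset_from_locale_name is hypothetical and is not taken from chklocale.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL's actual code): derive a "CPnnn"
 * codeset name from a Windows-style locale string such as
 * "Korean_Korea.949".
 *
 * Returns 0 on success, -1 if the locale name has no codepage suffix
 * or the output buffer is too small.
 */
static int
codeset_from_locale_name(const char *locale, char *buf, size_t buflen)
{
    const char *dot;
    int         n;

    assert(locale != NULL && buf != NULL);

    /* The codepage number, if any, follows the last dot. */
    dot = strrchr(locale, '.');
    if (dot == NULL || dot[1] == '\0')
        return -1;

    n = snprintf(buf, buflen, "CP%s", dot + 1);
    if (n < 0 || (size_t) n >= buflen)
        return -1;

    return 0;
}
```

Under these assumptions, calling the helper with "Korean_Korea.949" fills the buffer with "CP949", which is the string that then has to be found in the alias table.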



Re: CP949 for EUC-KR?

From
Heikki Linnakangas
Date:
Takahiro Itagaki wrote:
> I heard pg_get_encoding_from_locale() failed in kor locale.
> 
>     WARNING:  could not determine encoding for locale "kor": codeset is "CP949"
> 
> I found the following description in the web:
>     CP949 is EUC-KR, extended with UHC (Unified Hangul Code).
>     http://www.opensource.apple.com/source/libiconv/libiconv-13.2/libiconv/lib/cp949.h
> 
> but we define CP51949 for EUC-KR in chklocale.c.
>     {PG_EUC_KR, "CP51949"},        /* or 20949 ? */
> 
> Which is the compatible codeset with our PG_EUC_KR encoding?
> 949, 51949, or 20949?

A bit of googling suggests that 51949 is indeed the Windows codepage
that's equivalent with EUC-KR.

> Should we add (or replace) CP949 for EUC-KR?

No. CP949 is not plain EUC-KR, but EUC-KR with some extensions (UHC). At
least on CVS HEAD, we recognize CP949 as an alias for the PostgreSQL
PG_UHC encoding. There's a significant difference between the two,
because PG_EUC_KR is supported as a server-encoding while PG_UHC is not.
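
To illustrate the difference, here is a rough sketch of the outer byte ranges of two-byte characters in each encoding. It is only a range check, not a full validator, since both code pages leave some positions inside these ranges unassigned. The function names are hypothetical and are not PostgreSQL functions.

```c
#include <stdbool.h>

/*
 * Plain EUC-KR (KS X 1001): both bytes of a two-byte character
 * lie in 0xA1..0xFE.
 */
static bool
is_euc_kr_pair(unsigned char lead, unsigned char trail)
{
    return lead >= 0xA1 && lead <= 0xFE &&
           trail >= 0xA1 && trail <= 0xFE;
}

/*
 * CP949 / UHC: the lead byte may start as low as 0x81, and the trail
 * byte may also fall in 0x41..0x5A or 0x61..0x7A, which overlaps the
 * ASCII letters.
 */
static bool
is_uhc_pair(unsigned char lead, unsigned char trail)
{
    bool        trail_ok;

    trail_ok = (trail >= 0x41 && trail <= 0x5A) ||
               (trail >= 0x61 && trail <= 0x7A) ||
               (trail >= 0x81 && trail <= 0xFE);

    return lead >= 0x81 && lead <= 0xFE && trail_ok;
}
```

Every pair accepted by is_euc_kr_pair() is also accepted by is_uhc_pair(), but not the other way round: the pair 0x81 0x41, for example, is only in the UHC range, and its trail byte has the same value as ASCII 'A'.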

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: CP949 for EUC-KR?

From
Takahiro Itagaki
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> > Should we add (or replace) CP949 for EUC-KR?
> 
> No. CP949 is not plain EUC-KR, but EUC-KR with some extensions (UHC). At
> least on CVS HEAD, we recognize CP949 as an alias for the PostgreSQL
> PG_UHC encoding.

That's it! We should have added an additional alias to chklocale, too.

Index: src/port/chklocale.c
===================================================================
--- src/port/chklocale.c    (HEAD)
+++ src/port/chklocale.c    (fixed)
@@ -172,6 +172,7 @@
     {PG_GBK, "CP936"},
 
     {PG_UHC, "UHC"},
+    {PG_UHC, "CP949"},
 
     {PG_JOHAB, "JOHAB"},
     {PG_JOHAB, "CP1361"},


Except UHC, we don't have any codepage aliases for the encodings below.
I assume we don't need to add CPxxx because Windows does not have
corresponding codepages for them, right?
    {PG_LATIN6, "ISO-8859-10"},
    {PG_LATIN7, "ISO-8859-13"},
    {PG_LATIN8, "ISO-8859-14"},
    {PG_LATIN10, "ISO-8859-16"},
    {PG_SHIFT_JIS_2004, "SJIS_2004"},

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: CP949 for EUC-KR?

From
Heikki Linnakangas
Date:
Takahiro Itagaki wrote:
> That's it! We should have added an additional alias to chklocale, too.
> 
> Index: src/port/chklocale.c
> ===================================================================
> --- src/port/chklocale.c    (HEAD)
> +++ src/port/chklocale.c    (fixed)
> @@ -172,6 +172,7 @@
>      {PG_GBK, "CP936"},
>  
>      {PG_UHC, "UHC"},
> +    {PG_UHC, "CP949"},
>  
>      {PG_JOHAB, "JOHAB"},
>      {PG_JOHAB, "CP1361"},

Yeah, seems correct.
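
For illustration, the effect of the extra alias can be sketched with a small stand-alone lookup table. This is an assumption about how such a table might be searched, not a copy of chklocale.c: the demo_* names and enum values are hypothetical, and only the aliases mentioned in this thread are listed.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PG_* encoding identifiers. */
typedef enum
{
    DEMO_UNKNOWN = -1,
    DEMO_EUC_KR,
    DEMO_UHC,
    DEMO_JOHAB
} demo_enc;

struct demo_encoding_match
{
    demo_enc    enc;
    const char *system_enc_name;
};

static const struct demo_encoding_match demo_match_list[] = {
    {DEMO_EUC_KR, "CP51949"},
    {DEMO_UHC, "UHC"},
    {DEMO_UHC, "CP949"},        /* the alias added by the patch */
    {DEMO_JOHAB, "JOHAB"},
    {DEMO_JOHAB, "CP1361"},
    {DEMO_UNKNOWN, NULL}        /* end-of-list marker */
};

/* Portable ASCII case-insensitive string equality. */
static int
demo_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the encoding for a codeset name, or DEMO_UNKNOWN if unlisted. */
static demo_enc
demo_lookup_encoding(const char *codeset)
{
    size_t      i;

    for (i = 0; demo_match_list[i].system_enc_name != NULL; i++)
    {
        if (demo_equal_nocase(demo_match_list[i].system_enc_name, codeset))
            return demo_match_list[i].enc;
    }
    return DEMO_UNKNOWN;
}
```

In this sketch, demo_lookup_encoding("CP949") returns DEMO_UHC only because of the added row; without it the lookup would fall through to DEMO_UNKNOWN, which corresponds to the "could not determine encoding" warning reported at the start of the thread.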

> Except UHC, we don't have any codepage aliases for the encodings below.
> I assume we don't need to add CPxxx because Windows does not have
> corresponding codepages for them, right?
> 
>     {PG_LATIN6, "ISO-8859-10"},
>     {PG_LATIN7, "ISO-8859-13"},
>     {PG_LATIN8, "ISO-8859-14"},
>     {PG_LATIN10, "ISO-8859-16"},
>     {PG_SHIFT_JIS_2004, "SJIS_2004"},

Yeah, I guess so. I can't find Windows codepages for these either, by
google.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: CP949 for EUC-KR?

From
"Ioseph Kim"
Date:
Hi, I'm Korean.

CP51949 is indeed EUC-KR,
so the existing definition is correct.

However, EUC-KR cannot represent all Korean characters.
In recent years, many people in Korea have been using CP949 instead.
The MS Windows codepage for Korean is also CP949.

----- Original Message ----- 
From: "Takahiro Itagaki" <itagaki.takahiro@oss.ntt.co.jp>
To: <pgsql-hackers@postgresql.org>
Sent: Tuesday, April 27, 2010 7:27 PM
Subject: [HACKERS] CP949 for EUC-KR?


>I heard pg_get_encoding_from_locale() failed in kor locale.
> 
>    WARNING:  could not determine encoding for locale "kor": codeset is "CP949"
> 
> I found the following description in the web:
>    CP949 is EUC-KR, extended with UHC (Unified Hangul Code).
>    http://www.opensource.apple.com/source/libiconv/libiconv-13.2/libiconv/lib/cp949.h
> 
> but we define CP51949 for EUC-KR in chklocale.c.
>    {PG_EUC_KR, "CP51949"}, /* or 20949 ? */
> 
> Which is the compatible codeset with our PG_EUC_KR encoding?
> 949, 51949, or 20949? Should we add (or replace) CP949 for EUC-KR?
> 
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center

Re: CP949 for EUC-KR?

From
Takahiro Itagaki
Date:
"Ioseph Kim" <pgsql-kr@postgresql.kr> wrote:

> CP51949 is EUC-KR correct.
> >    {PG_EUC_KR, "CP51949"}, /* or 20949 ? */

Thank you for the information. I removed "or 20949 ?" from the line.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center