Thread: Cyrillic to UNICODE conversion

Cyrillic to UNICODE conversion

From
Victor Wagner
Date:
Despite the advertised support for converting Unicode to other charsets,
PostgreSQL 7.1 reports that conversion of UNICODE to KOI8 is not
supported. The same goes for WIN, ALT and other charsets.

As I found out, these charsets were simply never added to the list of
8-bit charsets that should be converted, perhaps because their maps
are stored in a different directory on ftp.unicode.org (see VENDORS/MicroSoft
for the cp1251 and cp866 maps, and somewhere else for KOI8-R.TXT). In any
case, all of those maps are included in the catdoc distribution.

The attached patch fixes this problem. It adds the script UCS_to_cyrillic.pl
to the src/backend/utils/mb/Unicode directory. The mapping of PostgreSQL
charset names to filenames (as they appear in the catdoc distribution, i.e.
lowercased) is hardcoded into the script. It is an almost exact copy of the
UCS_to_iso script, with only the file and constant names changed.
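
For readers without the patch at hand, here is a rough Python sketch of the kind
of work such a generator script does. It is not the Perl script from the patch.
The names CHARSET_FILES and parse_mapping_table are made up for illustration,
the charset-to-file pairing is an assumption based on the filenames mentioned
in this thread, and the sample lines only imitate the usual layout of
Unicode.org mapping tables (hex byte, hex code point, comment).

```python
# Assumed pairing of PostgreSQL charset names with lowercased map files.
# Hypothetical: the real script hardcodes its own table.
CHARSET_FILES = {"KOI8": "koi8-r.txt", "WIN": "cp1251.txt", "ALT": "cp866.txt"}


def parse_mapping_table(lines):
    """Return {ucs_code_point: charset_byte} for bytes in the high half (>= 0x80)."""
    ucs_to_byte = {}
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode assignment
        byte, ucs = int(fields[0], 16), int(fields[1], 16)
        if byte >= 0x80:  # the ASCII half needs no table
            ucs_to_byte[ucs] = byte
    return ucs_to_byte


# A few KOI8-R entries written in the mapping-table layout.
SAMPLE = [
    "# sample lines, not a complete table",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0xC1\t0x0430\t# CYRILLIC SMALL LETTER A",
    "0xE1\t0x0410\t# CYRILLIC CAPITAL LETTER A",
]

table = parse_mapping_table(SAMPLE)
for ucs in sorted(table):
    print(f"U+{ucs:04X} -> 0x{table[ucs]:02X}")
```

Only the high half of each charset is kept, since the lower 128 bytes coincide
with ASCII in all three charsets.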

The generated maps are included in the patch, just as they are included in
the source tarball. The source maps are omitted, because they are removed by
make distclean.

The file src/backend/mb/conv.c is modified
to include the new maps and provide the appropriate conversion functions.
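
As an illustration only, a pair of table-driven conversion functions could look
like the Python sketch below. This is not the C code in conv.c. The function
names are hypothetical, and Python's built-in koi8_r, cp1251 and cp866 codecs
stand in for the generated maps.

```python
def build_maps(codec):
    """Build byte->UCS and UCS->byte tables for the high half of an 8-bit charset."""
    to_ucs = {}
    for byte in range(0x80, 0x100):
        try:
            to_ucs[byte] = ord(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # byte is unassigned in this charset
    from_ucs = {ucs: byte for byte, ucs in to_ucs.items()}
    return to_ucs, from_ucs


def unicode_to_charset(text, from_ucs):
    """Convert a Unicode string to 8-bit bytes using a UCS->byte table."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)  # ASCII passes through unchanged
        elif cp in from_ucs:
            out.append(from_ucs[cp])
        else:
            raise ValueError(f"U+{cp:04X} has no equivalent in the target charset")
    return bytes(out)


def charset_to_unicode(data, to_ucs):
    """Convert 8-bit bytes to a Unicode string using a byte->UCS table."""
    chars = []
    for b in data:
        if b < 0x80:
            chars.append(chr(b))
        elif b in to_ucs:
            chars.append(chr(to_ucs[b]))
        else:
            raise ValueError(f"byte 0x{b:02X} is unassigned in the source charset")
    return "".join(chars)


koi8_to_ucs, koi8_from_ucs = build_maps("koi8_r")
encoded = unicode_to_charset("Привет, world", koi8_from_ucs)
decoded = charset_to_unicode(encoded, koi8_to_ucs)
```

The same two functions serve every 8-bit charset, so supporting one more
charset comes down to supplying one more pair of tables.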



--
Victor Wagner            vitus@ice.ru
Chief Technical Officer        Office:7-(095)-748-53-88
Communiware.Net         Home: 7-(095)-135-46-61
http://www.communiware.net      http://www.ice.ru/~vitus

Re: Cyrillic to UNICODE conversion

From
Tatsuo Ishii
Date:
Thanks for the fixes. I have committed your patches and they should
appear in 7.1.1.

BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they
come from Unicode.org and may not be redistributed.
--
Tatsuo Ishii

From: Victor Wagner <vitus@ice.ru>
Subject: [PATCHES] Cyrillic to UNICODE conversion
Date: Thu, 26 Apr 2001 20:51:25 +0400 (MSD)
Message-ID: <Pine.LNX.4.30.0104262041500.9539-101000@party.ice.ru>


Re: Cyrillic to UNICODE conversion

From
Tatsuo Ishii
Date:
> > BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they
> > come from Unicode.org and may not be redistributed.
>
> That is not true for koi8-r.txt. At least the one included in the catdoc
> distribution I made myself from RFC 1489, and only afterwards did it
> appear on unicode.org and on Chernov's KOI8 pages.

Oh, I didn't know that.

> But anyway, since anybody can get them from unicode.org,
> why bother?

Agreed.
--
Tatsuo Ishii

Re: Cyrillic to UNICODE conversion

From
Victor Wagner
Date:
On Sun, 29 Apr 2001, Tatsuo Ishii wrote:

> From: Tatsuo Ishii <t-ishii@sra.co.jp>
> Subject: Re: [PATCHES] Cyrillic to UNICODE conversion
>
> Thanks for the fixes. I have committed your patches and they should
> appear in 7.1.1.
>
> BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they
> come from Unicode.org and may not be redistributed.

That is not true for koi8-r.txt. At least the one included in the catdoc
distribution I made myself from RFC 1489, and only afterwards did it
appear on unicode.org and on Chernov's KOI8 pages.

But anyway, since anybody can get them from unicode.org,
why bother?
--
Victor Wagner            vitus@ice.ru
Chief Technical Officer        Office:7-(095)-748-53-88
Communiware.Net         Home: 7-(095)-135-46-61
http://www.communiware.net      http://www.ice.ru/~vitus