Discussion: Suggestion for Encodings table

Suggestion for Encodings table

From
Preston Landers
Date:
http://www.postgresql.org/docs/8.0/interactive/multibyte.html#CHARSET-TABLE

I would humbly suggest a few improvements to that Encodings table to
improve the clarity.

Many of the entries clearly indicate the language or writing system, such
as WIN1256 = "Windows CP1256 (Arabic)"

I would suggest that every single entry should be described that way with
the common language or writing system name.  Even Unicode could say "All
languages".

In particular, the "WIN" encoding just says "CP1251" -- this is Cyrillic
(Russian) but some people might just see the WIN and assume it's the
character set that Western/US Windows uses (CP 1252).

It's an easy mistake to make and one I see repeated frequently on other
web pages (calling Windows "Western" CP 1251.)  Someone reading English
language docs and seeing a "WIN" character set might naturally assume that
it is the English Windows character set.  (Which BTW is not currently
supported by PG for conversions.)

Some more examples that might improve clarity:

 LATIN5 should say "Turkish"

 LATIN6 should say "Nordic"

 ALT and KOI8 should say "Cyrillic"   (or Russian)
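
The ambiguity described above can be illustrated with a minimal sketch. It assumes Python's built-in codecs (`cp1251`, `cp1252`, `koi8_r`, `cp866`) as stand-ins for the code pages named in this message; the labels and the helper `decode_everywhere` are hypothetical and are not part of PostgreSQL.

```python
# Decode the same single byte under several 8-bit code pages.
# Codec names are Python's own; pairing them with the code pages
# discussed in this thread is an assumption made for illustration.
CODECS = {
    "CP1251": "cp1251",   # Windows Cyrillic
    "CP1252": "cp1252",   # Windows Western European
    "KOI8-R": "koi8_r",   # Cyrillic
    "CP866": "cp866",     # Cyrillic
}


def decode_everywhere(raw):
    """Return {label: decoded text} for the given bytes under every codec."""
    return {label: raw.decode(codec) for label, codec in CODECS.items()}


if __name__ == "__main__":
    # 0xE0 is a different character in each of these code pages.
    for label, text in decode_everywhere(b"\xe0").items():
        print(f"{label}: {text!r} (U+{ord(text):04X})")
```

Under CP1251 the byte 0xE0 is the Cyrillic small letter "а", while under CP1252 it is "à", so data labelled with the wrong one of the two is silently misread rather than rejected.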


Re: Suggestion for Encodings table

From
Bruce Momjian
Date:
Preston Landers wrote:
>
> http://www.postgresql.org/docs/8.0/interactive/multibyte.html#CHARSET-TABLE
>
> I would humbly suggest a few improvements to that Encodings table to
> improve the clarity.
>
> Many of the entries clearly indicate the language or writing system, such
> as WIN1256 = "Windows CP1256 (Arabic)"
>
> I would suggest that every single entry should be described that way with
> the common language or writing system name.  Even Unicode could say "All
> languages".
>
> In particular, the "WIN" encoding just says "CP1251" -- this is Cyrillic
> (Russian) but some people might just see the WIN and assume it's the
> character set that Western/US Windows uses (CP 1252).
>
> It's an easy mistake to make and one I see repeated frequently on other
> web pages (calling Windows "Western" CP 1251.)  Someone reading English
> language docs and seeing a "WIN" character set might naturally assume that
> it is the English Windows character set.  (Which BTW is not currently
> supported by PG for conversions.)
>
> Some more examples that might improve clarity:
>
>  LATIN5 should say "Turkish"
>
>  LATIN6 should say "Nordic"
>
>  ALT and KOI8 should say "Cyrillic"   (or Russian)

Great.  Would you submit a patch to the SGML sources?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Suggestion for Encodings table

From
Bruce Momjian
Date:
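Thanks for the ideas.  I have applied the following patch, which adds a
language or region description to the encodings that lacked one.  It also
adds a link to an extensive collection of documents about character sets,
encodings, and code pages.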

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/charset.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v
retrieving revision 2.49
diff -c -c -r2.49 charset.sgml
*** doc/src/sgml/charset.sgml    7 Mar 2005 04:30:48 -0000    2.49
--- doc/src/sgml/charset.sgml    12 Mar 2005 06:24:51 -0000
***************
*** 344,390 ****
          </row>
          <row>
           <entry><literal>MULE_INTERNAL</literal></entry>
!          <entry>Mule internal code</entry>
          </row>
          <row>
           <entry><literal>LATIN1</literal></entry>
!          <entry>ISO 8859-1/<acronym>ECMA</> 94 (Latin alphabet no.1)</entry>
          </row>
          <row>
           <entry><literal>LATIN2</literal></entry>
!          <entry>ISO 8859-2/<acronym>ECMA</> 94 (Latin alphabet no.2)</entry>
          </row>
          <row>
           <entry><literal>LATIN3</literal></entry>
!          <entry>ISO 8859-3/<acronym>ECMA</> 94 (Latin alphabet no.3)</entry>
          </row>
          <row>
           <entry><literal>LATIN4</literal></entry>
!          <entry>ISO 8859-4/<acronym>ECMA</> 94 (Latin alphabet no.4)</entry>
          </row>
          <row>
           <entry><literal>LATIN5</literal></entry>
!          <entry>ISO 8859-9/<acronym>ECMA</> 128 (Latin alphabet no.5)</entry>
          </row>
          <row>
           <entry><literal>LATIN6</literal></entry>
!          <entry>ISO 8859-10/<acronym>ECMA</> 144 (Latin alphabet no.6)</entry>
          </row>
          <row>
           <entry><literal>LATIN7</literal></entry>
!          <entry>ISO 8859-13 (Latin alphabet no.7)</entry>
          </row>
          <row>
           <entry><literal>LATIN8</literal></entry>
!          <entry>ISO 8859-14 (Latin alphabet no.8)</entry>
          </row>
          <row>
           <entry><literal>LATIN9</literal></entry>
!          <entry>ISO 8859-15 (Latin alphabet no.9)</entry>
          </row>
          <row>
           <entry><literal>LATIN10</literal></entry>
!          <entry>ISO 8859-16/<acronym>ASRO</> SR 14111 (Latin alphabet no.10)</entry>
          </row>
          <row>
           <entry><literal>ISO_8859_5</literal></entry>
--- 344,390 ----
          </row>
          <row>
           <entry><literal>MULE_INTERNAL</literal></entry>
!          <entry>Mule internal code (Multi-lingual Emacs)</entry>
          </row>
          <row>
           <entry><literal>LATIN1</literal></entry>
!          <entry>ISO 8859-1/<acronym>ECMA</> 94 (Western European)</entry>
          </row>
          <row>
           <entry><literal>LATIN2</literal></entry>
!          <entry>ISO 8859-2/<acronym>ECMA</> 94 (Central European)</entry>
          </row>
          <row>
           <entry><literal>LATIN3</literal></entry>
!          <entry>ISO 8859-3/<acronym>ECMA</> 94 (South European)</entry>
          </row>
          <row>
           <entry><literal>LATIN4</literal></entry>
!          <entry>ISO 8859-4/<acronym>ECMA</> 94 (North European)</entry>
          </row>
          <row>
           <entry><literal>LATIN5</literal></entry>
!          <entry>ISO 8859-9/<acronym>ECMA</> 128 (Turkish)</entry>
          </row>
          <row>
           <entry><literal>LATIN6</literal></entry>
!          <entry>ISO 8859-10/<acronym>ECMA</> 144 (Nordic)</entry>
          </row>
          <row>
           <entry><literal>LATIN7</literal></entry>
!          <entry>ISO 8859-13 (Baltic)</entry>
          </row>
          <row>
           <entry><literal>LATIN8</literal></entry>
!          <entry>ISO 8859-14 (Celtic)</entry>
          </row>
          <row>
           <entry><literal>LATIN9</literal></entry>
!          <entry>ISO 8859-15 (LATIN1 with Euro and accents)</entry>
          </row>
          <row>
           <entry><literal>LATIN10</literal></entry>
!          <entry>ISO 8859-16/<acronym>ASRO</> SR 14111 (Romanian)</entry>
          </row>
          <row>
           <entry><literal>ISO_8859_5</literal></entry>
***************
*** 404,414 ****
          </row>
          <row>
           <entry><literal>KOI8</literal></entry>
!          <entry><acronym>KOI</acronym>8-R(U)</entry>
          </row>
          <row>
           <entry><literal>WIN866</literal></entry>
!          <entry>Windows CP866</entry>
          </row>
          <row>
           <entry><literal>WIN874</literal></entry>
--- 404,414 ----
          </row>
          <row>
           <entry><literal>KOI8</literal></entry>
!          <entry><acronym>KOI</acronym>8-R(U) (Cyrillic)</entry>
          </row>
          <row>
           <entry><literal>WIN866</literal></entry>
!          <entry>Windows CP866 (Cyrillic)</entry>
          </row>
          <row>
           <entry><literal>WIN874</literal></entry>
***************
*** 416,426 ****
          </row>
          <row>
           <entry><literal>WIN1250</literal></entry>
!          <entry>Windows CP1250</entry>
          </row>
          <row>
           <entry><literal>WIN1251</literal></entry>
!          <entry>Windows CP1251</entry>
          </row>
          <row>
           <entry><literal>WIN1256</literal></entry>
--- 416,426 ----
          </row>
          <row>
           <entry><literal>WIN1250</literal></entry>
!          <entry>Windows CP1250 (Central European)</entry>
          </row>
          <row>
           <entry><literal>WIN1251</literal></entry>
!          <entry>Windows CP1251 (Cyrillic)</entry>
          </row>
          <row>
           <entry><literal>WIN1256</literal></entry>
***************
*** 883,888 ****
--- 883,900 ----

       <variablelist>
        <varlistentry>
+        <term><ulink url="http://www.i18ngurus.com/docs/984813247.html"></ulink></term>
+
+        <listitem>
+         <para>
+          An extensive collection of documents about character sets, encodings,
+          and code pages.
+         </para>
+        </listitem>
+       </varlistentry>
+
+      <variablelist>
+       <varlistentry>
         <term><ulink url="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf"></ulink></term>

         <listitem>