> Hi,
>
> attached is a patch with:
>
> - new encoding name handling with better performance (binary search
> instead of a for() loop, and some needless searching is avoided)
>
> - it is possible to use synonyms for an encoding (for example ISO-8859-1,
> Latin1, l1)
>
> - Peter's idea about "encoding name cleaning" is implemented
> (characters other than [A-Za-z0-9] are irrelevant -- 'ISO-8859-1' is
> the same as 'iso8859_1' or iso-8-8-5-9-1 :-)
>
> - the routines for this are shared between FE and BE (no more defining
> encoding names separately in FE and BE)
>
> - the prefix PG_ is added to the encoding identifier macros; something like
> 'ALT' is pretty dirty in source code, so rather use PG_ALT.
>
> (Note: the patch adds a new file mb/encname.c and removes mb/common.c)
>
> Karel
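The name cleaning and binary search described in the quoted message could look roughly like the sketch below. This is not the code from the patch: the demo_* identifiers, the encoding ids and the table contents are made up for illustration, and the table only has to be kept sorted by the cleaned (lower-case, [a-z0-9] only) name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding ids, for illustration only. */
enum { DEMO_LATIN1 = 1, DEMO_KOI8 = 2 };

typedef struct
{
    const char *name;       /* cleaned name: lower case, [a-z0-9] only */
    int         encoding;
} demo_encname;

/* Synonyms share one id. Must stay sorted in strcmp() order. */
static const demo_encname demo_encname_tbl[] = {
    {"iso88591", DEMO_LATIN1},
    {"koi8", DEMO_KOI8},
    {"koi8r", DEMO_KOI8},
    {"l1", DEMO_LATIN1},
    {"latin1", DEMO_LATIN1},
};

/* Copy only [A-Za-z0-9] from key into out, folding to lower case. */
static void
demo_clean_encname(const char *key, char *out, size_t outlen)
{
    size_t      n = 0;

    if (outlen == 0)
        return;
    for (; *key != '\0' && n + 1 < outlen; key++)
    {
        char        c = *key;

        if (c >= 'A' && c <= 'Z')
            out[n++] = (char) (c - 'A' + 'a');
        else if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            out[n++] = c;
    }
    out[n] = '\0';
}

/* Binary search on the cleaned name; returns -1 if unknown. */
static int
demo_char_to_encoding(const char *name)
{
    char        key[64];
    size_t      lo = 0;
    size_t      hi = sizeof(demo_encname_tbl) / sizeof(demo_encname_tbl[0]);

    demo_clean_encname(name, key, sizeof(key));
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;
        int         cmp = strcmp(key, demo_encname_tbl[mid].name);

        if (cmp == 0)
            return demo_encname_tbl[mid].encoding;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

With such a table, 'ISO-8859-1', 'iso8859_1' and 'iso-8-8-5-9-1' all clean to "iso88591" and resolve to the same id, and FE and BE could share the one table.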
Thanks for the patches, but...
1) There is a compiler error if --enable-unicode-conversion is not enabled
2) The patches break createdb. createdb should raise an error if a client-only encoding such as SJIS is
specified.
3) I don't like the following ugliness. Why not change all occurrences of SQL_ASCII in the sources?
    /*
     * A lot of PG stuff use 'SQL_ASCII' without prefix (dirty...)
     */
    #define SQL_ASCII PG_SQL_ASCII
4) The "official" encoding names are inconsistent. Here are my suggested changes (referring to
http://www.iana.org/assignments/character-sets, according to Peter's suggestion):
    ALT -> IBM866
    KOI8 -> KOI8_R
    UNICODE -> UTF_8 (Peter's suggestion)
Also, I'm wondering why windows-1251 and not windows_1251, or ISO_8859_1 and not ISO-8859-1? There seems to be
some confusion about the usage of "_" and "-".
pg_enc2name pg_enc2name_tbl[] =
{
    { "SQL_ASCII", PG_SQL_ASCII },
    { "EUC_JP", PG_EUC_JP },
    { "EUC_CN", PG_EUC_CN },
    { "EUC_KR", PG_EUC_KR },
    { "EUC_TW", PG_EUC_TW },
    { "UNICODE", PG_UNICODE },
    { "MULE_INTERNAL", PG_MULE_INTERNAL },
    { "ISO_8859_1", PG_LATIN1 },
    { "ISO_8859_2", PG_LATIN2 },
    { "ISO_8859_3", PG_LATIN3 },
    { "ISO_8859_4", PG_LATIN4 },
    { "ISO_8859_5", PG_LATIN5 },
    { "KOI8", PG_KOI8 },
    { "window-1251", PG_WIN1251 },
    { "ALT", PG_ALT },
    { "Shift_JIS", PG_SJIS },
    { "Big5", PG_BIG5 },
    { "window-1250", PG_WIN1251 }
};
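Regarding 2), one way createdb could reject client-only encodings is to order the ids so that all server-capable encodings come first, and then compare against a marker. The following is only a sketch of that idea under my own assumptions: the demo_* names and the exact split between server and client-only encodings are hypothetical and not taken from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical ids: encodings usable as a database encoding come first,
 * client-only encodings come after DEMO_BE_LAST.
 */
typedef enum
{
    DEMO_SQL_ASCII = 0,
    DEMO_EUC_JP,
    DEMO_UTF_8,
    DEMO_BE_LAST = DEMO_UTF_8,  /* last id valid on the server side */
    DEMO_SJIS,                  /* client-only */
    DEMO_BIG5,                  /* client-only */
    DEMO_ENC_COUNT
} demo_enc;

/* True if enc may be used as a database (backend) encoding. */
static bool
demo_valid_be_encoding(int enc)
{
    return enc >= 0 && enc <= DEMO_BE_LAST;
}

/* True if enc is known at all, i.e. usable as a client encoding. */
static bool
demo_valid_fe_encoding(int enc)
{
    return enc >= 0 && enc < DEMO_ENC_COUNT;
}
```

createdb would then call something like demo_valid_be_encoding() on the looked-up id and raise an error when it returns false, while the client encoding code would only need demo_valid_fe_encoding().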