Re: invalidly encoded strings

From Tatsuo Ishii
Subject Re: invalidly encoded strings
Date
Msg-id 20070911.112750.70199461.t-ishii@sraoss.co.jp
In response to Re: invalidly encoded strings  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: invalidly encoded strings  (Jeff Davis <pgsql@j-davis.com>)
Re: invalidly encoded strings  (Andrew Dunstan <andrew@dunslane.net>)
Re: invalidly encoded strings  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: invalidly encoded strings  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
> Tatsuo Ishii <ishii@postgresql.org> writes:
> > If you regard the unicode code point as simply a number, why not
> > regard the multibyte characters as a number too?
> 
> Because there's a standard specifying the Unicode code points *as
> numbers*.  The mapping from those numbers to UTF8 strings (and other
> representations) is well-defined by the standard.
> 
> > Also I'm wondering what we should do with different
> > backend/frontend encoding combos.
> 
> Nothing.  chr() has always worked with reference to the database
> encoding, and we should keep it that way.

Where is it documented?

> BTW, it strikes me that there is another hole that we need to plug in
> this area, and that's the convert() function.  Being able to create
> a value of type text that is not in the database encoding is simply
> broken.  Perhaps we could make it work on bytea instead (providing
> a cast from text to bytea but not vice versa), or maybe we should just
> forbid the whole thing if the database encoding isn't SQL_ASCII.

Please don't do that. It would break a useful use case of convert().

Suppose a user has a database encoded in UTF-8, with English, French,
Chinese and Japanese data in its tables. To sort a table in the order
appropriate to its language, he would do something like this:

SELECT * FROM japanese_table ORDER BY convert(japanese_text using utf8_to_euc_jp);

Without convert(), he will get the data in a seemingly random order.
This is because Kanji characters are effectively in random order in
UTF-8, while they are reasonably ordered in EUC_JP.
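
As a rough illustration of the ordering difference, here is a small sketch in Python rather than SQL, run outside PostgreSQL. The sample characters and the helper name sort_by_encoding are chosen for illustration and do not come from the original message, and the sketch assumes a plain byte-wise comparison of the encoded strings. It sorts the same four Kanji by their UTF-8 bytes and by their EUC_JP bytes:

```python
# Four common Kanji with their usual on-yomi readings:
#   亜 (a), 愛 (ai), 悪 (aku), 山 (san)
words = ["悪", "山", "愛", "亜"]


def sort_by_encoding(items, encoding):
    """Sort strings by the raw bytes they produce in the given encoding.

    Hypothetical helper: it mimics a byte-wise comparison of the
    encoded values, which is only an assumption about how the sort
    key would be compared.
    """
    return sorted(items, key=lambda s: s.encode(encoding))


utf8_order = sort_by_encoding(words, "utf-8")
eucjp_order = sort_by_encoding(words, "euc_jp")

print("UTF-8 :", utf8_order)   # ['亜', '山', '悪', '愛']
print("EUC_JP:", eucjp_order)  # ['亜', '愛', '悪', '山']
```

In this sample the EUC_JP byte order follows the readings (a, ai, aku, san), while the UTF-8 byte order follows the Unicode code points and puts 山 between 亜 and 悪.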
--
Tatsuo Ishii
SRA OSS, Inc. Japan

