Re: UTF-8 and =, LIKE problems

From Michael Glaesemann
Subject Re: UTF-8 and =, LIKE problems
Date
Msg-id 53FE8566-2E1C-11D9-9FAD-000A95C88220@myrealbox.com
In response to UTF-8 and =, LIKE problems  (Edmund Lian <elian@inbrief.net>)
List pgsql-general
On Nov 4, 2004, at 1:24 PM, Edmund Lian wrote:

> I am running a web-based accounting package (SQL-Ledger) that supports
> multiple languages on PostgreSQL. When a database encoding is set to
> Unicode, multilingual operation is possible.
>

<snip />

> Semantically, one might expect U+FF17 U+FF19 to be identical to U+0037
> U+0039, but of course they aren't if a simple-minded byte-by-byte or
> character-by-character comparison is done.
>
> In the ideal case, one would probably want to convert all full width
> chars to their half width equivalents because the numbers look weird
> on the screen (e.g., "7 9  B r i s b a n e  S t r e e t" instead of
> "79 Brisbane Street"). Is there any way to get PostgreSQL to do so?
>
> Failing this, is there any way to get PostgreSQL to be a bit smarter
> in doing comparisons? I think I'm SOL, but I thought I'd ask anyway.

I've thought this would be a useful addition to PostgreSQL, but
currently I think it's best handled in the application layer. A brief
glance at the SQL-Ledger homepage shows that it's written in Perl. I'm
still in the early learning stages of Perl (heck, I'm in the early
learning stages of nearly everything), but I'd assume that with Perl's
good Unicode support there should be a way to do this, similar to PHP's
mb_convert_kana (which handles much more than just kana, btw). Ideally,
I'd think you'd want to store all numbers and Latin characters as
single-width characters, so you'd filter them before they enter the
database.
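
A minimal sketch of such a filter, written in Python rather than Perl
purely for illustration. It relies on the fact that the full-width
forms U+FF01..U+FF5E sit at a fixed offset of 0xFEE0 from ASCII
U+0021..U+007E, and that U+3000 is the full-width space. The function
name double_to_single is hypothetical and is not part of SQL-Ledger or
any library.

```python
# Hypothetical helper: the name double_to_single is an assumption made
# for this sketch, not an existing SQL-Ledger or library function.

# Full-width forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at a
# fixed offset of 0xFEE0; U+3000 is the ideographic (full-width) space.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def double_to_single(text: str) -> str:
    """Replace full-width ASCII variants with their half-width forms.

    Characters outside the mapped ranges (kana, kanji, etc.) are
    returned unchanged.
    """
    return text.translate(FULLWIDTH_TO_ASCII)


if __name__ == "__main__":
    # Full-width "79" and a full-width space, followed by ASCII text.
    street = "\uff17\uff19\u3000Brisbane Street"
    print(double_to_single(street))
```

Applying this to every value before it is sent to the database keeps
full-width digits and Latin letters out of the stored data, so plain =
and LIKE comparisons see only the half-width forms.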

I'd think this might be best placed in the SQL-Ledger code, though you
might be able to fashion a plperl function that would do the same
thing. You could either update all entries (UPDATE foo SET bar =
double_to_single(bar)) or make a functional index on
double_to_single(bar).
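
The two options can be tried out with the rough sketch below. It uses
Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it
shows the shape of the idea only; it is not a plperl implementation.
The table foo and column bar come from the example above, the index
name is made up, and double_to_single is a hypothetical helper. In
PostgreSQL itself, a function used in an index expression has to be
declared IMMUTABLE.

```python
import sqlite3

# Hypothetical helper (assumption): maps full-width ASCII variants
# U+FF01..U+FF5E and the full-width space U+3000 to half-width forms.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def double_to_single(text):
    return None if text is None else text.translate(FULLWIDTH_TO_ASCII)


# SQLite stands in for PostgreSQL here; an in-memory database is used.
conn = sqlite3.connect(":memory:")
# deterministic=True is required for use in an index expression
# (needs Python 3.8+ and a reasonably recent SQLite).
conn.create_function("double_to_single", 1, double_to_single,
                     deterministic=True)

conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, bar TEXT)")
conn.executemany(
    "INSERT INTO foo (bar) VALUES (?)",
    [("\uff17\uff19 Brisbane Street",),  # full-width "79"
     ("79 Brisbane Street",)],           # half-width "79"
)

# Second option from the text: leave the data alone and index the
# converted value; queries then compare against the same expression.
conn.execute("CREATE INDEX foo_bar_single ON foo (double_to_single(bar))")
matches_indexed = conn.execute(
    "SELECT id FROM foo WHERE double_to_single(bar) = ? ORDER BY id",
    ("79 Brisbane Street",),
).fetchall()

# First option from the text: rewrite the stored values once, after
# which a plain comparison on bar is enough.
conn.execute("UPDATE foo SET bar = double_to_single(bar)")
matches_plain = conn.execute(
    "SELECT id FROM foo WHERE bar = ? ORDER BY id",
    ("79 Brisbane Street",),
).fetchall()

print(matches_indexed, matches_plain)
```

The index route keeps the original text as entered but requires every
query to go through double_to_single(bar); the UPDATE route changes the
stored data once and leaves existing queries untouched.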

I'm not sure which would be the best, and others out there have more
informed opinions than mine which I'd love to read.

Hope this helps a bit.

Michael

