Re: UTF-8 and =, LIKE problems

From	Michael Glaesemann
Subject	Re: UTF-8 and =, LIKE problems
Date	4 November 2004 07:46:10
Msg-id	53FE8566-2E1C-11D9-9FAD-000A95C88220@myrealbox.com
In reply to	UTF-8 and =, LIKE problems (Edmund Lian <elian@inbrief.net>)
List	pgsql-general

On Nov 4, 2004, at 1:24 PM, Edmund Lian wrote:

> I am running a web-based accounting package (SQL-Ledger) that supports
> multiple languages on PostgreSQL. When a database encoding is set to
> Unicode, multilingual operation is possible.
>

<snip />

> Semantically, one might expect U+FF17 U+FF19 to be identical to U+0037
> U+0039, but of course they aren't if a simple-minded byte-by-byte or
> character-by-character comparison is done.
>
> In the ideal case, one would probably want to convert all full width
> chars to their half width equivalents because the numbers look weird
> on the screen (e.g., "7 9  B r i s b a n e  S t r e e t" instead of
> "79 Brisbane Street"). Is there any way to get PostgreSQL to do so?
>
> Failing this, is there any way to get PostgreSQL to be a bit smarter
> in doing comparisons? I think I'm SOL, but I thought I'd ask anyway.

I've thought this would be a useful addition to PostgreSQL, but
currently I think it's best handled in the application layer. A brief
glance at the SQL-Ledger homepage shows that it's written in Perl. I'm
still in the early learning stages of Perl (heck, I'm in the early
learning stages of nearly everything), but I'd assume with Perl's good
Unicode support there should be a way to do this, similar to PHP's
mb_convert_kana (which handles much more than just kana, btw). Ideally,
I'd think you'd want to store all numbers and Latin characters as
single-width characters, so you'd filter them before they enter the
database.
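For illustration, a minimal sketch of such a filter, written in Python rather than SQL-Ledger's Perl. The name double_to_single is the hypothetical one used in the next paragraph, and the sketch assumes that only the full-width forms of printable ASCII (U+FF01 to U+FF5E) and the ideographic space (U+3000) need folding, which is much narrower than what mb_convert_kana can do. Unicode NFKC normalization (unicodedata.normalize("NFKC", s)) would also fold these characters, but it changes others as well, for example turning half-width katakana into full-width katakana.

```python
# Hypothetical double_to_single(): folds only the full-width forms of
# printable ASCII (U+FF01..U+FF5E -> U+0021..U+007E) and the ideographic
# space (U+3000 -> U+0020). Everything else, including kana, is unchanged.
_FOLD = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
_FOLD[0x3000] = 0x20


def double_to_single(text: str) -> str:
    """Convert full-width digits, Latin letters and punctuation to ASCII."""
    return text.translate(_FOLD)


if __name__ == "__main__":
    stored = "79"
    entered = "\uFF17\uFF19"  # FULLWIDTH DIGIT SEVEN, FULLWIDTH DIGIT NINE
    print(entered == stored)                    # False: different code points
    print(double_to_single(entered) == stored)  # True after folding
```

If the filter runs before every INSERT, the database only ever sees the single-width form, so plain = and LIKE comparisons behave as users expect.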

I'd think this might be best placed in the SQL-Ledger code, though you
might be able to fashion a plperl function that would do the same
thing. You could either update all entries (UPDATE foo SET bar =
double_to_single(bar)) or make a functional index on
double_to_single(bar).
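To make the difference between the two options concrete, here is a toy sketch in Python with the same hypothetical double_to_single. The UPDATE route rewrites the stored text once; the functional-index route leaves rows as entered and folds only the lookup keys, so every query has to apply the same function to both sides. In PostgreSQL the index itself would look roughly like CREATE INDEX foo_bar_folded ON foo (double_to_single(bar)); (the index name is made up), and the function must be declared IMMUTABLE to be usable in an index. The class below only models the lookup behaviour, not PostgreSQL's planner.

```python
from collections import defaultdict

# Same hypothetical folding table: full-width ASCII forms (U+FF01..U+FF5E)
# map to U+0021..U+007E, and U+3000 maps to a plain space.
_FOLD = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
_FOLD[0x3000] = 0x20


def double_to_single(text: str) -> str:
    """Fold full-width digits, Latin letters and punctuation to ASCII."""
    return text.translate(_FOLD)


class FoldedIndex:
    """Toy model of an index on double_to_single(bar).

    Rows keep their text exactly as entered; only the index keys are folded.
    """

    def __init__(self, rows):
        self.rows = list(rows)
        self._by_key = defaultdict(list)
        for row_id, value in enumerate(self.rows):
            self._by_key[double_to_single(value)].append(row_id)

    def equal(self, query: str) -> list:
        # Roughly: WHERE double_to_single(bar) = double_to_single(query)
        ids = self._by_key.get(double_to_single(query), [])
        return [self.rows[i] for i in ids]

    def prefix(self, query: str) -> list:
        # Roughly: WHERE double_to_single(bar) LIKE double_to_single(query) || '%'
        # Implemented as a linear scan over the rows to keep the sketch short;
        # the query is matched literally, so % and _ are not wildcards here.
        key = double_to_single(query)
        return [v for v in self.rows if double_to_single(v).startswith(key)]
```

With rows ["\uFF17\uFF19 Brisbane Street", "79 Brisbane Street", "80 George Street"], both equal("79 Brisbane Street") and prefix("\uFF17\uFF19") return the first two rows in their original form. The UPDATE route would instead be rows = [double_to_single(r) for r in rows], after which ordinary comparisons work with no special index, at the cost of losing the text as originally entered.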

I'm not sure which would be the best, and others out there have more
informed opinions than mine which I'd love to read.

Hope this helps a bit.

Michael
