Re: Character Conversions Handling

From: Martijn van Oosterhout
Subject: Re: Character Conversions Handling
Msg-id: 20051018213924.GC13902@svana.org
In reply to: Character Conversions Handling (Volkan YAZICI <volkan.yazici@gmail.com>)
List: pgsql-hackers
On Tue, Oct 18, 2005 at 10:29:30PM +0300, Volkan YAZICI wrote:
> Hi,
>
> I'm trying to understand the scheme behind
> backend/utils/adt/like.c for downcasing letters [1]. When I look for
> other tolower() implementations, there are lots of them spread around
> (in interfaces/libpq, backend/regex, backend/utils/adt/like, etc.).
> For example, despite there being a pg_wc_tolower() function in
> regc_locale.c, iwchareq() in like.c achieves the same thing manually.
>
> I'd appreciate it if somebody could point me to the places where I
> should start looking to understand character handling with different
> encodings. Also, I wonder why we didn't use any btowc/mbsrtowcs/wctomb
> like functions. Is this for portability with other compilers?

PostgreSQL has to be compatible across many platforms, including those
that don't have any multibyte support, and there are a few of those.
Just as PostgreSQL includes a complete copy of the timezone library,
various other bits usually handled by system libraries have been
incorporated into the backend. This includes encoding support.

> [1] iwchareq() uses pg_mb2wchar_with_len(), which picks the right
> mb2wchar function from pg_wchar_table. When I look at
> backend/mb/wchar.c, there are some other locale-specific mblen and
> mb2wchar routines. For example, EUC_KR is handled by the
> pg_euc2wchar_with_len() function, but LATIN5 is handled by the
> pg_latin12wchar_with_len() function. Will we write a new function for
> latin5, like pg_latin52wchar_with_len(), if we run into a new
> problem with latin5?

In this particular case it's not an issue: since the Latin-N
encodings are all single-byte encodings, they don't have to be handled
separately. But yes, this means that PostgreSQL's behaviour may vary
from that of the surrounding system.
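
To illustrate why one routine can serve every Latin-N encoding, here is a
minimal sketch, not the actual backend code. The names wchar_sketch,
single_byte_to_wchar, sketch_table and sketch_mb2wchar are hypothetical
stand-ins for pg_wchar, pg_latin12wchar_with_len, pg_wchar_table and
pg_mb2wchar_with_len. It assumes the conversion simply widens each byte
to one wide character and is selected through a per-encoding table:

```c
#include <stddef.h>

/* Hypothetical stand-in for the backend's wide character type. */
typedef unsigned int wchar_sketch;

/* Signature shared by all conversion routines in this sketch. */
typedef int (*mb2wchar_fn)(const unsigned char *from, wchar_sketch *to, int len);

/*
 * Single-byte encodings: every input byte becomes exactly one wide
 * character, so no knowledge of the specific Latin-N variant is needed.
 * Converts at most len bytes, stops at a NUL byte, zero-terminates the
 * output and returns the number of characters written.
 */
static int
single_byte_to_wchar(const unsigned char *from, wchar_sketch *to, int len)
{
    int cnt = 0;

    while (len > 0 && *from)
    {
        *to++ = *from++;
        len--;
        cnt++;
    }
    *to = 0;
    return cnt;
}

/* Hypothetical encoding IDs; a real table has many more entries. */
enum sketch_encoding { SK_LATIN1, SK_LATIN5, SK_NUM_ENCODINGS };

/* Both single-byte encodings share the same routine. */
static const mb2wchar_fn sketch_table[SK_NUM_ENCODINGS] = {
    single_byte_to_wchar,   /* SK_LATIN1 */
    single_byte_to_wchar    /* SK_LATIN5 */
};

/* Pick the conversion routine for the given encoding and run it. */
static int
sketch_mb2wchar(enum sketch_encoding enc, const unsigned char *from,
                wchar_sketch *to, int len)
{
    return sketch_table[enc](from, to, len);
}
```

Note that in this sketch the resulting values are raw byte values, not
Unicode code points: byte 0xFD comes out as 0xFD for both encodings, even
though it is a different letter in Latin-1 than in Latin-5. That is fine
for splitting a string into characters, but any case folding applied
afterwards still has to know which encoding or locale is in effect.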

The current plan is to use a cross-platform library (ICU) to handle
all the locale- and encoding-related issues. This is a large task and I
wouldn't be surprised if it takes a release or two. Hopefully it will
clean all these issues up...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
