Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters

From Frans
Subject Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters
Date
Msg-id 49DB29AA.5050502@geodan.nl
In response to Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Tom Lane wrote:
> Frans <frans@geodan.nl> writes:
>
>> Tom Lane wrote:
>>
>>> The
>>> fuzzystrmatch module doesn't really work with utf8 (nor any other
>>> multibyte encoding), because it depends on the <ctype.h> functions.
>>> What you'll probably get when applying it to non-ascii utf8 is
>>> an invalidly encoded string.
>>>
>>>
>> Well, in 8.2.6 the result for non-ASCII UTF-8 was an empty string (ASCII
>> code 0).
>>
>
> A comparison of the 8.2 and 8.3 fuzzystrmatch sources shows no
> difference.  The behavior of the ascii() function has indeed changed,
> but soundex() is no more nor less broken than it was before.
>
> [ thinks for a bit... ]  If you are seeing a difference in what soundex
> itself does, the most likely explanation is a difference in the behavior
> of isalpha() or perhaps toupper().  Are you running on the same
> underlying C library as before?  Are you quite sure you have the same
> encoding and locale selected?
>

Thank you for pointing me in the right direction. I have done some more
research now. I installed 8.2.13 and 8.3.7 on the same workstation,
selecting locale=C and character encoding=UTF-8 in both cases. In both
cases soundex() behaved as desired, i.e. it produces an empty string if
it cannot handle the input. It looks like the difference in behaviour I
noticed was not caused by the different PostgreSQL versions after all,
but by a different locale setting. I see (in postgresql.conf) that for
the database in which soundex() produces the 'wrong' output, the locale
was apparently set to 'Dutch_Netherlands'. I cannot recall consciously
selecting this locale, but it might have been set by the installer.
Does it make sense that the locale setting influences the workings of
the soundex function?

In the database where I noticed the undesired soundex() behaviour I did
a further test, using the bit_length() function. The query "select
bit_length(soundex('?'))" returns 0 in the case where ascii() also
returns 0, but it returns 32 in the other case (where ascii() returns
944). So it seems soundex() really does produce different output in the
two cases.

I don't know whether this issue should still be regarded as a bug. At
the least, it seems to me that the fact that the locale setting also
affects the soundex function should be documented.

>             regards, tom lane
>
