Re: unaccent extension missing some accents

From	Florian Pflug
Subject	Re: unaccent extension missing some accents
Date	6 November 2011, 17:19:01
Msg-id	4767BAA6-05D0-4103-A88E-7835D5F06D7D@phlo.org
In reply to	Re: unaccent extension missing some accents (J Smith <dark.panda+lists@gmail.com>)
Responses	Re: unaccent extension missing some accents (J Smith <dark.panda+lists@gmail.com>)
List	pgsql-hackers

On Nov 6, 2011, at 18:43, J Smith wrote:
> I put some elog debugging lines into unaccent.c and found that sscanf
> sometimes reads the scanned line by finding only one byte for the
> source character rather than the two required for the complete
> UTF-8 code point. It appears that the following characters are causing
> the problem, along with the code points and such:
>
> 'Å' => 'A' | c3,85 => 41
> 'à' => 'a' | c3,a0 => 61
> 'ą' => 'a' | c4,85 => 61
> 'Ġ' => 'G' | c4,a0 => 47
> 'Ņ' => 'N' | c5,85 => 4e
> 'Š' => 'S' | c5,a0 => 53
>
> In each case, one byte was being read in the source string rather than
> two, leading to the "duplicate TO" warnings above. This later leads to
> the characters that produced the warning being ignored when unaccent
> is called and left in the output.

What's the locale of the database you're seeing this in, and which charset
does it use?

I think scanf() uses isspace() and friends, and last time I looked the
locale definitions were all pretty bogus on OS X. So maybe scanf() somehow
decides that 0xA0 is whitespace.

> I haven't been able to reproduce in a smaller example, and haven't
> been able to reproduce on a CentOS server, so at this point I'm at a
> loss as to the problem.

Have you tried to set the same locale as postgres (using setlocale()) in
your tests?

best regards,
Florian Pflug
