Re: unaccent extension missing some accents

From J Smith
Subject Re: unaccent extension missing some accents
Msg-id CADFUPgeEw31kAoY3_9nH==uP9QesYKKTwLV_OgwVKM=P1VvnFg@mail.gmail.com
In reply to Re: unaccent extension missing some accents  (Florian Pflug <fgp@phlo.org>)
Responses Re: unaccent extension missing some accents  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Nov 6, 2011 at 1:18 PM, Florian Pflug <fgp@phlo.org> wrote:
>
> What's the locale of the database you're seeing this in, and which charset
> does it use?
>
> I think scanf() uses isspace() and friends, and last time I looked the
> locale definitions were all pretty bogus on OSX. So maybe scanf() somehow
> decides that 0xA0 is whitespace.
>

Ah, that does it: the locale I was using in the test code was just
plain ol' C locale, while in the database it was en_CA.UTF-8. Changing
the locale in my test code produced the wonky results. I should have
figured it was a locale problem. Sure enough, in a UTF-8 locale, it
believes that both 0xa0 and 0x85 are spaces. Pretty wonky behaviour
indeed.
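
For anyone who wants to reproduce the check, here is a minimal sketch of the kind of test I mean. The helper name byte_is_space_in_locale is made up for illustration and is not part of unaccent.c; it only uses standard setlocale() and isspace(), and it resets LC_CTYPE to "C" afterwards rather than restoring whatever was set before.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper (not from unaccent.c): report whether isspace()
 * treats a single byte as whitespace under the named LC_CTYPE locale.
 * Returns 1 if it does, 0 if it does not, -1 if the locale is unavailable.
 */
int byte_is_space_in_locale(const char *locale_name, unsigned char byte)
{
    int result;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;              /* locale not installed on this system */

    result = isspace(byte) ? 1 : 0;

    /* Simplification: go back to "C" instead of the previous locale. */
    setlocale(LC_CTYPE, "C");
    return result;
}
```

In the C locale this returns 0 for both 0xa0 and 0x85. Calling it with "en_CA.UTF-8" is what exposes the difference; the result there depends on the platform's locale definitions, so treat any particular answer as system-specific.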

Apparently this is a known OSX issue that has its roots in an older
version of FreeBSD's libc I guess, eh? I've found various bug reports
that allude to the problem and they all seem to point that way.

I've attached a patch against master for unaccent.c that uses swscanf
along with char2wchar and wchar2char instead of sscanf directly to
initialize the unaccent extension and it appears to fix the problem in
both the master and 9.1 branches.
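
The idea, roughly, is to split each rules line on wide characters so that a byte such as 0xa0 inside a multibyte UTF-8 sequence is never examined on its own. The sketch below is not the attached patch: it substitutes the standard mbstowcs() for PostgreSQL's char2wchar, and the names parse_rule_line and RULE_MAX are assumptions made up for this example.

```c
#include <stdlib.h>
#include <wchar.h>

#define RULE_MAX 256            /* assumed buffer size, in wide characters */

/*
 * Hypothetical sketch: convert a multibyte rules line to wide characters
 * using the current LC_CTYPE locale, then split it into two
 * whitespace-separated fields with swscanf() instead of sscanf().
 * src and trg must each have room for RULE_MAX wide characters.
 * Returns the number of fields parsed (0..2), or -1 if the line is not
 * valid in the current locale's encoding.
 */
int parse_rule_line(const char *line, wchar_t *src, wchar_t *trg)
{
    wchar_t wline[RULE_MAX];
    size_t  n;
    int     fields;

    n = mbstowcs(wline, line, RULE_MAX - 1);
    if (n == (size_t) -1)
        return -1;              /* invalid multibyte sequence */
    wline[n] = L'\0';           /* mbstowcs may stop without terminating */

    /* Field widths keep each result within RULE_MAX wide characters. */
    fields = swscanf(wline, L"%255ls %255ls", src, trg);
    return fields < 0 ? 0 : fields;   /* EOF means nothing was parsed */
}
```

The caller is expected to have selected the database's locale with setlocale() beforehand; the real code would go through char2wchar and wchar2char as described above.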

I haven't added any tests in the expected output file 'cause I'm not
exactly sure what I should be testing against, but I could take a
crack at that, too, if the patch looks reasonable and is usable.

Cheers.
