Re: unaccent extension missing some accents

From Tom Lane
Subject Re: unaccent extension missing some accents
Date
Msg-id 27438.1320624904@sss.pgh.pa.us
In response to Re: unaccent extension missing some accents  (J Smith <dark.panda+lists@gmail.com>)
Responses Re: unaccent extension missing some accents  (J Smith <dark.panda+lists@gmail.com>)
Re: unaccent extension missing some accents  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
J Smith <dark.panda+lists@gmail.com> writes:
> I've attached a patch against master for unaccent.c that uses swscanf
> along with char2wchar and wchar2char instead of sscanf directly to
> initialize the unaccent extension and it appears to fix the problem in
> both the master and 9.1 branches.

swscanf doesn't seem like an acceptable approach: it's a function that
is relied on nowhere else in PG, so it adds new portability risks of its
own.  It doesn't exist on some platforms that we support (like the one
I'm typing this message on) and there's no real good reason to assume
that it's not broken in its own ways on others.

If you really want to pursue this, I'd suggest parsing the line
manually, perhaps via strchr searches for \t and \n.  It likely wouldn't
be very many more lines than what you've got here.
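
To illustrate the strchr-based approach suggested above, here is a minimal sketch. It is not the code in unaccent.c: the function name split_rule_line is hypothetical, and it assumes each rules line has the form source, one tab, target, optional newline. It compares bytes only against '\t' and '\n', so it never calls sscanf or isspace().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "SRC\tDST\n" in place.
 *
 * On success, *src and *dst point at NUL-terminated substrings inside
 * the caller's buffer and 1 is returned.  Returns 0 if the line has no
 * tab or if either field is empty.  Multibyte UTF-8 sequences are left
 * untouched because none of their bytes equal '\t' or '\n'.
 */
static int
split_rule_line(char *line, char **src, char **dst)
{
    char *eol;
    char *tab;

    /* Cut the line at the first newline, if there is one. */
    eol = strchr(line, '\n');
    if (eol != NULL)
        *eol = '\0';

    /* The first tab separates the source from the target. */
    tab = strchr(line, '\t');
    if (tab == NULL || tab == line || tab[1] == '\0')
        return 0;

    *tab = '\0';
    *src = line;
    *dst = tab + 1;
    return 1;
}
```

A caller would pass each line read by fgets() in a writable buffer and skip lines for which the function returns 0. Whether the real rules file always uses a single tab as separator is an assumption of this sketch.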

However, the bigger picture is that OS X's UTF8 locales are broken
through-and-through, and most of their other problems are not feasible
to work around.  So basically you can't use them for anything
interesting, and it's not clear that it's worth putting any time into
solving individual problems.  In the particular case here, the issue
presumably is that sscanf is relying on isspace() ... but we rely on
isspace() directly, in quite a lot of places, so how much is it going
to fix to dodge it right here?
        regards, tom lane

