Re: unaccent extension missing some accents

From Tom Lane
Subject Re: unaccent extension missing some accents
Date
Msg-id 26051.1320682367@sss.pgh.pa.us
In reply to Re: unaccent extension missing some accents  (J Smith <dark.panda+lists@gmail.com>)
Responses Re: unaccent extension missing some accents
List pgsql-hackers
J Smith <dark.panda+lists@gmail.com> writes:
> Alright, I wrote up another patch that uses strchr to parse out the
> lines of the unaccent.rules file, foregoing sscanf completely.
> Hopefully this looks a bit better than using swscanf.

I looked at this a bit and realized that sscanf is actually doing a
couple of critical things for us, which are lost in translation when
doing it like this:

1. It ignores whitespace other than the dividing tab.  If we don't
continue to do that, we'll likely break existing config files.

2. It ensures that src and trg each consist of at least one (nonblank)
character.  placeChar() is critically dependent on the assumption that
src is not empty.

However, after looking around a bit at the other tsearch
config-file-reading functions, I noted that they all use t_isspace() to
identify whitespace ... and that function in fact should be okay on OS X,
because it uses iswspace in multibyte encodings.

So it's fairly simple to improve this code to reject whitespace that
way.  I don't like the existing code anyway because of its potential
vulnerability to buffer overrun.  I'll fix it up and commit.

> As for the other problems with isspace and such on OSX, it might be
> worth looking at the python portability fixes.

If OS X's UTF8 locales weren't so thoroughly broken (eg sorting does not
work), I might be tempted to try to do it that way, but I still fail
to see the point.  After reviewing the code I feel that unaccent needs
to be fixed because it's not consistent with the other tsearch config
file parsers, and not so much because it works or doesn't work on any
specific platform.
        regards, tom lane

