Re: daitch_mokotoff module

From: Dag Lem
Subject: Re: daitch_mokotoff module
Msg-id: ygeo84tvugy.fsf@sid.nimrod.no
In reply to: Re: daitch_mokotoff module  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: daitch_mokotoff module  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Thomas Munro <thomas.munro@gmail.com> writes:
>> Erm, it looks like something weird is happening somewhere in cfbot's
>> pipeline, because Dag's patch says:
>
>> +SELECT daitch_mokotoff('Straßburg');
>> + daitch_mokotoff
>> +-----------------
>> + 294795
>> +(1 row)
>
> ... so, that test case is guaranteed to fail in non-UTF8 encodings,
> I suppose?  I wonder what the LANG environment is in that cfbot
> instance.
>
> (We do have methods for dealing with non-ASCII test cases, but
> I can't see that this patch is using any of them.)
>
>             regards, tom lane
>

I naively assumed that tests would be run in a UTF8 environment.
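
For illustration, the kind of mismatch Tom describes can be reproduced
outside PostgreSQL. The minimal Python sketch below shows what
'Straßburg' turns into when its UTF8 bytes are read as LATIN1. The
function name is made up for this example, and LATIN1 is only one
assumed non-UTF8 encoding; nothing here is taken from the patch or
from cfbot's actual setup.

```python
def misread_utf8_as(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with another encoding.

    Mimics a reader that assumes the wrong encoding for UTF-8 input.
    The function name and default encoding are illustrative assumptions.
    """
    return text.encode("utf-8").decode(wrong_encoding)


if __name__ == "__main__":
    word = "Straßburg"
    garbled = misread_utf8_as(word)
    # "ß" occupies two bytes in UTF-8 (0xC3 0x9F), so the misread
    # string is one character longer than the original.
    print(len(word), len(garbled))
    print(ascii(garbled))
```

A function that processes its input character by character would then
see two unrelated characters in place of ß, so the expected output in
the test presumably could not match.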

Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that
two other modules use non-ASCII (UTF8) characters in their tests:
citext and unaccent.
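
For anyone without ack installed, a rough Python approximation of that
search is sketched below. It lists every file under a directory that
contains at least one byte in the range 0x80-0xff. The function name is
hypothetical, and unlike ack this sketch does not skip binary files or
version-control directories, so the results may differ.

```python
import os


def files_with_non_ascii(root):
    """Return sorted paths under root whose contents include a byte >= 0x80.

    Hypothetical helper; approximates "ack -l '[\\x80-\\xff]'" without
    ack's default file-type filtering.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file: skip it
            if any(b >= 0x80 for b in data):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Assumes the current directory is the top of a PostgreSQL source tree.
    for path in files_with_non_ascii("contrib"):
        print(path)
```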

The citext tests seem to be commented out - "Multibyte sanity
tests. Uncomment to run."

Looking into the unaccent module, I don't quite understand how it will
work with various encodings, since it doesn't seem to decode its input -
will it fail if run under anything but ASCII or UTF8?

In any case, I see that unaccent.sql starts as follows:


CREATE EXTENSION unaccent;

-- must have a UTF8 database
SELECT getdatabaseencoding();

SET client_encoding TO 'UTF8';


Would doing the same thing in fuzzystrmatch.sql fix the problem with
failing tests? Should I prepare a new patch?


Best regards

Dag Lem


