Re: Hunspell as filtering dictionary

From: Hugh Ranalli
Subject: Re: Hunspell as filtering dictionary
Msg-id: CAAhbUMPEwNgvcVJRdta5RR3TVxNf6MGjhGms5RFS63gwYPVXhA@mail.gmail.com
In response to: Hunspell as filtering dictionary (Bibi Mansione <golgote@gmail.com>)
Responses: Re: Hunspell as filtering dictionary
List: pgsql-general

On Tue, 5 Nov 2019 at 09:42, Bibi Mansione <golgote@gmail.com> wrote:
> Hi,
> I am trying to create a ts_vector from a French text. Here are the operations that seem logical to perform in that order:
>
> 1. remove stopwords
> 2. use hunspell to find words roots
> 3. unaccent

I can't speak to French, but we use a similar configuration for English, with unaccent first and hunspell after it. Unaccent definitely has to be called first.

We found that there were words that hunspell didn't recognise and instead pulled apart (for example, "contract" became "con" and "tract"), so I wonder if something similar is happening with "découvrir". To solve this, we put a custom dictionary containing these terms in front of hunspell, so they are matched before hunspell ever sees them. We also gave hunspell a custom stopwords file to eliminate selected other terms, such as profanities:

    -- We use a custom stopwords file to filter out other terms, such as profanities
    ALTER TEXT SEARCH DICTIONARY hunspell_en_ca (
        StopWords = our_custom_stopwords
    );

    -- Adding english_stem allows us to recognize words which hunspell
    -- doesn't, particularly acronyms such as CGA
    ALTER TEXT SEARCH CONFIGURATION our_configuration
        ALTER MAPPING FOR
            asciiword, asciihword, hword_asciipart,
            word, hword, hword_part
        WITH
            unaccent, our_custom_dictionary, hunspell_en_ca, english_stem;
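
The statements above assume the dictionaries and the configuration already exist. For anyone starting from scratch, here is a rough sketch of how they might be created. I haven't said which template our custom dictionary uses, so treat the synonym template below as an assumption; the file names (our_custom_terms, en_ca) are placeholders as well, and the files themselves have to be installed under $SHAREDIR/tsearch_data first.

```sql
-- unaccent ships as a contrib extension; it provides the "unaccent"
-- filtering dictionary, which passes the unaccented token on to the
-- next dictionary in the list.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- ASSUMPTION: a synonym dictionary that maps each protected term to itself,
-- e.g. a line "contract contract" in our_custom_terms.syn, so the term is
-- accepted here and never reaches hunspell.
CREATE TEXT SEARCH DICTIONARY our_custom_dictionary (
    TEMPLATE = synonym,
    SYNONYMS = our_custom_terms
);

-- Hunspell dictionaries are loaded through the ispell template.
-- Expects en_ca.dict and en_ca.affix (placeholder names) in tsearch_data.
CREATE TEXT SEARCH DICTIONARY hunspell_en_ca (
    TEMPLATE = ispell,
    DictFile = en_ca,
    AffFile = en_ca
);

-- Start from the built-in English configuration, then alter its mappings.
CREATE TEXT SEARCH CONFIGURATION our_configuration (COPY = english);
```

With the synonym approach, adding a newly discovered problem word is just one more line in the .syn file, followed by reloading the dictionary (for example with a no-op ALTER TEXT SEARCH DICTIONARY in the session).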

There was definitely a fair bit of trial and error to determine the correct order and configuration.
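
For that trial and error, ts_debug and ts_lexize are handy, since they show which dictionary claimed each token and what it produced. A minimal sketch, assuming the configuration and dictionary names used above; the sample text is only an illustration:

```sql
-- For each token: the dictionaries consulted, the one that recognized it,
-- and the resulting lexemes.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('our_configuration', 'A contract for the CGA');

-- Test a single dictionary in isolation, e.g. to see whether hunspell
-- splits a word into unexpected parts.
SELECT ts_lexize('hunspell_en_ca', 'contract');
```

If ts_lexize returns NULL, the dictionary did not recognize the word and the token moves on to the next dictionary in the list; an empty array means it was treated as a stop word.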
