Re: [HACKERS] Flexible configuration for full-text search

From Emre Hasegeli
Subject Re: [HACKERS] Flexible configuration for full-text search
Date
Msg-id CAE2gYzwAeuNB=e1tvM826CxFrons5beEYqTshdo2HOMTQb9XKg@mail.gmail.com
In reply to [HACKERS] Flexible configuration for full-text search  (Aleksandr Parfenov <a.parfenov@postgrespro.ru>)
Responses Re: [HACKERS] Flexible configuration for full-text search  (Aleksandr Parfenov <a.parfenov@postgrespro.ru>)
List pgsql-hackers
> The patch introduces way to configure FTS based on CASE/WHEN/THEN/ELSE
> construction.

Interesting feature.  I needed this flexibility before when I was
implementing text search for a Turkish private listing application.
Aleksandr and Arthur were kind enough to discuss it with me off-list
today.

> 1) Multilingual search. Can be used for FTS on a set of documents in
> different languages (example for German and English languages).
>
> ALTER TEXT SEARCH CONFIGURATION multi
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                     word, hword, hword_part WITH CASE
>     WHEN english_hunspell AND german_hunspell THEN
>       english_hunspell UNION german_hunspell
>     WHEN english_hunspell THEN english_hunspell
>     WHEN german_hunspell THEN german_hunspell
>     ELSE german_stem UNION english_stem
>   END;

I understand the need to support branching, but this syntax is overly
complicated.  I don't think there is any need to support different
sets of dictionaries as condition and action.  Something like this
might work better:

ALTER TEXT SEARCH CONFIGURATION multi
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                    word, hword, hword_part WITH
    CASE english_hunspell UNION german_hunspell
      WHEN MATCH THEN KEEP
      ELSE german_stem UNION english_stem
    END;
 

To put it formally:

ALTER TEXT SEARCH CONFIGURATION name
    ADD MAPPING FOR token_type [, ... ] WITH config

where config is one of:

    dictionary_name
    config { UNION | INTERSECT | EXCEPT } config
    CASE config WHEN [ NO ] MATCH THEN [ KEEP ELSE ] config END
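
To make the intended semantics of this grammar concrete, here is a rough Python model of it. It is only a sketch under my own assumptions: a config is a function from a token to a set of lexemes, `None` stands for "no match", UNION matches when either side matches, and a CASE without ELSE yields no match when its condition fails. The toy dictionaries and stemmers are hypothetical stand-ins, and INTERSECT, EXCEPT and the `WHEN NO MATCH THEN KEEP` form are not modelled.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

# A config maps a token to a set of lexemes, or None for "no match".
Lexemes = Optional[FrozenSet[str]]
Config = Callable[[str], Lexemes]


def dictionary(entries: Dict[str, List[str]]) -> Config:
    """Toy lookup dictionary: matches only tokens it has an entry for."""
    def run(token: str) -> Lexemes:
        hit = entries.get(token.lower())
        return None if hit is None else frozenset(hit)
    return run


def union(a: Config, b: Config) -> Config:
    """config UNION config: assumed to match if either side matches."""
    def run(token: str) -> Lexemes:
        left, right = a(token), b(token)
        if left is None and right is None:
            return None
        return (left or frozenset()) | (right or frozenset())
    return run


def case_keep_else(cond: Config, otherwise: Config) -> Config:
    """CASE cond WHEN MATCH THEN KEEP ELSE otherwise END."""
    def run(token: str) -> Lexemes:
        kept = cond(token)
        return kept if kept is not None else otherwise(token)
    return run


def case_then(cond: Config, then: Config, no_match: bool = False) -> Config:
    """CASE cond WHEN [NO] MATCH THEN then END (no ELSE branch)."""
    def run(token: str) -> Lexemes:
        matched = cond(token) is not None
        return then(token) if matched != no_match else None
    return run


# Hypothetical toy stand-ins for the real dictionaries.
english_hunspell = dictionary({"books": ["book"]})
german_hunspell = dictionary({"bücher": ["buch"]})


def english_stem(token: str) -> Lexemes:
    t = token.lower()
    return frozenset({t[:-1] if t.endswith("s") else t})


def german_stem(token: str) -> Lexemes:
    t = token.lower()
    return frozenset({t[:-2] if t.endswith("en") else t})


# CASE english_hunspell UNION german_hunspell
#   WHEN MATCH THEN KEEP
#   ELSE german_stem UNION english_stem
# END
multi = case_keep_else(
    union(english_hunspell, german_hunspell),
    union(german_stem, english_stem),
)

print(sorted(multi("books")))
print(sorted(multi("zebras")))
```

In this model the condition and the kept output are the same expression, which is the simplification being argued for: the hunspell union is written once instead of appearing both as condition and as action.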
 

> 2) Combination of exact search with morphological one. This patch not
> fully solve the problem but it is a step toward solution. Currently, we
> should split exact and morphological search in query manually and use
> separate index for each part. With new way to configure FTS we can use
> following configuration:
>
> ALTER TEXT SEARCH CONFIGURATION exact_and_morph
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                   word, hword, hword_part WITH CASE
>     WHEN english_hunspell THEN english_hunspell UNION simple
>     ELSE english_stem UNION simple
>   END

This could be:
        CASE english_hunspell
            THEN KEEP
            ELSE english_stem
        END
    UNION
        simple
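
As a sanity check that factoring "UNION simple" out of both branches does not change the result, here is a small Python sketch comparing the two forms on a few tokens. The three dictionaries are hypothetical toy stand-ins, `None` stands for "no match", and UNION is assumed to match when either side matches.

```python
from typing import Callable, FrozenSet, Optional

Lexemes = Optional[FrozenSet[str]]  # None stands for "no match"
Config = Callable[[str], Lexemes]


# Hypothetical toy stand-ins for the real dictionaries.
def english_hunspell(token: str) -> Lexemes:
    known = {"books": "book", "mice": "mouse"}
    lexeme = known.get(token.lower())
    return None if lexeme is None else frozenset({lexeme})


def english_stem(token: str) -> Lexemes:
    t = token.lower()
    return frozenset({t[:-1] if t.endswith("s") else t})


def simple(token: str) -> Lexemes:
    return frozenset({token.lower()})


def union(a: Config, b: Config) -> Config:
    """Assumed UNION semantics: match if either side matches."""
    def run(token: str) -> Lexemes:
        left, right = a(token), b(token)
        if left is None and right is None:
            return None
        return (left or frozenset()) | (right or frozenset())
    return run


def original(token: str) -> Lexemes:
    # CASE WHEN english_hunspell THEN english_hunspell UNION simple
    #      ELSE english_stem UNION simple END
    if english_hunspell(token) is not None:
        return union(english_hunspell, simple)(token)
    return union(english_stem, simple)(token)


def keep_or_stem(token: str) -> Lexemes:
    # CASE english_hunspell THEN KEEP ELSE english_stem END
    kept = english_hunspell(token)
    return kept if kept is not None else english_stem(token)


# ... UNION simple
factored = union(keep_or_stem, simple)

for token in ["Books", "mice", "zebras", "gnu"]:
    assert original(token) == factored(token)
    print(token, sorted(factored(token)))
```

Under these assumptions both forms agree because "simple" is applied unconditionally in either branch, so only the choice between hunspell and the stemmer needs to sit inside the CASE.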

> 3) Using different dictionaries for recognizing and output generation.
> As I mentioned before, in new syntax condition and command are separate
> and we can use it for some more complex text processing. Here an
> example for processing only nouns:
>
> ALTER TEXT SEARCH CONFIGURATION nouns_only
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                     word, hword, hword_part WITH CASE
>   WHEN english_noun THEN english_hunspell
> END

This would also still work with the simpler syntax because
"english_noun", still being a dictionary, would pass the tokens to the
next one.

> 4) Special stopword processing allows us to discard stopwords even if
> the main dictionary doesn't support such feature (in example pl_ispell
> dictionary keeps stopwords in text):
>
> ALTER TEXT SEARCH CONFIGURATION pl_without_stops
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                     word, hword, hword_part WITH CASE
>     WHEN simple_pl IS NOT STOPWORD THEN pl_ispell
>   END

Instead of supporting the old way of attaching stopwords to
dictionaries, we can make stopword lists dictionaries in their own
right.  This would then become something like:

    CASE polish_stopword
        WHEN NO MATCH THEN polish_ispell
    END
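
A minimal Python sketch of how a stand-alone stopword dictionary could gate the main one. Everything here is illustrative: the stopword sample is tiny, the ispell stand-in is a hypothetical toy that (like pl_ispell in the quoted example) keeps stopwords, and `None` stands for "no match", which I assume means the token is dropped.

```python
from typing import FrozenSet, Optional

Lexemes = Optional[FrozenSet[str]]  # None stands for "no match"

# Tiny illustrative sample, not a real Polish stopword list.
POLISH_STOPWORDS = frozenset({"i", "w", "na", "z"})


def polish_stopword(token: str) -> Lexemes:
    """Stopword list acting as a dictionary: matches only stopwords."""
    t = token.lower()
    return frozenset({t}) if t in POLISH_STOPWORDS else None


def polish_ispell(token: str) -> Lexemes:
    """Hypothetical toy ispell stand-in that does not filter stopwords."""
    known = {"domy": "dom", "koty": "kot", "na": "na", "w": "w"}
    lexeme = known.get(token.lower())
    return None if lexeme is None else frozenset({lexeme})


def pl_without_stops(token: str) -> Lexemes:
    # CASE polish_stopword WHEN NO MATCH THEN polish_ispell END
    if polish_stopword(token) is None:
        return polish_ispell(token)
    return None  # stopword: no ELSE branch, so nothing is emitted


for token in ["Domy", "na", "koty"]:
    print(token, pl_without_stops(token))
```

The main dictionary needs no stopword support of its own here; the filtering is done entirely by the condition.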

