Re: Fulltext search configuration

From Mohamed
Subject Re: Fulltext search configuration
Msg-id 861fed220902020339r6f34b4fchc578966f81ec81f9@mail.gmail.com
In reply to Re: Fulltext search configuration  (Daniel Chiaramello <daniel.chiaramello@golog.net>)
Responses Re: Fulltext search configuration  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
No, I don't. But ts_lexize doesn't return anything, so I figured there must be an error somewhere.

I think we are using the same dictionary, except that I am also using the stopwords file and a different affix file, because using the hunspell (ayaspell) .aff gives me this error:

ERROR:  wrong affix file format for flag
CONTEXT:  line 42 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40

/ Moe




On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello <daniel.chiaramello@golog.net> wrote:
Hi Mohamed.

I don't know where you got the dictionary. I tried the OpenOffice one (the Ayaspell one) myself without success, and I had no Arabic stopwords file.

Renaming the file is supposed to be enough (I did it successfully for a Thai dictionary): the ".aff" file becomes the ".affix" one.
When I tried to create the dictionary:

CREATE TEXT SEARCH DICTIONARY ar_ispell (
    TEMPLATE = ispell,
    DictFile = ar_utf8,
    AffFile = ar_utf8,
    StopWords = english
);

I had an error:

ERREUR:  mauvais format de fichier affixe pour le drapeau
CONTEXTE : ligne 42 du fichier de configuration « /usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa      Y       40

(which means "wrong affix file format for flag", at line 42 of the configuration file)

Do you have an error when creating your dictionary?

Daniel

Mohamed wrote:
I have run into some problems here.

I am trying to implement Arabic full-text search on three columns.

To create a dictionary I have a hunspell dictionary and an Arabic stop file.

CREATE TEXT SEARCH DICTIONARY hunspell_dic (
    TEMPLATE = ispell,
    DictFile = hunarabic,
    AffFile = hunarabic,
    StopWords = arabic
);

1) The problem is that the hunspell dictionary contains a .dic and a .aff file, but the configuration requires a .dict and a .affix file. I have tried changing the extensions, but with no success.

2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing

3) How can I convert my .dic and .aff files to valid .dict and .affix files?

4) I have read that when using dictionaries, if a word is not recognized by any dictionary it will not be indexed. I find that troublesome. I would like everything but the stop words to be indexed. I guess this might be a step that I am not ready for yet, but just wanted to put it out there.
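
For point 2, it may help to recall how ts_lexize reports its result: it returns NULL when the dictionary does not recognize the token, an empty array when the token is a stop word, and an array of lexemes otherwise. The queries below are only a diagnostic sketch; they assume the hunspell_dic dictionary from the statement above was created successfully, and 'ARABIC WORD' stands in for a real token. The built-in simple dictionary is used for comparison because it accepts any token.

```sql
-- NULL        : token is unknown to the dictionary
-- {}          : token is a stop word
-- {lexeme,...}: token was recognized and normalized
SELECT ts_lexize('hunspell_dic', 'ARABIC WORD');

-- Comparison: the built-in "simple" dictionary recognizes every token
-- and returns it lowercased, so a NULL above points at the ispell files.
SELECT ts_lexize('simple', 'ARABIC WORD');
```

If the first query returns NULL for words that are certainly in the .dict file, the affix file or the file encoding is a likely suspect rather than the query itself.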



Also, I would like to know what the process of implementing full-text search looks like, from configuration to search.

Create a dictionary, then a text search configuration, add the dictionary to the configuration, index the columns with GIN or GiST ...

What does a search look like? Does it match against the GIN/GiST index? Has that index been built using the dictionary/configuration, or is the dictionary only applied to search phrases?
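
The steps listed above can be sketched roughly as follows. This is an illustration under assumptions, not a tested setup: the configuration name arabic_cfg, the table docs, and its columns id, title, body and summary are hypothetical, and hunspell_dic is assumed to exist already. Listing simple after hunspell_dic in the mapping is one way to address point 4, since tokens the ispell dictionary does not recognize are passed on to simple and indexed as-is.

```sql
-- 1. A configuration based on the built-in "simple" one (arabic_cfg is a made-up name).
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = simple);

-- 2. Route word tokens through the ispell dictionary first, then fall back
--    to "simple" so that unrecognized words are still indexed.
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, hword, hword_part
    WITH hunspell_dic, simple;

-- 3. Index the three columns; the configuration is applied when the index is built.
--    Table and column names are hypothetical.
CREATE INDEX docs_fts_idx ON docs USING gin (
    to_tsvector('arabic_cfg',
        coalesce(title, '') || ' ' || coalesce(body, '') || ' ' || coalesce(summary, ''))
);

-- 4. Search: the query text goes through the same configuration, and the
--    expression must match the indexed one for the index to be usable.
SELECT id
FROM docs
WHERE to_tsvector('arabic_cfg',
        coalesce(title, '') || ' ' || coalesce(body, '') || ' ' || coalesce(summary, ''))
      @@ plainto_tsquery('arabic_cfg', 'ARABIC WORD');
```

In this arrangement the dictionary is used on both sides: to_tsvector normalizes the document text that goes into the index, and plainto_tsquery normalizes the search phrase, so both are reduced to the same lexemes before matching.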

/ Moe




