Re: Fulltext search configuration

From Oleg Bartunov
Subject Re: Fulltext search configuration
Date
Msg-id Pine.LNX.4.64.0902021746080.4158@sn.sai.msu.ru
In response to Re: Fulltext search configuration  (Mohamed <mohamed5432154321@gmail.com>)
Responses Re: Fulltext search configuration  (Mohamed <mohamed5432154321@gmail.com>)
List pgsql-general
Mohamed,

Comment out the FLAG line in ar.affix, like this:
#FLAG   long
and creation of the ispell dictionary will work.
This is a temporary solution.
Teodor is working on fixing affix autorecognition.

I can't say anything about testing, since somebody would first have to
provide a test case. I don't know how to type Arabic :)

Oleg

On Mon, 2 Feb 2009, Mohamed wrote:

> Oleg, like I mentioned earlier, I have a different .affix file that I got
> from Andrew along with the stop file, and I get no errors creating the
> dictionary using that one, but I get nothing out of ts_lexize.
> The size of that one is: 406,219 bytes
> And the size of the hunspell one (the first): 406,229 bytes
>
> A little too close, don't you think?
>
> It might be that the Arabic hunspell (ayaspell) affix file is damaged on
> some lines and I got the fixed one from Andrew.
>
> Just wanted to let you know.
>
> / Moe
>
>
>
> On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321@gmail.com> wrote:
>
>> Ok, thank you Oleg.
>> I have another dictionary package which is a conversion to hunspell
>> as well:
>>
>>
>> http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
>> (Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08
>>
>> And running that gives me this error (again from the affix file):
>>
>> ERROR:  wrong affix file format for flag
>> CONTEXT:  line 560 of configuration file "C:/Program
>> Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013
>> Y 6
>> "
>>
>> / Moe
>>
>>
>>
>> On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>
>>> Mohamed,
>>>
>>> We are looking into the problem.
>>>
>>> Oleg
>>>
>>> On Mon, 2 Feb 2009, Mohamed wrote:
>>>
>>>> No, I don't. But ts_lexize doesn't return anything, so I figured there
>>>> must be an error somehow.
>>>> I think we are using the same dictionary, plus I am using the stopwords
>>>> file and a different affix file, because using the hunspell (ayaspell)
>>>> .aff gives me this error:
>>>>
>>>> ERROR:  wrong affix file format for flag
>>>> CONTEXT:  line 42 of configuration file "C:/Program
>>>> Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40
>>>>
>>>> / Moe
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello <
>>>> daniel.chiaramello@golog.net> wrote:
>>>>
>>>>> Hi Mohamed.
>>>>>
>>>>> I don't know where you got the dictionary. I tried the OpenOffice one
>>>>> myself (the Ayaspell one) without success, and I had no Arabic
>>>>> stopwords file.
>>>>>
>>>>> Renaming the file is supposed to be enough (I did it successfully for
>>>>> a Thai dictionary): the ".aff" file becomes the ".affix" one.
>>>>> When I tried to create the dictionary:
>>>>>
>>>>> CREATE TEXT SEARCH DICTIONARY ar_ispell (
>>>>>    TEMPLATE = ispell,
>>>>>    DictFile = ar_utf8,
>>>>>    AffFile = ar_utf8,
>>>>>    StopWords = english
>>>>> );
>>>>>
>>>>> I had an error:
>>>>>
>>>>> ERREUR:  mauvais format de fichier affixe pour le drapeau
>>>>> CONTEXTE : ligne 42 du fichier de configuration «
>>>>> /usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa      Y       40
>>>>>
>>>>> (which means "wrong affix file format for flag", line 42 of the
>>>>> configuration file)
>>>>>
>>>>> Do you have an error when creating your dictionary?
>>>>>
>>>>> Daniel
>>>>>
>>>>> Mohamed wrote:
>>>>>
>>>>>
>>>>> I have run into some problems here.
>>>>> I am trying to implement Arabic fulltext search on three columns.
>>>>>
>>>>> To create a dictionary I have a hunspell dictionary and an Arabic stop
>>>>> file.
>>>>>
>>>>>  CREATE TEXT SEARCH DICTIONARY hunspell_dic (
>>>>>    TEMPLATE = ispell,
>>>>>    DictFile = hunarabic,
>>>>>    AffFile = hunarabic,
>>>>>    StopWords = arabic
>>>>> );
>>>>>
>>>>>
>>>>> 1) The problem is that the hunspell package contains a .dic and a .aff
>>>>> file, but the configuration requires a .dict and a .affix file. I have
>>>>> tried to change the endings, but with no success.
>>>>>
>>>>> 2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing
>>>>>
>>>>> 3) How can I convert my .dic and .aff to valid .dict and .affix ?
>>>>>
>>>>> 4) I have read that when using dictionaries, if a word is not recognized
>>>>> by any dictionary it will not be indexed. I find that troublesome. I
>>>>> would like everything but the stop words to be indexed. I guess this
>>>>> might be a step that I am not ready for yet, but I just wanted to put it
>>>>> out there.
>>>>>
>>>>>
>>>>>
>>>>> Also, I would like to know what the process of implementing fulltext
>>>>> search looks like, from config to search.
>>>>>
>>>>> Create a dictionary, then a text search configuration, add the
>>>>> dictionary to the configuration, index columns with gin or gist ...
>>>>>
>>>>> What does a search look like? Does it match against the gin/gist index?
>>>>> Has that index been built using the dictionary/configuration, or is the
>>>>> dictionary only used on search phrases?
>>>>>
>>>>>  / Moe
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>        Regards,
>>>                Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>> Sternberg Astronomical Institute, Moscow University, Russia
>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>
>>
>>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
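
A note on the workflow asked about in the quoted message (dictionary, then
configuration, then index, then search). The following is only a minimal
sketch, not a tested setup. It assumes the hunspell_dic dictionary from the
quoted CREATE TEXT SEARCH DICTIONARY statement was created successfully and
that an arabic.stop file is installed in tsearch_data. The names
arabic_simple, arabic_cfg, docs and docs_fts_idx are hypothetical and do not
come from the thread.

```sql
-- Fallback dictionary (hypothetical name): the built-in "simple" template
-- accepts every word that is not in the stop list, so words unknown to
-- ispell are still indexed. Assumes arabic.stop exists in tsearch_data.
CREATE TEXT SEARCH DICTIONARY arabic_simple (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = arabic
);

-- Configuration (hypothetical name), started as a copy of "simple".
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = pg_catalog.simple);

-- Dictionaries are consulted in order: ispell first, then the fallback.
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, hword, hword_part
    WITH hunspell_dic, arabic_simple;

-- Hypothetical table with two text columns to search.
CREATE TABLE docs (
    id    serial PRIMARY KEY,
    title text,
    body  text
);

-- The index stores lexemes produced by the configuration at index time.
CREATE INDEX docs_fts_idx ON docs USING gin (
    to_tsvector('arabic_cfg', coalesce(title, '') || ' ' || coalesce(body, ''))
);

-- The same configuration normalizes the query, and the expression matches
-- the index definition so the planner can use the index.
SELECT id
FROM docs
WHERE to_tsvector('arabic_cfg', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ plainto_tsquery('arabic_cfg', 'ARABIC WORD');
```

In this sketch the configuration is applied twice: once when to_tsvector
builds the indexed lexemes and once when plainto_tsquery normalizes the
search phrase. Running ts_debug('arabic_cfg', 'some text') shows which
dictionary handled each token, which can help when ts_lexize returns nothing.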
