Re: Fulltext search configuration

From: Oleg Bartunov
Subject: Re: Fulltext search configuration
Msg-id: Pine.LNX.4.64.0902021813530.4158@sn.sai.msu.ru
In reply to: Re: Fulltext search configuration  (Mohamed <mohamed5432154321@gmail.com>)
Responses: Re: Fulltext search configuration  (Oleg Bartunov <oleg@sai.msu.su>)
List: pgsql-general
On Mon, 2 Feb 2009, Mohamed wrote:

> Hehe, ok..
> I don't know either but I took some lines from Al-Jazeera :
> http://aljazeera.net/portal
>
> just made the change you said and created it successfully and tried this :
>
> select ts_lexize('ayaspell', '?????? ??????? ????? ????? ?? ???? ?????????
> ?????')
>
> but I got nothing... :(

Mohamed, what did you expect from ts_lexize? Please provide us with useful
information, otherwise we can't help you.
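
For reference, ts_lexize takes a single token rather than a whole sentence,
so a quick check is to pass it one word at a time. Below is a minimal sketch
of what to expect. It uses the built-in english_stem dictionary as a stand-in;
'ayaspell' is the dictionary name from this thread, and ONE_ARABIC_WORD is a
hypothetical placeholder to be replaced with a real word.

```sql
-- A word the dictionary recognizes returns an array of lexemes.
SELECT ts_lexize('english_stem', 'stars');   -- {star}

-- A stop word returns an empty array.
SELECT ts_lexize('english_stem', 'a');       -- {}

-- A word the dictionary does not recognize returns NULL
-- (an ispell dictionary behaves this way for unknown words).
-- Replace the placeholder with a single Arabic word.
SELECT ts_lexize('ayaspell', 'ONE_ARABIC_WORD');

-- To see how a whole sentence is split into tokens and which
-- dictionary handles each token, use ts_debug with a configuration.
SELECT * FROM ts_debug('english', 'a fat cat sat on a mat');
```

If a single known word still returns NULL, that points at the dictionary or
affix file rather than at the way the function is called.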

>
> Is there a way of making sure that words that are not recognized also get
> indexed/searched for? (Not that I think this is the problem)

yes
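
One common way to do this, sketched here under assumptions, is to list the
built-in simple dictionary after the ispell one in the configuration mapping:
a token the ispell dictionary does not recognize is passed on to the next
dictionary, and simple accepts any token. The configuration name arabic_cfg
is hypothetical; 'ayaspell' is the dictionary created earlier in this thread.

```sql
-- Hypothetical configuration name; start from the built-in 'simple' one.
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = simple);

-- Non-ASCII words arrive as the token types word, hword and hword_part.
-- Try the ispell dictionary first, then fall back to 'simple', which
-- lowercases and keeps whatever the first dictionary did not recognize.
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, hword, hword_part
    WITH ayaspell, simple;

-- Check the result on some sample text (replace with real Arabic text).
SELECT to_tsvector('arabic_cfg', 'SAMPLE TEXT HERE');
```

The order matters: simple never returns NULL, so any dictionary listed after
it would never be consulted.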


>
> / Moe
>
>
>
> On Mon, Feb 2, 2009 at 3:50 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>
>> Mohamed,
>>
>> comment out the line in ar.affix:
>> #FLAG   long
>> and creation of the ispell dictionary will work. This is a temporary
>> solution. Teodor is working on fixing affix auto-recognition.
>>
>> I can't say anything about testing, since somebody needs to provide
>> a test case first. I don't know how to type Arabic :)
>>
>>
>> Oleg
>>
>> On Mon, 2 Feb 2009, Mohamed wrote:
>>
>>> Oleg, like I mentioned earlier, I have a different .affix file that I got
>>> from Andrew along with the stop file. I get no errors creating the
>>> dictionary using that one, but I get nothing out of ts_lexize.
>>> The size of that one is: 406,219 bytes
>>> And the size of the hunspell one (the first): 406,229 bytes
>>>
>>> A little too close, don't you think?
>>>
>>> It might be that the Arabic hunspell (ayaspell) affix file is damaged on
>>> some lines and I got the fixed one from Andrew.
>>>
>>> Just wanted to let you know.
>>>
>>> / Moe
>>>
>>>
>>>
>>> On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321@gmail.com>
>>> wrote:
>>>
>>>> Ok, thank you Oleg.
>>>> I have another dictionary package, which is also a conversion to
>>>> hunspell:
>>>>
>>>>
>>>>
>>>> http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
>>>> (Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08
>>>>
>>>> And running that gives me this error : (again the affix file)
>>>>
>>>> ERROR:  wrong affix file format for flag
>>>> CONTEXT:  line 560 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013 Y 6
>>>> "
>>>>
>>>> / Moe
>>>>
>>>>
>>>>
>>>> On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>>>
>>>>> Mohamed,
>>>>>
>>>>> We are looking into the problem.
>>>>>
>>>>> Oleg
>>>>>
>>>>> On Mon, 2 Feb 2009, Mohamed wrote:
>>>>>
>>>>>> No, I don't. But ts_lexize doesn't return anything, so I figured there
>>>>>> must be an error somehow.
>>>>>> I think we are using the same dictionary, except that I am using the
>>>>>> stopwords file and a different affix file, because using the hunspell
>>>>>> (ayaspell) .aff gives me this error:
>>>>>>
>>>>>> ERROR:  wrong affix file format for flag
>>>>>> CONTEXT:  line 42 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40
>>>>>>
>>>>>> / Moe
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello <
>>>>>> daniel.chiaramello@golog.net> wrote:
>>>>>>
>>>>>>> Hi Mohamed.
>>>>>>>
>>>>>>> I don't know where you got the dictionary. I tried the OpenOffice one
>>>>>>> (the Ayaspell one) myself without success, and I had no Arabic
>>>>>>> stopwords file.
>>>>>>>
>>>>>>> Renaming the file is supposed to be enough (I did it successfully for
>>>>>>> a Thai dictionary), with the ".aff" file becoming the ".affix" one.
>>>>>>> When I tried to create the dictionary:
>>>>>>>
>>>>>>> CREATE TEXT SEARCH DICTIONARY ar_ispell (
>>>>>>>   TEMPLATE = ispell,
>>>>>>>   DictFile = ar_utf8,
>>>>>>>   AffFile = ar_utf8,
>>>>>>>   StopWords = english
>>>>>>> );
>>>>>>>
>>>>>>> I had an error:
>>>>>>>
>>>>>>> ERREUR:  mauvais format de fichier affixe pour le drapeau
>>>>>>> CONTEXTE : ligne 42 du fichier de configuration « /usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa      Y 40
>>>>>>>
>>>>>>> (which means Bad format of Affix file for flag, line 42 of
>>>>>>> configuration
>>>>>>> file)
>>>>>>>
>>>>>>> Do you have an error when creating your dictionary?
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>> Mohamed wrote:
>>>>>>>
>>>>>>>
>>>>>>> I have run into some problems here.
>>>>>>> I am trying to implement Arabic fulltext search on three columns.
>>>>>>>
>>>>>>> To create a dictionary I have a hunspell dictionary and an Arabic
>>>>>>> stop file.
>>>>>>>
>>>>>>>  CREATE TEXT SEARCH DICTIONARY hunspell_dic (
>>>>>>>   TEMPLATE = ispell,
>>>>>>>   DictFile = hunarabic,
>>>>>>>   AffFile = hunarabic,
>>>>>>>   StopWords = arabic
>>>>>>> );
>>>>>>>
>>>>>>>
>>>>>>> 1) The problem is that the hunspell package contains a .dic and a .aff
>>>>>>> file, but the configuration requires a .dict and a .affix file. I have
>>>>>>> tried changing the extensions, but with no success.
>>>>>>>
>>>>>>> 2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing
>>>>>>>
>>>>>>> 3) How can I convert my .dic and .aff to valid .dict and .affix ?
>>>>>>>
>>>>>>> 4) I have read that when using dictionaries, if a word is not recognized
>>>>>>> by any dictionary it will not be indexed. I find that troublesome. I would
>>>>>>> like everything but the stop words to be indexed. I guess this might be a
>>>>>>> step that I am not ready for yet, but I just wanted to put it out there.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Also, I would like to know what the process of implementing fulltext
>>>>>>> search looks like, from configuration to search.
>>>>>>>
>>>>>>> Create a dictionary, then a text search configuration, add the dictionary
>>>>>>> to the configuration, index the columns with GIN or GiST ...
>>>>>>>
>>>>>>> What does a search look like? Does it match against the GIN/GiST index?
>>>>>>> Has that index been built using the dictionary/configuration, or is the
>>>>>>> dictionary only used on search phrases?
>>>>>>>
>>>>>>>  / Moe
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>        Regards,
>>>>>               Oleg
>>>>> _____________________________________________________________
>>>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>>>> Sternberg Astronomical Institute, Moscow University, Russia
>>>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
