Re: Fulltext search configuration

From Mohamed
Subject Re: Fulltext search configuration
Date
Msg-id 861fed220902020701g7f2136e3w9e83a25f7517da1b@mail.gmail.com
In response to Re: Fulltext search configuration  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Fulltext search configuration  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
Hehe, ok..

I don't know either, but I took some lines from Al-Jazeera: http://aljazeera.net/portal

I just made the change you suggested, created the dictionary successfully, and tried this:

select ts_lexize('ayaspell', 'استشهد فلسطيني وأصيب ثلاثة في غارة إسرائيلية جديدة')

but I got nothing... :(
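
One possible reason for the empty result: ts_lexize expects a single token rather than a whole sentence, so a multi-word string is looked up as one unknown word. A minimal sketch of how the test could be split up, assuming a text search configuration named arabic_cfg (a hypothetical name, not from this thread) that maps word tokens to the ayaspell dictionary:

```sql
-- ts_lexize tests ONE dictionary against ONE token, so pass a single word.
SELECT ts_lexize('ayaspell', 'فلسطيني');

-- To see how a whole sentence is split into tokens, and which dictionary
-- (if any) produces lexemes for each token, use ts_debug with a configuration.
-- arabic_cfg is an assumed configuration name and must already exist.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('arabic_cfg',
              'استشهد فلسطيني وأصيب ثلاثة في غارة إسرائيلية جديدة');
```

If the single-word call still returns NULL, the word is simply not recognized by the dictionary, which points back at the .dict/.affix files rather than at the query.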

Is there a way of making sure that words that are not recognized also get indexed and searched for? (Not that I think this is the problem.)
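
A common way to keep unrecognized words is to list the built-in simple dictionary after the ispell dictionary in the configuration's mapping, so that any token the first dictionary does not know falls through to the second. The following is a sketch under assumptions: arabic_cfg is a hypothetical configuration name, and ayaspell is the dictionary created earlier in this thread.

```sql
-- Start from the built-in "simple" configuration (arabic_cfg is an assumed name).
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = simple);

-- Dictionaries are consulted left to right for each token.
-- ayaspell handles the words it knows; stop words it recognizes are
-- discarded; anything it does not recognize is passed on to "simple",
-- which accepts every token and so keeps it in the index.
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, hword, hword_part
    WITH ayaspell, simple;
```

Listing the more specific dictionary first and the catch-all last is the usual order, since simple never passes a token on.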

/ Moe



On Mon, Feb 2, 2009 at 3:50 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Mohamed,

comment out the FLAG line in ar.affix:
#FLAG   long
and creation of the ispell dictionary will work. This is a temporary solution. Teodor is working on fixing affix auto-recognition.

I can't say anything about testing, since somebody should first provide
a test case. I don't know how to type Arabic :)


Oleg

On Mon, 2 Feb 2009, Mohamed wrote:

Oleg, as I mentioned earlier, I have a different .affix file that I got
from Andrew, along with the stop file. I get no errors creating the
dictionary with that one, but I get nothing out of ts_lexize.
The size of that one is: 406,219 bytes
The size of the hunspell one (the first): 406,229 bytes

A little too close, don't you think?

It might be that the Arabic hunspell (ayaspell) affix file is damaged on
some lines and that I got the fixed one from Andrew.

Just wanted to let you know.

/ Moe



On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321@gmail.com> wrote:

Ok, thank you Oleg.
I have another dictionary package, which is a conversion to hunspell
as well:


http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
(Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08

Running that gives me this error (again in the affix file):

ERROR:  wrong affix file format for flag
CONTEXT:  line 560 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013 Y 6
"

/ Moe



On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Mohamed,

We are looking into the problem.

Oleg

On Mon, 2 Feb 2009, Mohamed wrote:

No, I don't. But ts_lexize doesn't return anything, so I figured there
must be an error somehow.
I think we are using the same dictionary, except that I am using the
stopwords file and a different affix file, because using the hunspell
(ayaspell) .aff gives me this error:

ERROR:  wrong affix file format for flag
CONTEXT:  line 42 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40

/ Moe




On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello <daniel.chiaramello@golog.net> wrote:

 Hi Mohamed.

I don't know where you got the dictionary. I tried the OpenOffice one
(the Ayaspell one) myself, without success, and I had no Arabic
stopwords file.

Renaming the file is supposed to be enough (I did it successfully for a
Thai dictionary): the ".aff" file becomes the ".affix" one.
When I tried to create the dictionary:

CREATE TEXT SEARCH DICTIONARY ar_ispell (
  TEMPLATE = ispell,
  DictFile = ar_utf8,
  AffFile = ar_utf8,
  StopWords = english
);

I had an error:

ERREUR:  mauvais format de fichier affixe pour le drapeau
CONTEXTE : ligne 42 du fichier de configuration « /usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa      Y       40

(which means: wrong affix file format for flag, at line 42 of the
configuration file)

Do you have an error when creating your dictionary?

Daniel

Mohamed wrote:


I have run into some problems here.
 I am trying to implement Arabic full-text search on three columns.

 To create a dictionary I have a hunspell dictionary and an Arabic stop
file.

 CREATE TEXT SEARCH DICTIONARY hunspell_dic (
  TEMPLATE = ispell,
  DictFile = hunarabic,
  AffFile = hunarabic,
  StopWords = arabic
);


 1) The problem is that the hunspell package contains a .dic and a .aff
file, but the configuration requires a .dict and a .affix file. I have
tried changing the extensions, but with no success.

2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing.

3) How can I convert my .dic and .aff files to valid .dict and .affix
files?

4) I have read that when using dictionaries, a word that is not
recognized by any dictionary will not be indexed. I find that
troublesome. I would like everything but the stop words to be indexed.
I guess this might be a step that I am not ready for yet, but I just
wanted to put it out there.



 Also, I would like to know what the process of implementing full-text
search looks like, from configuration to search.

 Create a dictionary, then a text search configuration, add the
dictionary to the configuration, index the columns with GIN or GiST ...

 What does a search look like? Does it match against the GIN/GiST index?
Has that index been built using the dictionary/configuration, or is the
dictionary only used on search phrases?
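
The indexing and search steps asked about above can be sketched as follows. In general the configuration (and through it the dictionary) is applied twice: once when the tsvector for the index is built, and once when the search phrase is turned into a tsquery, so both sides are normalized the same way. The table, column, index, and configuration names below are hypothetical and chosen only for illustration.

```sql
-- Assumes a text search configuration named arabic_cfg already exists and
-- maps word tokens to the Arabic dictionary (hypothetical name).

-- Hypothetical table with the three columns to be searched.
CREATE TABLE articles (
    id      serial PRIMARY KEY,
    title   text,
    summary text,
    body    text
);

-- Index time: to_tsvector runs every word through arabic_cfg, so the GIN
-- index stores the lexemes produced by the dictionary, not the raw words.
CREATE INDEX articles_fts_idx ON articles
    USING gin (to_tsvector('arabic_cfg',
        coalesce(title, '') || ' ' ||
        coalesce(summary, '') || ' ' ||
        coalesce(body, '')));

-- Search time: the phrase goes through the same configuration, and the
-- WHERE clause repeats the indexed expression so the planner can use the index.
SELECT id, title
FROM articles
WHERE to_tsvector('arabic_cfg',
        coalesce(title, '') || ' ' ||
        coalesce(summary, '') || ' ' ||
        coalesce(body, ''))
      @@ plainto_tsquery('arabic_cfg', 'غارة إسرائيلية');
```

Naming the configuration explicitly in both to_tsvector and plainto_tsquery avoids depending on the session's default_text_search_config setting.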

 / Moe








      Regards,
              Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





