Thread: fulltext search stemming/spelling problems

fulltext search stemming/spelling problems

From: Corin
Date:
Hi!

I'm using PostgreSQL 8.4.3 and trying to get stemming and correction of
misspelled words working.

I already installed the myspell dictionaries using apt-get and created
postgres dictionaries like this:

Fulltext search configuration »public.english_ispell«
Parser: »pg_catalog.default«
      Token      |            Dictionaries
-----------------+------------------------------------
 asciihword      | english_ispell,english_stem,simple
 asciiword       | english_ispell,english_stem,simple
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_ispell,english_stem,simple
 hword_asciipart | english_ispell,english_stem,simple
 hword_numpart   | simple
 hword_part      | english_ispell,english_stem,simple
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_ispell,english_stem,simple

But when I do, for example, SELECT to_tsvector('english_ispell',
'gitar') the result is only:
 'gitar':1

Shouldn't the word be corrected to 'guitar'?

SELECT plainto_tsquery('english_ispell','gitar') doesn't work either:
 'gitar'

Thanks,
Corin

Re: fulltext search stemming/spelling problems

From: Oleg Bartunov
Date:
On Thu, 8 Apr 2010, Corin wrote:

> Hi!
>
> I'm using PostgreSQL 8.4.3 and trying to get stemming and correction of
> misspelled words working.
>
> I already installed the myspell dictionaries using apt-get and created
> postgres dictionaries like this:
>
> Fulltext search configuration »public.english_ispell«
> Parser: »pg_catalog.default«
>     Token      |            Dictionaries
> -----------------+------------------------------------
> asciihword      | english_ispell,english_stem,simple
> asciiword       | english_ispell,english_stem,simple
> email           | simple
> file            | simple
> float           | simple
> host            | simple
> hword           | english_ispell,english_stem,simple
> hword_asciipart | english_ispell,english_stem,simple
> hword_numpart   | simple
> hword_part      | english_ispell,english_stem,simple
> int             | simple
> numhword        | simple
> numword         | simple
> sfloat          | simple
> uint            | simple
> url             | simple
> url_path        | simple
> version         | simple
> word            | english_ispell,english_stem,simple
>
> But when I do, for example, SELECT to_tsvector('english_ispell', 'gitar') the
> result is only:
> 'gitar':1
>
> Shouldn't the word be corrected to 'guitar'?

The english_ispell dictionary is a morphological dictionary! Read the docs.
Also, the simple dictionary will never be invoked, since the english_stem
dictionary recognizes everything!


>
> SELECT plainto_tsquery('english_ispell','gitar') doesn't work either:
> 'gitar'

It is better to use the ts_debug() function or ts_dict() for testing.
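
As a rough sketch of that kind of test: ts_debug() can be called in the FROM clause, which returns named columns instead of one composite value. The column names below are the ones ts_debug() provides; the configuration name english_ispell is the one from this thread and is assumed to exist.

```sql
-- "dictionaries" lists every dictionary mapped to the token type;
-- "dictionary" is the one that actually recognized the token;
-- "lexemes" is what that dictionary produced.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('english_ispell', 'gitar');
```

If the "dictionary" column shows english_stem rather than english_ispell, the Ispell dictionary did not recognize the token and passed it on.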

>
> Thanks,
> Corin
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: fulltext search stemming/spelling problems

From: Corin
Date:
On 08.04.2010 20:15, Oleg Bartunov wrote:
> On Thu, 8 Apr 2010, Corin wrote:
>
> The english_ispell dictionary is a morphological dictionary! Read the docs.
> Also, the simple dictionary will never be invoked, since the english_stem
> dictionary recognizes everything!
I'm not sure what you mean by 'morphology'. I did read the docs but
couldn't find anything about 'morphology dictionaries'.

I created it myself with the following commands, after I installed the
ispell dictionaries using "apt-get":

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = system_en_us,
    AffFile = system_en_us
);

CREATE TEXT SEARCH CONFIGURATION english_ispell ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION english_ispell
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
  WITH english_ispell, english_stem;
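
One way to double-check what these commands actually stored is to query the text search catalogs. This is only a sketch: it assumes the dictionary and configuration are both named english_ispell as above and that the default parser is in use. With the ispell template, the file names are base names, and the server looks for them with the .dict and .affix extensions under $SHAREDIR/tsearch_data/.

```sql
-- Options stored for the dictionary, including the DictFile/AffFile names.
SELECT dictname, dictinitoption
FROM pg_ts_dict
WHERE dictname = 'english_ispell';

-- Token types and the dictionaries consulted for each, in lookup order.
SELECT t.alias AS token_type, m.mapseqno, d.dictname
FROM pg_ts_config AS c
JOIN pg_ts_config_map AS m ON m.mapcfg = c.oid
JOIN pg_ts_dict AS d ON d.oid = m.mapdict
JOIN ts_token_type('default') AS t ON t.tokid = m.maptokentype
WHERE c.cfgname = 'english_ispell'
ORDER BY t.alias, m.mapseqno;
```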

Thanks for the hint about the simple dictionary. I'll remove it, but if
it's never triggered, I guess that won't solve my problem either?
>
> It is better to use the ts_debug() function or ts_dict() for testing.
ts_debug shows:
SELECT ts_debug('english_ispell','gitar');
(asciiword,"Word, all ASCII",gitar,"{english_ispell,english_stem}",english_stem,{gitar})
(1 line)

ts_dict does not seem to exist; I couldn't find it in the docs either.
>
>     Regards,
>         Oleg
Thanks,
Corin


Re: fulltext search stemming/spelling problems

From: Oleg Bartunov
Date:
On Thu, 8 Apr 2010, Corin wrote:

> On 08.04.2010 20:15, Oleg Bartunov wrote:
>> On Thu, 8 Apr 2010, Corin wrote:
>>
>> The english_ispell dictionary is a morphological dictionary! Read the docs.
>> Also, the simple dictionary will never be invoked, since the english_stem
>> dictionary recognizes everything!
> I'm not sure what you mean by 'morphology'. I did read the docs but
> couldn't find anything about 'morphology dictionaries'.

It means that (from
http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY)

12.6.5. Ispell Dictionary

The Ispell dictionary template supports morphological dictionaries, which can
normalize many different linguistic forms of a word into the same lexeme. For
example, an English Ispell dictionary can match all declensions and
conjugations of the search term bank, e.g., banking, banked, banks, banks',
and bank's.

You were confused by the name!

>
> I created it myself with the following commands, after I installed the ispell
> dictionaries using "apt-get":
>
> CREATE TEXT SEARCH DICTIONARY english_ispell (
>   TEMPLATE = ispell,
>   DictFile = system_en_us,
>   AffFile = system_en_us
> );
>
> CREATE TEXT SEARCH CONFIGURATION english_ispell ( COPY = pg_catalog.english );
> ALTER TEXT SEARCH CONFIGURATION english_ispell
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
>   WITH english_ispell, english_stem;
>
> Thanks for the hint about the simple dictionary. I'll remove it, but if it's
> never triggered, I guess that won't solve my problem either?
>>
>> It is better to use the ts_debug() function or ts_dict() for testing.
> ts_debug shows:
> SELECT ts_debug('english_ispell','gitar');
> (asciiword,"Word, all ASCII",gitar,"{english_ispell,english_stem}",english_stem,{gitar})
> (1 line)
>
> ts_dict does not seem to exist; I couldn't find it in the docs either.

Sorry, I meant ts_lexize.
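
For reference, a short sketch of how ts_lexize() results can be read. It tests a single dictionary in isolation: an array of lexemes means the dictionary recognized the word, an empty array means it treated the word as a stop word, and NULL means the word is unknown to that dictionary and would be passed to the next one in the mapping. The examples use the built-in english_stem dictionary only.

```sql
-- A known word: the stemmer reduces it to its stem.
SELECT ts_lexize('english_stem', 'stars');   -- {star}

-- A stop word: recognized, but discarded (empty array, not NULL).
SELECT ts_lexize('english_stem', 'a');       -- {}

-- A misspelled word: the stemmer still returns a result instead of NULL,
-- which is why no dictionary listed after english_stem is ever consulted.
SELECT ts_lexize('english_stem', 'gitar');   -- {gitar}
```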

>>
>>     Regards,
>>         Oleg
> Thanks,
> Corin
>

     Regards,
         Oleg

Re: fulltext search stemming/spelling problems

From: Corin
Date:
On 08.04.2010 21:27, Oleg Bartunov wrote:
> It means that (from
> http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY)
>
>
> 12.6.5. Ispell Dictionary
>
> The Ispell dictionary template supports morphological dictionaries,
> which can normalize many different linguistic forms of a word into the
> same lexeme. For example, an English Ispell dictionary can match all
> declensions and conjugations of the search term bank, e.g., banking,
> banked, banks, banks', and bank's.
I already read this but I don't know how to solve my problems with this
information.

SELECT ts_lexize('english_ispell','guitar');
{guitar}
(1 line)

SELECT ts_lexize('english_ispell','bank');
{bank}
(1 line)

SELECT ts_debug('english_ispell','bank');
(asciiword,"Word, all ASCII",bank,"{english_ispell,english_stem}",english_ispell,{bank})
(1 line)

SELECT plainto_tsquery('english_ispell','bank');
'bank'
(1 line)
>     Regards,
>         Oleg
It would be very nice if you (or anyone else) could provide me with
concrete instructions or any howto. What can I do to find the error in
my setup? What output should I expect from the above commands if
everything worked correctly?

Thanks,
Corin
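
A possible follow-up check, sketched under the assumption that the english_ispell dictionary and configuration from this thread are in place: base forms such as 'bank' or 'guitar' come back unchanged whether or not the affix rules are applied, so inflected forms like those in the documentation excerpt quoted above are more telling. The exact lexemes depend on the installed dictionary and affix files, so no expected output is given here.

```sql
-- Test the Ispell dictionary on its own with inflected forms.
-- NULL in the lexemes column means the dictionary did not recognize the word.
SELECT word, ts_lexize('english_ispell', word) AS lexemes
FROM (VALUES ('banking'), ('banked'), ('banks'), ('guitars')) AS t(word);

-- The same check through the whole configuration; the "dictionary" column
-- shows which dictionary produced each lexeme.
SELECT token, dictionary, lexemes
FROM ts_debug('english_ispell', 'banking banked guitars')
WHERE alias <> 'blank';
```

Note that none of these dictionaries performs spelling correction: they normalize word forms they know, so a misspelling such as 'gitar' is not expected to turn into 'guitar' even with a working setup.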