Re: Full text: Ispell dictionary

From: Tim van der Linden
Subject: Re: Full text: Ispell dictionary
Msg-id: 20140503064523.ed01ea1a8d5a530c6688964d@shisaa.jp
In reply to: Re: Full text: Ispell dictionary (Oleg Bartunov <obartunov@gmail.com>)
Replies: Re: Full text: Ispell dictionary
List: pgsql-general
On Fri, 2 May 2014 21:12:56 +0400
Oleg Bartunov <obartunov@gmail.com> wrote:

Hi Oleg

Thanks for the response!

> Yes, it's normal for ispell dictionary, think about morphological dictionary.

Hmm, I see, that makes sense. I thought the morphological aspect of Ispell only dealt with splitting up compound
words, but it also deals with reducing the word to a more "stem"-like form, correct?
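A quick way to see what the morphological step produces is to ask the dictionary directly. This is a sketch: timusan_ispell is the dictionary defined in the quoted message below, and the configuration name 'timusan-ispell' is taken from the query quoted there and assumed to exist.

```sql
-- Ask the Ispell dictionary alone which lexemes it returns for one token.
-- ts_lexize returns a text[] of lexemes, an empty array for a stop word,
-- or NULL if the dictionary does not recognise the word.
SELECT ts_lexize('timusan_ispell', 'smiling');

-- ts_debug walks the whole configuration: for each token it lists the
-- dictionaries consulted, the one that recognised the token, and its lexemes.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('timusan-ispell', 'smiling');
```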

As a last question on this: is there a way to stop this dictionary from emitting multiple lexemes?

The reason I am asking is that, in my (fairly new) understanding of PostgreSQL's full text search, it is always best to
have as few lexemes as possible saved in the vector, to get smaller indexes and faster matching afterwards. Also, if you
run the tsquery through Ispell as well, you can still employ the power of these multiple lexemes to find a match.
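The trade-off described above can be measured rather than guessed. A minimal sketch, assuming the 'timusan-ispell' configuration from the quoted message and the built-in english (Snowball) configuration: length() on a tsvector counts its lexemes, and running the query through the same Ispell configuration lets either stored form match. The exact tsquery text Ispell produces for 'smiling' is worth inspecting on your own installation rather than assuming.

```sql
-- Number of lexemes stored for the same token by each configuration.
SELECT length(to_tsvector('timusan-ispell', 'smiling')) AS ispell_lexemes,
       length(to_tsvector('english', 'smiling'))        AS snowball_lexemes;

-- Query side: see how Ispell expands the search term ...
SELECT to_tsquery('timusan-ispell', 'smiling');

-- ... and check that a document normalised the same way still matches.
SELECT to_tsvector('timusan-ispell', 'She was smiling')
       @@ to_tsquery('timusan-ispell', 'smile') AS matches;
```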

Or... probably answering my own question... if I do not want this behavior, I should maybe not use Ispell and simply
use another dictionary :)
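If one lexeme per token is the goal, a Snowball-based configuration gives that. A minimal sketch; the configuration name timusan_snowball is hypothetical, and the token types listed are the usual word categories of the default parser:

```sql
-- Hypothetical configuration name; copies the built-in English setup.
CREATE TEXT SEARCH CONFIGURATION timusan_snowball (COPY = pg_catalog.english);

-- pg_catalog.english already maps these token types to english_stem;
-- spelling the mapping out makes the single-stemmer choice explicit.
ALTER TEXT SEARCH CONFIGURATION timusan_snowball
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH english_stem;

SELECT to_tsvector('timusan_snowball', 'smiling');
```

Chaining the two (WITH timusan_ispell, english_stem) is a common alternative, but it would not reduce the count: the first dictionary in the list that recognises a token wins, so words Ispell knows would still yield all of Ispell's lexemes.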

Thanks again.

Cheers,
Tim

> On Fri, May 2, 2014 at 11:54 AM, Tim van der Linden <tim@shisaa.jp> wrote:
> > Good morning/afternoon all
> >
> > I am currently writing a few articles about PostgreSQL's full text capabilities and have a question about the
> > Ispell dictionary which I cannot seem to find an answer to. It is probably a very simple issue, so forgive my ignorance.
> >
> > In one article I am explaining dictionaries, and I have set up a sample configuration which maps most token
> > categories to use only an Ispell dictionary (timusan_ispell) which has a default configuration:
> >
> > CREATE TEXT SEARCH DICTIONARY timusan_ispell (
> >         TEMPLATE = ispell,
> >         DictFile = en_us,
> >         AffFile = en_us,
> >         StopWords = english
> > );
> >
> > When I run a simple query like "SELECT to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
> >
> > 'smile':1 'smiling':1
> >
> > As you can see I get two lexemes with the same pointer.
> > The question here is: why does this happen?
> >
> > Is it normal behavior for the Ispell dictionary to emit multiple lexemes for a single token? And if so, is this
> > efficient? I mean, why could it not simply save one lexeme 'smile' which (same as the snowball dictionary) would
> > match 'smiling' as well if later matched with the accompanying tsquery?
> >
> > Thanks!
> >
> > Cheers,
> > Tim
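For reference, the "sample configuration" described in the quoted message might look like the following. This is an assumed reconstruction: the configuration name comes from the to_tsvector call quoted above, and the token-type list is only a guess at "most token categories".

```sql
-- Assumed reconstruction of the 'timusan-ispell' configuration.
-- The name contains a hyphen, so it is double-quoted in the DDL.
CREATE TEXT SEARCH CONFIGURATION "timusan-ispell" (COPY = pg_catalog.english);

-- Route the word-like token types through the Ispell dictionary only.
ALTER TEXT SEARCH CONFIGURATION "timusan-ispell"
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH timusan_ispell;
```

With no fallback dictionary in the mapping, tokens that Ispell does not recognise produce no lexeme at all.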


--
Tim van der Linden <tim@shisaa.jp>

