Re: making tsearch2 dictionaries

From Oleg Bartunov
Subject Re: making tsearch2 dictionaries
Date
Msg-id Pine.GSO.4.58.0402171940440.17553@ra.sai.msu.su
In reply to Re: making tsearch2 dictionaries  (Ben <bench@silentmedia.com>)
Replies Re: making tsearch2 dictionaries  (Ben <bench@silentmedia.com>)
List pgsql-general
On Tue, 17 Feb 2004, Ben wrote:

> On Tue, 2004-02-17 at 03:15, Oleg Bartunov wrote:
>
> > Do you want '100' and 'hundred' to be fully equivalent? So,
> > if you search for '100' you will find documents with 'hundred'. Interestingly,
> > you will also find '123', because '123' will become 'one hundred twenty three'.
>
> Yeah, for a general case of documents I'm not sure how accurate it would
> make things, but I'm trying to index music artist names and song titles,
> where I'd get things like "3 Dog Night".... or is that "Three Dog
> Night"? :)
>
> > What's the problem? You may configure which dictionaries are used, and in
> > what order, for a given type of token (pg_ts_cfgmap table).
> > Aha, I got your problem:
>
> > Once a word is recognized by the synonym dictionary, it will not be passed
> > to the next dictionary! This is how tsearch2 works with any dictionary.
>
> Yep, that's my problem. :) And it seems that if I could pass the normal
> words into an ispell dictionary before passing them on to the en_stem
> dictionary, I'd get spell checking for free. Unless there's a better way
> to give "did you mean: <your search spelled correctly>?" results....?
>

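The rule quoted above (a token stops at the first dictionary that recognizes it) can be sketched roughly as follows. This is an illustration in Python, not tsearch2's actual C code. The names synonym_dict, toy_stem_dict and normalize are hypothetical, and returning None for "not recognized" is a simplifying assumption.

```python
# Illustrative sketch only: these functions are hypothetical stand-ins
# for tsearch2 dictionaries, not the real API.

def synonym_dict(token):
    """Return lexemes if the token has a synonym, else None (not recognized)."""
    synonyms = {"3": "three"}
    return [synonyms[token]] if token in synonyms else None

def toy_stem_dict(token):
    """Toy stemmer: strips a trailing 's'. It recognizes every token."""
    if token.endswith("s") and len(token) > 3:
        return [token[:-1]]
    return [token]

def normalize(token, dictionaries):
    """Try dictionaries in order and stop at the first one that recognizes the token."""
    for dictionary in dictionaries:
        lexemes = dictionary(token)
        if lexemes is not None:
            return lexemes  # later dictionaries never see this token
    return []

chain = [synonym_dict, toy_stem_dict]
print(normalize("3", chain))
print(normalize("nights", chain))
```

In this sketch "3" is claimed by the synonym dictionary and never reaches the stemmer, which is the behaviour Ben ran into.
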
If the ispell dictionary recognizes a word, that word will not be passed on
to en_stem. We know how to add a "query spelling" feature to tsearch2; we are
just waiting for sponsorship :) Meanwhile, you could use our trgm module, which
implements trigram-based spelling correction. You need to maintain a
separate table with all words of interest (say, from tsvectors) and
search for query words in that table using the bestmatch function.
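
To make the trigram idea concrete, here is a minimal sketch in Python. It is not the trgm module's code: the helper names (trigrams, similarity, best_match) are hypothetical, the padding scheme and the Jaccard-style score are assumptions, and the word list stands in for the separate table of known words.

```python
# Illustrative sketch only: not the trgm module's implementation.

def trigrams(word):
    """Set of 3-character substrings of the padded, lower-cased word.
    The padding (two spaces before, one after) is an assumed convention."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (between 0 and 1)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

def best_match(query_word, known_words):
    """Pick the known word most similar to the possibly misspelled query word."""
    return max(known_words, key=lambda w: similarity(query_word, w))

# Stand-in for the table of words collected from the tsvectors.
known_words = ["night", "three", "dog", "who"]
print(best_match("nigth", known_words))
```

A real setup would keep the words in a table and let the database do the matching; the sketch only shows why a misspelling such as "nigth" still shares enough trigrams with "night" to be matched.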

> I know doing this would increase the size of the generated ts_vector,
> but for my case, where what I'm indexing is generally only a few words
> anyway, that's not an issue. As it is, I'm already going to get rid of
> the stop words file, so that I can actually find things like "The Who."
>
> How hard do you think it would be to change up the behavior to make this
> happen? I
>
> > What do you want from parser ?
>
> I want to be able to recognize symbols, such as the degree (°) and
> vulgar half (½) symbols.

You mean '(TA)', '(TH)'? I don't think that would be very difficult. What
would the token type be (parenthesis_word?)

>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
