Re: making tsearch2 dictionaries

From Oleg Bartunov
Subject Re: making tsearch2 dictionaries
Msg-id Pine.GSO.4.58.0402172046480.17553@ra.sai.msu.su
In reply to Re: making tsearch2 dictionaries  (Ben <bench@silentmedia.com>)
Responses Re: making tsearch2 dictionaries  (Ben <bench@silentmedia.com>)
List pgsql-general
On Tue, 17 Feb 2004, Ben wrote:

> On Tue, 17 Feb 2004, Oleg Bartunov wrote:
>
> > If the ispell dictionary recognizes a word, that word will not be passed
> > to en_stem. We know how to add a "query spelling feature" to tsearch2, just
> > waiting for sponsorships :) Meanwhile, you could use our trgm module, which
> > implements trigram-based spelling correction. You need to maintain a
> > separate table with all words of interest (say, from tsvectors) and
> > search for query words in that table using the bestmatch function.
>
> Hm, I'll take a look at this approach. I take it you think piping
> dictionary output to more dictionaries in the chain is a bad idea? :)

It's unpredictable, and I still don't get your idea of pipelining, but
in general I have nothing against it.
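
For readers unfamiliar with the trigram approach quoted above, here is a minimal sketch of the idea in Python. It is not the trgm module's code or API: the padding rule, the Jaccard score, and the names trigrams, similarity, and best_match are all assumptions made for illustration, and the word list stands in for the separate table of words collected from the tsvectors.

```python
# Hypothetical sketch of trigram-based matching; not the trgm module's
# actual algorithm or API. All function names here are illustrative.


def trigrams(word):
    """Return the set of 3-character substrings of a padded, lowercased word."""
    # Assumed padding: two spaces in front, one behind, so that word
    # boundaries contribute trigrams of their own.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the two words' trigram sets (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def best_match(query_word, known_words):
    """Return the known word whose trigram set is closest to query_word."""
    return max(known_words, key=lambda w: similarity(query_word, w))


# Stand-in for the separate table of words of interest.
known_words = ["dictionary", "degree", "symbol", "parser"]
print(best_match("dictionery", known_words))
```

A misspelled query word is replaced by its closest known word before it is handed to the full-text search; in a real setup the lookup would run against the word table in the database rather than an in-memory list.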

>
> > > > What do you want from parser ?
> > >
> > > I want to be able to recognize symbols, such as the degree (°) and
> > > vulgar half (½) symbols.
> >
> > You mean '(TA)', '(TH)'? I think it's not very difficult. What would the
> > token type be (parenthesis_word :?)
>
> Uh, not sure how you got (TA) and (TH)... If you look at the original
> message with UTF-8 encoding, the symbols come out fine. Or, maybe
> you'd just have better luck pointing a browser at a page like

Yup:)

> http://homepages.comnet.co.nz/~r-mahoney/bca_text/utf8.html. I want to be
> able to recognize a subset of these symbols, and I'd want another
> dictionary I'd make to handle the symbol token to return both the symbol
> and the common name as lexemes, in case people spell out the symbol
> instead of entering it.
>

Aha, the same way as we handle complex words with a hyphen: we return
the whole word and its parts. So you need to introduce a new token type
in the parser and use a synonym dictionary, which in turn will return
the symbol token and a human-readable word.
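
The behaviour described above could look roughly like this, sketched in Python rather than as a real tsearch2 dictionary. The mapping table, the common names chosen for each symbol, and the function name lexize_symbol are assumptions for illustration only; an actual dictionary would be written against the tsearch2 dictionary interface.

```python
# Hypothetical sketch of a synonym-style dictionary for symbol tokens.
# The table contents and the function name are illustrative assumptions.

SYMBOL_NAMES = {
    "\u00b0": "degree",  # degree sign
    "\u00bd": "half",    # vulgar fraction one half
}


def lexize_symbol(token):
    """Return the lexemes for a symbol token.

    A known symbol yields both the symbol itself and its common name, so
    a search matches whether the user types the symbol or spells it out.
    An unknown token yields None, meaning "not recognized here".
    """
    name = SYMBOL_NAMES.get(token)
    if name is None:
        return None
    return [token, name]


print(lexize_symbol("\u00b0"))
```

This mirrors the hyphenated-word case: one input token produces several lexemes, and any one of them is enough for a match.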

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
