Re: processing urls with tsearch2

From: Laimonas Simutis
Subject: Re: processing urls with tsearch2
Date:
Msg-id 2b3e22740709131341r6ceed867m4cb3beef27f874db@mail.gmail.com
In reply to: Re: processing urls with tsearch2  (Oleg Bartunov <oleg@sai.msu.su>)
Responses: Re: processing urls with tsearch2  ("Laimonas Simutis" <laimis@gmail.com>)
List: pgsql-general
Is there any way to install the dictionary without running make? That is, are binary versions of it available? I am running PostgreSQL on Windows servers.

On 9/13/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
On Thu, 13 Sep 2007, Laimonas Simutis wrote:

> Hey guys,
>
> maybe someone using tsearch2 could advise on this. With the default
> installation, url, host and some other tokens are processed with the simple
> dictionary. Thus a term like mywebsite.com gets stored as 'mywebsite.com'. The
> parser correctly assigns the token type host to the term, but the
> dictionary the term gets routed through is simple, so what gets stored is
> mywebsite.com
>
> The questions are:
>
> 1) is there a dictionary available that I could utilize that will remove
> .com, .net, .org, etc? I could write one myself, but after seeing some
> sample dictionary implementations and C code I try to avoid, I got scared a
> bit.

Yes, we have dict_regex, which was developed by Sergey Karpov; see details at
http://lynx.sao.ru/~karpov/software/postgres_dict_regex.html
It uses the PCRE library, so you need to know Perl regular expressions.

>
> 2) has anyone else dealt with this maybe in a different way?

Sure, preprocess the text in your preferred language before passing it to to_tsvector.
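
A minimal sketch of that preprocessing idea in Python, offered as one possible approach rather than anything from tsearch2 itself. The function name strip_tlds and the suffix list are hypothetical; the list only covers the .com, .net and .org cases mentioned in the question and would need extending for real data.

```python
import re

# Assumption: only these suffixes should be dropped; extend as needed.
TLD_SUFFIXES = ("com", "net", "org")

# Matches host-like tokens such as "mywebsite.com" or "www.example.org".
# Group 1 captures everything before the final ".<tld>".
_HOST_RE = re.compile(
    r"\b((?:[a-z0-9-]+\.)*[a-z0-9-]+)\.(?:%s)\b" % "|".join(TLD_SUFFIXES),
    re.IGNORECASE,
)


def strip_tlds(text: str) -> str:
    """Return text with the listed TLD suffixes removed from host-like tokens."""
    return _HOST_RE.sub(r"\1", text)


# Hypothetical usage with a DB-API cursor (driver and cursor are assumptions):
#   cur.execute("SELECT to_tsvector(%s)", (strip_tlds(document_text),))
```

The cleaned string is then passed to to_tsvector as usual, so this route needs no compiled dictionary on the server, which may matter on Windows installations.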

>
>
> Thanks for any suggestions and help,
>
> Laimis
>

        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
