full-text search question

From: Sabbiolina
Subject: full-text search question
Msg-id: 269b27950806180549k323833c7n38a0d9f434542bf2@mail.gmail.com
Replies: Re: full-text search question (Oleg Bartunov <oleg@sai.msu.su>)
Re: full-text search question (Andrew Sullivan <ajs@commandprompt.com>)
List: pgsql-admin

Hello,

I've seen that the default full-text search parser can identify e-mail addresses, hosts, URLs, and so on, but I have a serious problem with it:

Suppose I index the following sentence: "the search engine I use the most is www.google.com".

If I then search for "google", no result is found.

If instead I search for "www.google.com", the record is found correctly.

I guess the reason is that the parser treats www.google.com as a single token (of type 'host'), but as anyone can easily see, this is a major problem. The word "google" really is in the sentence above, and the end user of the database quite reasonably asks me: "Why doesn't your FTS find that record when I can clearly see my search term is there?"
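
To illustrate why this happens, here is a minimal Python toy model (not PostgreSQL code). The regex is my own rough approximation of a 'host' token, not the real parser's rule: a run of dot-separated labels stays one token, and a query term matches only if it equals a whole token.

```python
import re

# Toy tokenizer (an assumption, not PostgreSQL's real parser): a run of
# dot-separated labels such as "www.google.com" is kept as ONE token,
# roughly like the default parser's 'host' token type.
TOKEN_RE = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)*")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def matches(document, query_term):
    # A query term matches only if it equals one of the document's tokens.
    return query_term.lower() in set(tokenize(document))

doc = "the search engine I use the most is www.google.com"
print(tokenize(doc))
print(matches(doc, "google"))          # False
print(matches(doc, "www.google.com"))  # True
```

Because "google" is never emitted as a token of its own, an exact-token lookup cannot find it.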

Reading the docs, I've seen that the parser can produce multiple tokens for the same word (for example, "make-up" produces 4 tokens: make-up, make, -, up). So why not do the same with URLs and e-mail addresses? Why is www.google.com treated only as a single word? Why not produce multiple tokens such as www.google.com, www, ., google, ., com? (Obviously www and . could be discarded or treated as stop words.)
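
To make the proposal concrete, here is a minimal Python sketch of the tokenization I have in mind. It is an illustration only, not a PostgreSQL parser: the regex and the HOST_STOPWORDS set are hypothetical choices of mine. Each dotted token is emitted whole, followed by its individual labels, with "." dropped and "www" treated as a stop word.

```python
import re

# Same toy rule as before (an assumption): dotted labels form one token.
TOKEN_RE = re.compile(r"[a-z0-9-]+(?:\.[a-z0-9-]+)*")
# Hypothetical stop list for host parts that carry no search value.
HOST_STOPWORDS = {"www"}

def tokenize_with_host_parts(text):
    """Emit each token; for dotted tokens also emit each label, similar to
    how the parser already emits "make" and "up" for "make-up"."""
    tokens = []
    for tok in TOKEN_RE.findall(text.lower()):
        tokens.append(tok)
        if "." in tok:
            # The '.' separators are dropped rather than emitted as tokens.
            tokens.extend(p for p in tok.split(".")
                          if p and p not in HOST_STOPWORDS)
    return tokens

doc = "the search engine I use the most is www.google.com"
lexemes = set(tokenize_with_host_parts(doc))
print("google" in lexemes)          # True
print("www.google.com" in lexemes)  # True
```

Keeping the whole host as well as its parts means an exact search for "www.google.com" keeps working while "google" alone now matches too.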

Does anybody know of a better parser for Postgres? Or at least a trick to make its FTS find the record above when searching for only part of the URL?
