Re: english parser in text search: support for multiple words in the same position
| From | Tom Lane |
|---|---|
| Subject | Re: english parser in text search: support for multiple words in the same position |
| Date | |
| Msg-id | 15782.1280758804@sss.pgh.pa.us |
| In response to | Re: english parser in text search: support for multiple words in the same position (Sushant Sinha <sushant354@gmail.com>) |
| Responses | Re: english parser in text search: support for multiple words in the same position |
| List | pgsql-hackers |
Sushant Sinha <sushant354@gmail.com> writes:
>> This would needlessly increase the number of tokens. Instead you'd
>> better make it work like compound word support, having just "wikipedia"
>> and "org" as tokens.
> The current text parser already returns url and url_path. That already
> increases the number of unique tokens. I am only asking for adding of
> normal english words as well so that if someone types only "wikipedia"
> he gets a match.
The suggestion to make it work like compound words is still a good one,
ie given wikipedia.org you'd get back
	host		wikipedia.org
	host-part	wikipedia
	host-part	org
not just the "host" token as at present.
Then the user could decide whether he needed to index hostname
components or not, by choosing whether to forward hostname-part
tokens to a dictionary or just discard them.
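
As a rough illustration of the compound-word style output described above, here is a minimal Python sketch. It is not PostgreSQL's parser code: the function names `host_tokens` and `index_terms` and the `index_parts` flag are hypothetical, and the token type names are simply taken from the example in this message.

```python
def host_tokens(host):
    """Return (token_type, text) pairs: the whole host, then each component.

    Illustration only; the real text search parser is written in C and
    does not expose this function.
    """
    tokens = [("host", host)]
    parts = [p for p in host.split(".") if p]
    # A single-label host has no separate components worth emitting.
    if len(parts) > 1:
        tokens.extend(("host-part", p) for p in parts)
    return tokens


def index_terms(tokens, index_parts):
    """Keep or discard host-part tokens, standing in for the user's
    choice of whether to map that token type to a dictionary."""
    return [text for kind, text in tokens
            if kind != "host-part" or index_parts]


tokens = host_tokens("wikipedia.org")
print(tokens)
print(index_terms(tokens, index_parts=True))
print(index_terms(tokens, index_parts=False))
```

With `index_parts=False` only the full host survives, which corresponds to the present behavior, so existing configurations would see no change unless they opt in.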
If you submit a patch that tries to force the issue by classifying
hostname parts as plain words, it'll probably get rejected out of
hand on backwards-compatibility grounds.
regards, tom lane