Re: lexemes in prefix search going through dictionary modifications

From: Florian Pflug
Subject: Re: lexemes in prefix search going through dictionary modifications
Date:
Msg-id: 5A1A958A-6F52-4112-A28C-540B6AFBA34A@phlo.org
In response to: Re: lexemes in prefix search going through dictionary modifications  (Sushant Sinha <sushant354@gmail.com>)
Responses: Re: lexemes in prefix search going through dictionary modifications
List: pgsql-hackers
On Oct25, 2011, at 18:47 , Sushant Sinha wrote:
> On Tue, 2011-10-25 at 18:05 +0200, Florian Pflug wrote:
>> On Oct25, 2011, at 17:26 , Sushant Sinha wrote:
>>> I am currently using the prefix search feature in text search. I find
>>> that the prefix characters are treated the same as a normal lexeme and
>>> passed through stemming and stopword dictionaries. This seems like a bug
>>> to me.
>> 
>> Hm, I don't think so. If they don't pass through stopword dictionaries,
>> then queries containing stopwords will fail to find any rows - which is
>> probably not what one would expect.
> 
> I think what you are saying a feature is really a bug. I am fairly sure
> that when someone says to_tsquery('english', 's:*') one is looking for
> an entry that has a *non-stopword* word that starts with 's'. And
> specially so in a text search configuration that eliminates stop words.

But the whole idea of removing stopwords from the query is that users
*don't* need to be aware of the precise list of stopwords. The way I see
it, stopwords are simply an optimization that helps reduce the size of
your fulltext index.

Assume, for example, that the postgres mailing list archive search used
tsearch (which I think it does, but I'm not sure). It'd then probably make
sense to add "postgres" to the list of stopwords, because it's bound to
appear in nearly every mail. But would you want searches that include
'postgres*' to turn up empty? Quite certainly not.
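
To make the argument above concrete, here is a minimal sketch in Python. It
is a toy model, not how tsearch is implemented: the stopword list, the
function names, and the filter_prefix_stopwords flag are all assumptions
made up for illustration. It compares a query parser that drops stopwords
from prefix terms with one that passes prefix terms through unfiltered.

```python
# Toy model only: this is NOT PostgreSQL's implementation.
# Hypothetical stopword list for a mailing list archive search.
STOPWORDS = {"postgres", "the", "a"}

def index_doc(text):
    """Return the set of indexed lexemes, with stopwords dropped."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def parse_query(terms, filter_prefix_stopwords):
    """terms is a list of (word, is_prefix) pairs that must all match.

    Non-prefix stopwords are always dropped. Prefix stopwords are
    dropped only when filter_prefix_stopwords is True.
    """
    kept = []
    for word, is_prefix in terms:
        word = word.lower()
        if word in STOPWORDS and (filter_prefix_stopwords or not is_prefix):
            continue  # the term is removed from the query
        kept.append((word, is_prefix))
    return kept

def matches(doc_lexemes, query):
    """True if every remaining query term matches the document."""
    def term_ok(word, is_prefix):
        if is_prefix:
            return any(lex.startswith(word) for lex in doc_lexemes)
        return word in doc_lexemes
    return all(term_ok(w, p) for w, p in query)

doc = index_doc("Postgres vacuum is slow")
query = [("postgres", True), ("vacuum", False)]  # roughly 'postgres:* & vacuum'

# Prefix term goes through the stopword filter: the mail is found.
found_with_filter = matches(doc, parse_query(query, True))
# Prefix term bypasses the filter: nothing in the index starts with
# "postgres", so the mail is not found.
found_without_filter = matches(doc, parse_query(query, False))
```

In this model the index never contains "postgres", so a prefix term that
skips the stopword filter can never match, and the whole query comes back
empty.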

> Does it even make sense to stem, abbreviate, synonym for a few letters?
> It will be so unpredictable.

That depends on the language. In German (my native tongue), one can
concatenate nouns to form new nouns. It's thus not entirely unreasonable
that one would want the prefix to be stemmed to its singular form before
being matched.

Also, suppose you're using a dictionary which corrects common typos. Who
says you wouldn't want that to be applied to prefix queries?
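
A small sketch of the typo case, again as a toy model in Python and not
tsearch itself. The TYPO_FIXES table and the use_dictionary flag are
hypothetical. Documents are always normalized at index time, and the
question is only whether the prefix is normalized as well.

```python
# Toy model only: a hypothetical typo-correcting dictionary.
TYPO_FIXES = {"recieve": "receive"}

def normalize(word):
    """Lowercase a word and replace it if it is a known typo."""
    word = word.lower()
    return TYPO_FIXES.get(word, word)

def prefix_match(doc_text, prefix, use_dictionary):
    """Match a prefix against the normalized lexemes of a document."""
    lexemes = {normalize(w) for w in doc_text.split()}
    if use_dictionary:
        prefix = normalize(prefix)
    return any(lex.startswith(prefix) for lex in lexemes)

# The user mistypes the prefix. With the dictionary applied it becomes
# "receive" and matches "receiver"; without it, nothing matches.
with_dict = prefix_match("The receiver failed", "recieve", True)
without_dict = prefix_match("The receiver failed", "recieve", False)
```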

best regards,
Florian Pflug