Re: Limitation on number of positions (tsearch)

From: Teodor Sigaev
Subject: Re: Limitation on number of positions (tsearch)
Msg-id: 46E92622.5030601@sigaev.ru
In reply to: Limitation on number of positions (tsearch)  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List: pgsql-hackers
> Why is there a limitation of 256 positions per lexeme in a tsvector?
> There doesn't seem to be a technical reason for that. WordEntryPosVector
> uses a uint16 to store the number of positions, so it can go up to 65535.

For two reasons:
- Ranking might become very slow if the number of positions is large.
- From practice: if a word is very frequent in a document, then with high
  probability it is a stop word or (in the case of Internet-wide search
  engines) the document is spam.
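
To illustrate the first reason, here is a minimal sketch of a proximity-style score. It is a toy and not PostgreSQL's actual ranking code; the function name and the scoring formula are assumptions made for illustration. It only shows that comparing every position of one word with every position of another costs the product of the two position counts.

```python
def pairwise_proximity_score(pos_a, pos_b):
    """Toy proximity score: sum of 1/distance over all position pairs.

    Hypothetical helper, not PostgreSQL's ranking function. It returns
    the score together with the number of pair comparisons performed,
    which is always len(pos_a) * len(pos_b).
    """
    score = 0.0
    comparisons = 0
    for a in pos_a:
        for b in pos_b:
            comparisons += 1
            if a != b:
                score += 1.0 / abs(a - b)
    return score, comparisons


# Two words with 256 positions each (even and odd offsets below 512).
_, capped = pairwise_proximity_score(range(0, 512, 2), range(1, 512, 2))
print(capped)  # 256 * 256 = 65536 comparisons
```

With 65535 positions per word, the same loop would need roughly 4.3 billion comparisons for a single pair of words in a single document.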
 

It is common practice for search engines to limit the number of positions
stored per word, because increasing it gives no advantage in terms of ranking
and causes trouble by increasing storage size.
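
The capping described above can be sketched as follows. This is a simplified illustration, not the tsvector implementation: the constant name, the function name, and the policy of keeping the first occurrences and silently dropping the rest are assumptions.

```python
from collections import defaultdict

# The limit discussed in this thread; the constant name is made up here.
MAX_POSITIONS_PER_LEXEME = 256


def build_position_index(tokens, max_positions=MAX_POSITIONS_PER_LEXEME):
    """Map each token to its 1-based positions, keeping at most
    max_positions entries per token (later occurrences are dropped)."""
    index = defaultdict(list)
    for position, token in enumerate(tokens, start=1):
        positions = index[token]
        if len(positions) < max_positions:
            positions.append(position)
    return dict(index)


# A document that repeats one word 1000 times, then one rare word.
doc = ["spam"] * 1000 + ["postgres"]
index = build_position_index(doc)
print(len(index["spam"]), index["postgres"])  # 256 [1001]
```

The frequent word stops growing at 256 entries, so storage per word is bounded, while the rare word keeps its exact position.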
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 

