Re: tsearch2 document and word limit

From: Teodor Sigaev
Subject: Re: tsearch2 document and word limit
Msg-id: 41F8FDA6.3060803@sigaev.ru
In reply to: tsearch2 document and word limit ("David Beavan" <davidbeavan@hotmail.com>)
Replies: Re: tsearch2 document and word limit ("David Beavan" <davidbeavan@hotmail.com>)
List: pgsql-general
Sorry, but there is no way around this except patching the tsearch2 sources.

Tsearch2 (not GiST) imposes these limits deliberately, mainly to save storage
and to reduce rank calculation time. Our (Oleg's and my) experience with search
engines shows that full position information for long documents is not very
important for ranking.
Did you try normalizing the rank by document length?
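To make the limits from the original question (quoted below) concrete, here is a minimal Python model of what happens to one word's positions. It is a sketch of the behaviour as reported in this thread, not code taken from tsearch2: MAX_POS and MAX_OCCURRENCES use the figures from the question, and stored_positions/indexed_count are hypothetical helpers.

```python
# A model of the tsearch2 position limits as reported in this thread.
# MAX_POS and MAX_OCCURRENCES use the figures from the question; they are
# assumptions here, not constants copied from the tsearch2 source.
MAX_POS = 16384          # positions above this are stored as MAX_POS
MAX_OCCURRENCES = 255    # at most this many positions kept per word


def stored_positions(positions):
    """Return the positions that would survive for one word.

    Positions are clamped to MAX_POS, duplicates collapse into one entry
    (so every occurrence past MAX_POS becomes a single occurrence), and
    only the first MAX_OCCURRENCES positions are kept.
    """
    clamped = sorted({min(p, MAX_POS) for p in positions})
    return clamped[:MAX_OCCURRENCES]


def indexed_count(positions):
    """Occurrence count a ranking function would see for this word."""
    return len(stored_positions(positions))


if __name__ == "__main__":
    # Hypothetical word appearing every 10 words in a 100,000-word document.
    true_positions = list(range(10, 100_001, 10))
    print(len(true_positions), indexed_count(true_positions))  # prints: 10000 255
```

Under this model a word that really occurs 10,000 times is seen as occurring 255 times, which is why ranking on raw occurrence counts breaks down for long documents.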

From the tsearch2 reference (http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html):
...
Both of these ranking functions take an integer normalization option that
specifies whether a document's length should impact its rank. This is often
desirable, since a hundred-word document with five instances of a search word is
probably more relevant than a thousand-word document with five instances. The
option can have the values:
     * 0 (the default) ignores document length.
     * 1 divides the rank by the logarithm of the length.
     * 2 divides the rank by the length itself.
...
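To illustrate the three options, here is a minimal Python sketch of the arithmetic only, assuming the raw rank has already been computed. normalize_rank is a hypothetical helper, not part of tsearch2 (in tsearch2 itself the option is passed to the ranking functions), and the log2(length + 1) form for option 1 is an assumption chosen so that one-word documents do not divide by zero.

```python
import math


def normalize_rank(raw_rank, doc_length, option=0):
    """Apply the length normalization options described above.

    option 0: ignore document length
    option 1: divide by the logarithm of the length
    option 2: divide by the length itself
    Option 1 uses log2(length + 1); this base and offset are assumptions,
    not taken from tsearch2.
    """
    if option == 0:
        return raw_rank
    if doc_length <= 0:
        return 0.0  # assumption: empty documents get rank 0
    if option == 1:
        return raw_rank / math.log2(doc_length + 1)
    if option == 2:
        return raw_rank / doc_length
    raise ValueError(f"unknown normalization option: {option}")


if __name__ == "__main__":
    # Hypothetical raw rank: 5 matches in a 100-word and a 1000-word document.
    for option in (0, 1, 2):
        short = normalize_rank(5, 100, option)
        long_ = normalize_rank(5, 1000, option)
        print(option, round(short, 4), round(long_, 4))
```

With option 0 both documents score 5; options 1 and 2 both rank the hundred-word document higher, with option 2 penalizing length far more strongly than option 1.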



David Beavan wrote:
> Hi
>
> I have been experimenting with tsearch2 to index some large text
> documents, and I have run into the following limits:
>
> No more than 255 occurrences of a particular word are indexed.
> Word positions greater than 16384 are stored as position 16384 and so end
> up as a single occurrence.
>
> These are problematic because I need to rank based on number of word
> occurrences, and these limits are preventing this.
>
> Does anybody have any suggestions as to how this could be worked around?
> Is the limit due to GiST? Would OpenFTS help (I'm guessing not)?
>
> Failing that does anybody have experience of combining another text
> indexing package with postgresql?
>
> Dave

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
