reducing result set of tsvector @@ tsquery avoiding to use ts_rank


From	Ivan Sergio Borgonovo
Subject	reducing result set of tsvector @@ tsquery avoiding to use ts_rank
Date	2 February 2010, 00:13:12
Msg-id	20100202021113.7162622a@dawn.webthatworks.it
List	pgsql-general


I've finally made some progress in writing C functions that
manipulate tsvectors.

I'd like to build a simple system based on ts_rank to find
similar documents.

Each of my documents consists of 4 parts.

I build a tsvector the "usual way":

setweight(to_tsvector(field1), 'A') ||
setweight(to_tsvector(field2), 'B') ||

etc...

Then I'd like to build a query similar to:

tsvector @@ to_tsquery(
  'field1_lexeme1':A | 'field1_lexeme2':A | ...
  'field2_lexeme1':B | 'field2_lexeme2':B | ...
)
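
A minimal sketch of how such an OR-only query string could be assembled,
written in Python rather than C for brevity. The helper names
quote_lexeme and or_query are hypothetical, and the input is assumed to
be a list of (lexeme, weight) pairs already extracted from the tsvector.
Since such lexemes are already normalized, casting the resulting string
to tsquery may be more appropriate than passing it through to_tsquery;
that is an assumption worth checking.

```python
def quote_lexeme(lexeme: str) -> str:
    """Quote a lexeme for tsquery text input.

    Embedded backslashes and single quotes are doubled.
    """
    escaped = lexeme.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def or_query(weighted_lexemes):
    """Join (lexeme, weight) pairs into a single OR-only tsquery string."""
    return " | ".join(
        f"{quote_lexeme(lexeme)}:{weight}" for lexeme, weight in weighted_lexemes
    )


if __name__ == "__main__":
    # Hypothetical lexemes standing in for the contents of a real tsvector.
    terms = [
        ("field1_lexeme1", "A"),
        ("field1_lexeme2", "A"),
        ("field2_lexeme1", "B"),
    ]
    print(or_query(terms))
    # prints: 'field1_lexeme1':A | 'field1_lexeme2':A | 'field2_lexeme1':B
```

The string would be sent to the server as a bound parameter, not spliced
into the SQL text.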

With so many ORs, though, the query is going to return a lot of rows,
and filtering on rank afterwards is "too late" for performance, since
ts_rank has to be computed for every matching row.

One way to shrink the result set would be to build a query that
requires at least 2 lexemes to be present:

  'field1_lexeme1':A & ('field1_lexeme2':A | ...
  'field2_lexeme1':B | 'field2_lexeme2':B | ...
   ) |
   'field1_lexeme2':A & ('field1_lexeme1':A | ...
   ) |
   ...
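
The "at least 2 lexemes" query can be generated mechanically. The Python
sketch below (not the planned C function; quote_lexeme and at_least_two
are hypothetical names) pairs each lexeme only with the lexemes that
follow it in the list. This covers every unordered pair once, so it
should match the same rows as the fully symmetric form above while
producing a shorter query. It relies on & binding tighter than | in
tsquery syntax.

```python
def quote_lexeme(lexeme: str) -> str:
    """Quote a lexeme for tsquery text input (double backslashes and quotes)."""
    escaped = lexeme.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def at_least_two(weighted_lexemes):
    """Build a tsquery string that requires at least two of the lexemes.

    Each term is ANDed with the OR of the terms after it, which is
    equivalent to ORing together every pair (term_i & term_j), i < j.
    """
    terms = [f"{quote_lexeme(lex)}:{weight}" for lex, weight in weighted_lexemes]
    if len(terms) < 2:
        raise ValueError("need at least two lexemes")
    clauses = []
    for i in range(len(terms) - 1):
        rest = " | ".join(terms[i + 1:])
        clauses.append(f"{terms[i]} & ({rest})")
    return " | ".join(clauses)


if __name__ == "__main__":
    # Hypothetical lexemes standing in for the contents of a real tsvector.
    terms = [("a", "A"), ("b", "A"), ("c", "B")]
    print(at_least_two(terms))
    # prints: 'a':A & ('b':A | 'c':B) | 'b':A & ('c':B)
```

The query text grows quadratically with the number of lexemes, so this
is only practical for short documents or for a pre-selected subset of
lexemes.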

My documents are not very long, so this looks feasible, but I'd
like to hear any other suggestions for shrinking the result set
further before filtering on ts_rank, especially suggestions that
exploit the index.

Any suggestion that reduces the result set before filtering on rank
is welcome. I'll try to put them into practice in some C functions
that, given a tsvector, build a tsquery to be used to find similar
documents.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
