Re: Queryplan within FTS/GIN index -search.

From: Kevin Grittner
Subject: Re: Queryplan within FTS/GIN index -search.
Msg-id: 4AEFF056020000250002C195@gw.wicourts.gov
In reply to: Re: Queryplan within FTS/GIN index -search.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Queryplan within FTS/GIN index -search.
List: pgsql-performance

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Any sane text search application is going to try to filter out
>>> common words as stopwords; it's only the failure to do that that's
>>> making this run slow.
>
>> I'd rather have the index used for the selective test, and apply
>> the remaining tests to the rows retrieved from the heap.
>
> Uh, that was exactly my point.  Indexing common words is a waste.

Perhaps I'm missing something.  My point was that there are words
which are too common to be useful for index searches, yet uncommon
enough to usefully limit the results.  These words could typically
benefit from tsearch2-style parsing and dictionaries; so declaring
them as stop words would be bad from a functional perspective, yet
searching an index for them would be bad from a performance
perspective.

One solution would be for the users to rigorously identify all of
these words, include them on one stop word list but not another,
include *two* tsvector columns in the table (with and without the
"iffy" words), index only the one with the larger stop word list, and
generate two tsquery values to search the two different columns.  Best
of both worlds.  Sort of.  The staff time to create and maintain such
a list would obviously be costly and writing the queries would be
error-prone.
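
A toy Python model may make the bookkeeping of the two-column scheme concrete. It is only a sketch under stated assumptions: the word lists, the `Row` class, and the `search` helper are invented for illustration, and plain Python sets stand in for the tsvector columns and the GIN index.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical word lists: "the" is a true stop word, while "court" is an
# "iffy" word (too common to be worth indexing, still useful as a filter).
STOP_WORDS = {"the", "a", "of"}
IFFY_WORDS = {"court"}


def to_lexemes(text, excluded):
    """Crude stand-in for to_tsvector: lowercase, split, drop excluded words."""
    return {w for w in text.lower().split() if w not in excluded}


@dataclass
class Row:
    text: str
    full_vec: set      # built with the short stop list; not indexed
    indexed_vec: set   # built with the long list (stop + iffy); indexed


def make_row(text):
    return Row(text,
               to_lexemes(text, STOP_WORDS),
               to_lexemes(text, STOP_WORDS | IFFY_WORDS))


def build_index(rows):
    """Inverted index over indexed_vec only: lexeme -> set of row ids."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for lexeme in row.indexed_vec:
            index[lexeme].add(row_id)
    return index


def search(rows, index, query):
    """AND search: index lookup on the selective terms, then a heap filter."""
    indexed_q = to_lexemes(query, STOP_WORDS | IFFY_WORDS)
    full_q = to_lexemes(query, STOP_WORDS)
    if indexed_q:
        candidates = set.intersection(*(index.get(t, set()) for t in indexed_q))
    else:
        # Nothing selective to look up: fall back to scanning every row.
        candidates = set(range(len(rows)))
    # The second "query" is applied to the unindexed column of each candidate.
    return sorted(i for i in candidates if full_q <= rows[i].full_vec)


rows = [make_row(t) for t in (
    "the court heard the zoning appeal",
    "zoning appeal of a permit",
    "the court adjourned",
)]
index = build_index(rows)
print(search(rows, index, "court zoning"))  # prints [0]
```

The query has to be split into two parts that are kept in sync with two word lists, which is where the maintenance cost and the room for error come from.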

Second best would be to somehow recognize the "iffy" words and exclude
them from the index and the index search phase, but apply the check
when the row is retrieved from the heap.  I really have a hard time
seeing how the conditional exclusion from the index could be
accomplished, though.  Next best would be to let them fall into the
index, but exclude top-level ANDed values from the index search,
applying them only to the recheck when the row is read from the heap.
This seems, at least conceptually, like it could be done.
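
The last idea can be sketched with a toy inverted index in Python. This is a conceptual model only, not how PostgreSQL or GIN behaves: the `and_search` helper and its `max_fraction` threshold are assumptions made for illustration. Every word is indexed, but top-level ANDed terms whose posting lists are too large are skipped in the index phase and tested only when the row is rechecked.

```python
from collections import defaultdict


def build_index(docs):
    """Inverted index over every word: word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def and_search(docs, index, terms, max_fraction=0.5):
    """Evaluate term1 AND term2 AND ... over the toy index.

    Terms whose posting list covers more than max_fraction of the documents
    are skipped during the index phase and checked only at recheck time.
    max_fraction is an arbitrary illustrative threshold.
    """
    terms = [t.lower() for t in terms]
    limit = max_fraction * len(docs)
    selective = [t for t in terms if len(index.get(t, ())) <= limit]
    deferred = [t for t in terms if t not in selective]

    if selective:
        candidates = set.intersection(*(index.get(t, set()) for t in selective))
    else:
        # Every term is common: this sketch simply scans all rows.
        candidates = set(range(len(docs)))

    # Recheck each candidate against the "heap" row for the deferred terms.
    matches = []
    for doc_id in sorted(candidates):
        words = set(docs[doc_id].lower().split())
        if all(t in words for t in deferred):
            matches.append(doc_id)
    return matches, selective, deferred


docs = [
    "court order on zoning appeal",
    "court order granted",
    "court hearing scheduled",
    "zoning board meeting",
]
index = build_index(docs)
print(and_search(docs, index, ["court", "zoning"]))
# prints ([0], ['zoning'], ['court'])
```

Here "court" appears in three of four documents, so only the posting list for "zoning" is read, and "court" is verified against the two candidate rows.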

-Kevin
