Re: tsearch2, large data and indexes

From: Ivan Voras
Subject: Re: tsearch2, large data and indexes
Date:
Msg-id CAF-QHFVYw5-2MwAVqCpZp8spyEfVP_k918naG=XV0ypTe=gOvA@mail.gmail.com
In reply to: Re: tsearch2, large data and indexes  (Jeff Janes <jeff.janes@gmail.com>)
Responses: Re: tsearch2, large data and indexes  (Matheus de Oliveira <matioli.matheus@gmail.com>)
Re: tsearch2, large data and indexes  (Sergey Konoplev <gray.ru@gmail.com>)
List: pgsql-performance
On 22 April 2014 17:58, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Apr 22, 2014 at 12:57 AM, Ivan Voras <ivoras@freebsd.org> wrote:
>>
>> On 22 April 2014 08:40, Heikki Linnakangas <hlinnakangas@vmware.com>
>> wrote:
>> > On 04/20/2014 02:15 AM, Ivan Voras wrote:
>> >> More details: after thinking about it some more, it might have
>> >> something to do with tsearch2 and indexes: the large data in this case
>> >> is a tsvector, indexed with GIN, and the query plan involves a
>> >> re-check condition.
>
>
> I think bitmap scans always insert a recheck, due to the possibility of
> bitmap overflow.
>
> But that doesn't mean that it ever got triggered.  In 9.4, explain
> (analyze) will report on the overflows.
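
To make the "bitmap overflow" point concrete, here is a toy Python model, not PostgreSQL's actual implementation. It assumes a bitmap that can track exact row offsets for only a limited number of pages. Any further page is remembered as a whole ("lossy"), so every row on it has to be rechecked against the query condition. The names `build_bitmap`, `bitmap_heap_scan` and `max_exact_pages` are made up for illustration, and the real policy for deciding which pages become lossy differs.

```python
from collections import defaultdict


def build_bitmap(tids, max_exact_pages):
    """Toy bitmap: track exact offsets for at most max_exact_pages pages.

    tids is an iterable of (page, offset) pairs reported by the index.
    Pages beyond the limit are stored as lossy (page number only).
    """
    exact = defaultdict(set)
    lossy = set()
    for page, offset in tids:
        if page in lossy:
            continue
        if page not in exact and len(exact) >= max_exact_pages:
            lossy.add(page)  # overflow: remember the page, forget the offsets
            continue
        exact[page].add(offset)
    return dict(exact), lossy


def bitmap_heap_scan(heap, exact, lossy, qual):
    """Fetch rows page by page; recheck qual only on lossy pages.

    heap maps page -> {offset: row}. Returns (rows, number_of_rechecks).
    """
    rows, rechecked = [], 0
    for page in sorted(set(exact) | lossy):
        if page in lossy:
            for _offset, row in sorted(heap[page].items()):
                rechecked += 1
                if qual(row):
                    rows.append(row)
        else:
            for offset in sorted(exact[page]):
                rows.append(heap[page][offset])
    return rows, rechecked


heap = {
    0: {1: "doc a", 2: "x", 3: "y"},
    1: {1: "p", 2: "doc b", 3: "q"},
}
matches = [(0, 1), (1, 2)]


def contains_doc(row):
    return "doc" in row


# Plenty of room: no page is lossy, so the recheck never fires.
rows_exact, rechecks_exact = bitmap_heap_scan(
    heap, *build_bitmap(matches, max_exact_pages=10), qual=contains_doc)

# Room for one exact page only: page 1 becomes lossy and is fully rechecked.
rows_lossy, rechecks_lossy = bitmap_heap_scan(
    heap, *build_bitmap(matches, max_exact_pages=1), qual=contains_doc)
```

In this model the recheck step is always part of the scan, but it only does work when a page has actually gone lossy, which is the distinction drawn above.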

Ok, I found out what is happening, quoting from the documentation:

"GIN indexes are not lossy for standard queries, but their performance
depends logarithmically on the number of unique words. (However, GIN
indexes store only the words (lexemes) of tsvector values, and not
their weight labels. Thus a table row recheck is needed when using a
query that involves weights.)"

My query doesn't have weights but the tsvector in the table has them -
I take it this is what is meant by "involves weights."

So... there's really no way for tsearch2 to produce results based on
the index alone, without recheck? This is... limiting.
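
A minimal Python sketch of one reading of the documentation passage quoted above: the index keeps only lexemes, so a query that restricts on a weight label cannot be answered from the index and must look at the stored row. This is a toy inverted index, not GIN itself. `build_index`, `search` and the sample data are hypothetical, and the sketch does not settle which reading applies to the plan discussed in this thread.

```python
def build_index(docs):
    """Build a lexeme -> set(doc_id) index, dropping weight labels.

    docs maps doc_id -> {lexeme: set_of_weight_labels}.
    """
    index = {}
    for doc_id, vector in docs.items():
        for lexeme in vector:
            index.setdefault(lexeme, set()).add(doc_id)
    return index


def search(docs, index, lexeme, weight=None):
    """Return (matching_doc_ids, recheck_needed).

    Without a weight restriction the index answer is final. With one,
    each candidate row must be rechecked, since the index has no weights.
    """
    candidates = index.get(lexeme, set())
    if weight is None:
        return sorted(candidates), False
    confirmed = [d for d in sorted(candidates) if weight in docs[d][lexeme]]
    return confirmed, True


docs = {
    1: {"document": {"A"}},
    2: {"document": {"D"}},
    3: {"index": {"A"}},
}
index = build_index(docs)

plain = search(docs, index, "document")          # no weight in the query
weighted = search(docs, index, "document", "A")  # query restricted to weight A
```

In this model the stored weights only matter once the query itself asks for one: `plain` comes straight from the index, while `weighted` has to consult the rows.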

>> Yes, I've read about tsearch2 and GIN indexes and there shouldn't be a
>> recheck condition - but there is.
>> This is the query:
>>
>> SELECT documents.id, title, raw_data, q, ts_rank(fts_data, q, 4) AS
>> rank, html_filename
>>             FROM documents, to_tsquery('document') AS q
>>             WHERE fts_data @@ q
>>          ORDER BY rank DESC  LIMIT 25;
>>
>> And here is the explain analyze: http://explain.depesz.com/s/4xm
>> It clearly shows a bitmap index scan operation is immediately followed
>> by a recheck operation AND that the recheck operation actually does
>> something, because it reduces the number of records from 61 to 58
>> (!!!).
>
>
> That could be ordinary visibility checking, not qual rechecking.

Visibility as in transaction-wise? It's not, this was the only client
connected to the dev server, and the only transaction(s) happening.

