Re: bitmap-index-scan faster than seq-scan on full-table-scan (gin index)

Поиск
Список
Период
Сортировка
От Jesper Krogh
Тема Re: bitmap-index-scan faster than seq-scan on full-table-scan (gin index)
Дата
Msg-id 4C04257F.6010607@krogh.cc
обсуждение исходный текст
Ответ на Re: bitmap-index-scan faster than seq-scan on full-table-scan (gin index)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 2010-05-31 22:09, Tom Lane wrote:
> Jesper Krogh<jesper@krogh.cc>  writes:
>    
>> Conceptually searching for the "full dataset" would always be fastest
>> solved by a seq-scan. The query planner enforces this so much, so not
>> even "enable_seqscan=off" can convince it to to something else.
>> ...
>> Would it be possible to implement the "Filtering" using the gin-index and
>> a subsequent visibillity-check as on the index-scan?
>>      
> You're failing to make any sense whatsoever.  If you're reading the full
> dataset, there is no filter condition.  If there is a potentially
> indexable filter condition, the planner will certainly consider that.
>    
Yes, you're totally right on that (about making sense).

But that is because of the "simplified" use-case described. A more elaborate
description ..
I have a table with has a set of colums attached to it typically
used for "sorting" these columns may also be inferred on the table
by a typical join condition and a document that is fts-indexed.
So the actual use-case is that people query for:

"give me the 1000 most recent documents matching <term>"

Term may in some cases be hugely trivial, only filtering away 0.001% of the
dataset resulting in a "index-scan on a btree date index" filtering on the
tsvector column for <term>".

Term may also be something really specific only returning a single
or a few documents and just pushing a post-sorting to get the ordering.

But in the case where the query-planner falls over to a "index-scan"
on one of the btree-indices it ends up reading over from the TOAST data.

Will the planner consider doing the index-scan(forward or backwards)
on a btree-index and filter using a gin-index instead of filtering directly
on the tuple-data?
(I haven't been able to enforce an query-plan that looks like that).

> Personally I think the issue here has got more to do with the
> non-immutability of the single-argument form of to_tsquery, which means
> it gets re-evaluated at every row during a seqscan.  Do your results
> change if you work with to_tsquery('english', ...)  (or whatever your
> default TS config is)?
>    

It is english..  and yes it did indeed change the results. So the 
expensive case
dropped from ~60s to ~28s and the cheap case from ~7.3s to ~4.3s, that
is quite surprising that such "small" change can have that huge impact. The
single-argument version should be forbidden.

But the performance ratio between the two cases is still the same.

The test was actually run with the preliminary gincostestimate-patch from
Oleg Bartunov so the actual cost estimates match way better now, but that
should not impact the actual runtime.

Thanks

-- 
Jesper


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Unexpected page allocation behavior on insert-only tables
Следующее
От: Andy Balholm
Дата:
Сообщение: Re: dividing money by money