Re: GIN improvements part2: fast scan

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: GIN improvements part2: fast scan
Дата
Msg-id 51BF0A90.2000503@vmware.com
обсуждение исходный текст
Ответ на Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On 17.06.2013 15:55, Alexander Korotkov wrote:
> On Sat, Jun 15, 2013 at 2:55 AM, Alexander Korotkov<aekorotkov@gmail.com>wrote:
>
>> attached patch implementing "fast scan" technique for GIN. This is second
>> patch of GIN improvements, see the 1st one here:
>>
>> http://www.postgresql.org/message-id/CAPpHfduxv-iL7aedwPW0W5fXrWGAKfxijWM63_hZujaCRxnmFQ@mail.gmail.com
>> This patch allow to skip parts of posting trees when their scan is not
>> necessary. In particular, it solves "frequent_term&  rare_term" problem of
>> FTS.
>> It introduces new interface method pre_consistent which behaves like
>> consistent, but:
>> 1) allows false positives on input (check[])
>> 2) allowed to return false positives
>>
>> Some example: "frequent_term&  rare_term" becomes pretty fast.
>>
>> create table test as (select to_tsvector('english', 'bbb') as v from
>> generate_series(1,1000000));
>> insert into test (select to_tsvector('english', 'ddd') from
>> generate_series(1,10));
>> create index test_idx on test using gin (v);
>>
>> postgres=# explain analyze select * from test where v @@
>> to_tsquery('english', 'bbb&  ddd');
>>                                                        QUERY PLAN
>>
>>
-----------------------------------------------------------------------------------------------------------------------
>>   Bitmap Heap Scan on test  (cost=942.75..7280.63 rows=5000 width=17)
>> (actual time=0.458..0.461 rows=10 loops=1)
>>     Recheck Cond: (v @@ '''bbb''&  ''ddd'''::tsquery)
>>     ->   Bitmap Index Scan on test_idx  (cost=0.00..941.50 rows=5000
>> width=0) (actual time=0.449..0.449 rows=10 loops=1)
>>           Index Cond: (v @@ '''bbb''&  ''ddd'''::tsquery)
>>   Total runtime: 0.516 ms
>> (5 rows)
>>
>
> Attached version of patch has some refactoring and bug fixes.

Good timing, I just started looking at this.

I think you'll need to explain how this works. There are no docs, and 
almost no comments.

(and this shows how poorly I understand this, but) Why does this require 
the "additional information" patch? What extra information do you store 
on-disk, in the additional information?

The pre-consistent method is like the consistent method, but it allows 
false positives. I think that's because during the scan, before having 
scanned for all the keys, the gin AM doesn't yet know if the tuple 
contains all of the keys. So it passes the keys it doesn't yet know 
about as 'true' to pre-consistent. Could that be generalized, to pass a 
tri-state instead of a boolean for each key to the pre-consistent 
method? For each key, you would pass "true", "false", or "don't know". I 
think you could then also speed up queries like "!english & bbb".

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: refresh materialized view concurrently
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Batch API for After Triggers