Re: Avoid full GIN index scan when possible

Поиск
Список
Период
Сортировка
От Nikita Glukhov
Тема Re: Avoid full GIN index scan when possible
Дата
Msg-id a6e1564e-98c8-e420-cfaf-0309316c4910@postgrespro.ru
обсуждение исходный текст
Ответ на Re: Avoid full GIN index scan when possible  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Avoid full GIN index scan when possible  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Attached 6th version of the patches.

On 01.08.2019 22:28, Tom Lane wrote:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
On Thu, Aug 1, 2019 at 9:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
While I've not attempted to fix that here, I wonder whether we shouldn't
fix it by just forcing forcedRecheck to true in any case where we discard
an ALL qualifier.

+1 for setting forcedRecheck in any case we discard ALL qualifier.
ISTM, real life number of cases we can skip recheck here is
negligible.  And it doesn't justify complexity.
Yeah, that was pretty much what I was thinking --- by the time we got
it fully right considering nulls and multicolumn indexes, the cases
where not rechecking could actually do something useful would be
pretty narrow.  And a bitmap heap scan is always going to have to
visit the heap, IIRC, so how much could skipping the recheck really
save?
I have simplified patch #1 setting forcedRecheck for all discarded ALL quals.
(This solution is very close to the earliest unpublished version of the patch.)


More accurate recheck-forcing logic was moved into patch #2 (multicolumn 
indexes were fixed).  This patch also contains ginFillScanKey() refactoring
and new internal mode GIN_SEARCH_MODE_NOT_NULL that is used only for 
GinScanKey.xxxConsistentFn initialization and transformed into 
GIN_SEARCH_MODE_ALL before GinScanEntry initialization.


The cost estimation seems to be correct for both patch #1 and patch #2 and
left untouched since v05.


BTW, it's not particularly the fault of this patch, but: what does it
even mean to specify GIN_SEARCH_MODE_ALL with a nonzero number of keys?

It might mean we would like to see all the results, which don't
contain given key.
Ah, right, I forgot that the consistent-fn might look at the match
results.
Also I decided to go further and tried to optimize (patch #3) the case for
GIN_SEARCH_MODE_ALL with a nonzero number of keys.

Full GIN scan can be avoided in queries like this contrib/intarray query:
"arr @@ '1' AND arr @@ '!2'" (search arrays containing 1 and not containing 2).

Here we have two keys:- key '1' with GIN_SEARCH_MODE_DEFAULT- key '2' with GIN_SEARCH_MODE_ALL

Key '2' requires full scan that can be avoided with the forced recheck.

This query is equivalent to single-qual query "a @@ '1 & !2'" which
emits only one GIN key '1' with recheck.


Below is example for contrib/intarray operator @@:

=# CREATE EXTENSION intarray;
=# CREATE TABLE t (a int[]);
=# INSERT INTO t SELECT ARRAY[i] FROM generate_series(1, 1000000) i;
=# CREATE INDEX ON t USING gin (a gin__int_ops);
=# SET enable_seqscan = OFF;

-- master
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1' AND a @@ '!2';                                                        QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------Bitmap Heap Scan on t  (cost=16000095.45..16007168.16 rows=5019 width=24) (actual time=66.955..66.956 rows=1 loops=1)  Recheck Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))  Heap Blocks: exact=1  Buffers: shared hit=6816  ->  Bitmap Index Scan on t_a_idx  (cost=0.00..16000094.19 rows=5019 width=0) (actual time=66.950..66.950 rows=1 loops=1)        Index Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))        Buffers: shared hit=6815Planning Time: 0.086 msExecution Time: 67.076 ms
(9 rows)

-- equivalent single-qual query
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1 & !2';                                                    QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------Bitmap Heap Scan on t  (cost=78.94..7141.57 rows=5025 width=24) (actual time=0.019..0.019 rows=1 loops=1)  Recheck Cond: (a @@ '1 & !2'::query_int)  Heap Blocks: exact=1  Buffers: shared hit=8  ->  Bitmap Index Scan on t_a_idx  (cost=0.00..77.68 rows=5025 width=0) (actual time=0.014..0.014 rows=1 loops=1)        Index Cond: (a @@ '1 & !2'::query_int)        Buffers: shared hit=7Planning Time: 0.056 msExecution Time: 0.039 ms
(9 rows)

-- with patch #3
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1' AND a @@ '!2';                                                    QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------Bitmap Heap Scan on t  (cost=75.45..7148.16 rows=5019 width=24) (actual time=0.019..0.020 rows=1 loops=1)  Recheck Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))  Heap Blocks: exact=1  Buffers: shared hit=6  ->  Bitmap Index Scan on t_a_idx  (cost=0.00..74.19 rows=5019 width=0) (actual time=0.011..0.012 rows=1 loops=1)        Index Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))        Buffers: shared hit=5Planning Time: 0.059 msExecution Time: 0.040 ms
(9 rows)




Patch #3 again contains a similar ugly solution -- we have to remove already
initialized GinScanKeys with theirs GinScanEntries.  GinScanEntries can be 
shared, so the reference counting was added.


Also modifications of cost estimation in patch #3 are questionable. 
GinQualCounts  are simply not incremented when haveFullScan flag is set, 
because the counters anyway will be overwritten by the caller.


-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: POC: Cleaning up orphaned files using undo logs