Re: GIN improvements part2: fast scan

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: GIN improvements part2: fast scan
Дата
Msg-id 531F8994.3040008@fuzzy.cz
обсуждение исходный текст
Ответ на Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
Hi all,

a quick question that just occured to me - do you plan to tweak the cost
estimation fot GIN indexes, in this patch?

IMHO it would be appropriate, given the improvements and gains, but it
seems to me gincostestimate() was not touched by this patch.

I just ran into this while testing some jsonb stuff, and after creating
a GIN and GIST indexes on the same column, I get these two plans:

=======================================================================

db=# explain analyze select count(*) from messages_2 where headers ?
'x-virus-scanned';
                         QUERY PLAN
------------------------------------------------------------------------Aggregate  (cost=1068.19..1068.20 rows=1
width=0)(actual
 
time=400.149..400.150 rows=1 loops=1)  ->  Bitmap Heap Scan on messages_2  (cost=10.44..1067.50 rows=278
width=0) (actual time=27.974..395.840 rows=70499 loops=1)        Recheck Cond: (headers ? 'x-virus-scanned'::text)
 Rows Removed by Index Recheck: 33596        Heap Blocks: exact=40978        ->  Bitmap Index Scan on
messages_2_gist_idx (cost=0.00..10.37
 
rows=278 width=0) (actual time=21.762..21.762 rows=104095 loops=1)              Index Cond: (headers ?
'x-virus-scanned'::text)Planningtime: 0.052 msTotal runtime: 400.179 ms
 
(9 rows)

Time: 400,467 ms

db=# drop index messages_2_gist_idx;
DROP INDEX

db=# explain analyze select count(*) from messages_2 where headers ?
'x-virus-scanned';                         QUERY PLAN
------------------------------------------------------------------------Aggregate  (cost=1083.91..1083.92 rows=1
width=0)(actual
 
time=39.130..39.130 rows=1 loops=1)  ->  Bitmap Heap Scan on messages_2  (cost=26.16..1083.22 rows=278
width=0) (actual time=11.285..36.248 rows=70499 loops=1)        Recheck Cond: (headers ? 'x-virus-scanned'::text)
Heap Blocks: exact=23896        ->  Bitmap Index Scan on messages_2_gin_idx  (cost=0.00..26.09
 
rows=278 width=0) (actual time=7.974..7.974 rows=70499 loops=1)              Index Cond: (headers ?
'x-virus-scanned'::text)Planningtime: 0.064 msTotal runtime: 39.160 ms
 
(8 rows)

Time: 39,509 ms

=======================================================================

So while the GIN plans seems to be just slightly expensive than GIN,
it's actually way faster.

Granted, most won't have GIN and GIST index on the same column at the
same time, but bad cost estimate may cause other issues. Maybe I could
achieve this by tweaking the various cost GUCs, but ISTM that tweaking
the cost estimation would be appropriate.

regards
Tomas



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Johnston
Дата:
Сообщение: Re: The case against multixact GUCs
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: jsonb and nested hstore