Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Дата
Msg-id CAPpHfduBWP15Ufc6F_W=+VYxQmBdSdXbjz3EygM-Uc-5i60W8Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Greg Smith <greg@2ndQuadrant.com>)
Ответы Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
Hi!

Studying this question little more I found that current approach of range indexing can be dramatically inefficient in some cases. It's not because of penalty or split implementation, but because of approach itself. Mapping intervals to two-dimensional space produce much better results in case of high-overlapping ranges and "@>", "<@" operators with low selectivity. 

There is a simple test case for proof of concept.

create table source as (select l, (l + s) as r from (select (random()*10000)::int as l, (random()*1000 + 1)::int s from generate_series(1,1000000) g) x);
create table range_test as (select int4range(l,r) as x from source);
create table point_test as (select point(l,r) as x from source);
create index range_test_idx on range_test using gist (x);
create index point_test_idx on point_test using gist (x);

test=# explain (analyze, buffers) select * from range_test where x <@ int4range(5000,5010);
                                                         QUERY PLAN                                 
                         
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32) (actual time=37.304..37.310 rows=2 loops=1)
   Recheck Cond: (x <@ '[5000,5010)'::int4range)
   Buffers: shared hit=767
   ->  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000 width=0) (actual time=37.288..37.288 rows=2 loops=1)
         Index Cond: (x <@ '[5000,5010)'::int4range)
         Buffers: shared hit=765
 Total runtime: 37.385 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x <@ box(point(5000,5000),point(5010,5010));
                                                        QUERY PLAN                                  
                       
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16) (actual time=0.197..0.206 rows=2 loops=1)
   Recheck Cond: (x <@ '(5010,5010),(5000,5000)'::box)
   Buffers: shared hit=5
   ->  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000 width=0) (actual time=0.182..0.182 rows=2 loops=1)
         Index Cond: (x <@ '(5010,5010),(5000,5000)'::box)
         Buffers: shared hit=3
 Total runtime: 0.265 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where x @> int4range(5000,5990);
                                                        QUERY PLAN                                  
                       
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32) (actual time=4.578..4.603 
rows=5 loops=1)
   Recheck Cond: (x @> '[5000,5990)'::int4range)
   Buffers: shared hit=52
   ->  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000 width=0) (actual time=4.561..4.561 rows=5 loops=1)
         Index Cond: (x @> '[5000,5990)'::int4range)
         Buffers: shared hit=47
 Total runtime: 4.669 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x <@ box(point('-inf'::float,5990),point(5000,'+inf'::float));
                                                        QUERY PLAN                                  
                       
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16) (actual time=0.328..0.353 rows=5 loops=1)
   Recheck Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
   Buffers: shared hit=8
   ->  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000 width=0) (actual time=0.312..0.312 rows=5 loops=1)
         Index Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
         Buffers: shared hit=3
 Total runtime: 0.419 ms
(7 rows)

If you like to learn more information about such mapping you can start from here: http://www.comsis.org/ComSIS/Vol7No4/RegularPapers/paper16.pdf

Any thoughts?

-----
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nikhil Sontakke
Дата:
Сообщение: Re: Review: Non-inheritable check constraints
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: JSON for PG 9.2