Statistics and selectivity estimation for ranges
От | Alexander Korotkov |
---|---|
Тема | Statistics and selectivity estimation for ranges |
Дата | |
Msg-id | CAPpHfdtbdtM1_rhta+2cNdB4Pj7MH8EATQZHDPzK3s64qS61eg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Statistics and selectivity estimation for ranges
|
Список | pgsql-hackers |
Hackers,
attached patch is for collecting statistics and selectivity estimation for ranges.
In order to make our estimations accurate for every distribution of ranges, we would collect 2d-distribution of lower and upper bounds of range into some kind of 2d-histogram. However, this patch use some simplification and assume distribution of lower bound and distribution of length to be independent. We can get distribution of lower bound from standard scalar statistics and thi patch additionally collect statistics for range length. This patch includes selectivity estimations for "&&", "@>" and "<@" operators on ranges. Linear interpolation is used in order to get more accurate results.
Some examples with test dataset where left bound of range is distributed by gaussian distribution and length of range is distributed by exponential distribution.
test=# CREATE TABLE range_test as (SELECT int4range(lb, lb + len) AS r FROM (SELECT (sqrt(-2*ln(random())) * sin(2*pi()*random()) * 1000000)::int as lb, (-10000*ln(1.0 - random()) + 1)::int as len FROM generate_series(1,1000000)) x);
SELECT 1000000
test=# ANALYZE range_test;
ANALYZE
test=# EXPLAIN ANALYZE SELECT * FROM range_test WHERE r && int4range(700000,710000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on range_test (cost=0.00..17906.00 rows=7535 width=14) (actual time=0.119..403.494 rows=6138 loops=1)
Filter: (r && '[700000,710000)'::int4range)
Rows Removed by Filter: 993862
Total runtime: 403.945 ms
(4 rows)
test=# EXPLAIN ANALYZE SELECT * FROM range_test WHERE r && int4range(200000,300000);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on range_test (cost=0.00..17906.00 rows=42427 width=14) (actual time=0.100..401.079 rows=42725 loops=1)
Filter: (r && '[200000,300000)'::int4range)
Rows Removed by Filter: 957275
Total runtime: 403.055 ms
(4 rows)
test=# EXPLAIN ANALYZE SELECT * FROM range_test WHERE r <@ int4range(100000,150000);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on range_test (cost=0.00..17906.00 rows=15341 width=14) (actual time=0.129..382.114 rows=16014 loops=1)
Filter: (r <@ '[100000,150000)'::int4range)
Rows Removed by Filter: 983986
Total runtime: 382.985 ms
(4 rows)
test=# EXPLAIN ANALYZE SELECT * FROM range_test WHERE r <@ int4range(600000,603000);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on range_test (cost=0.00..17906.00 rows=122 width=14) (actual time=1.527..383.511 rows=127 loops=1)
Filter: (r <@ '[600000,603000)'::int4range)
Rows Removed by Filter: 999873
Total runtime: 383.586 ms
(4 rows)
test=# EXPLAIN ANALYZE SELECT * FROM range_test WHERE r @> int4range(100000,100400);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on range_test (cost=0.00..17906.00 rows=5166 width=14) (actual time=0.238..377.712 rows=3909 loops=1)
Filter: (r @> '[100000,100400)'::int4range)
Rows Removed by Filter: 996091
Total runtime: 378.018 ms
(4 rows)
test=# EXPLAIN ANALYZE SELECT * FROM range_test WHERE r @> int4range(500000,530000);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on range_test (cost=0.00..17906.00 rows=342 width=14) (actual time=11.796..382.986 rows=171 loops=1)
Filter: (r @> '[500000,530000)'::int4range)
Rows Removed by Filter: 999829
Total runtime: 383.066 ms
(4 rows)
With best regards,
Alexander Korotkov.
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Amit KapilaДата:
Сообщение: [WIP] Performance Improvement by reducing WAL for Update Operation
Следующее
От: Heikki LinnakangasДата:
Сообщение: Re: [WIP] Performance Improvement by reducing WAL for Update Operation