Re: Statistics and selectivity estimation for ranges

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Statistics and selectivity estimation for ranges
Дата
Msg-id CAPpHfdudkYhJkGbhQtbjt6GG5Owp_v7_kpQKA=1Dz5Fb8iGteA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Statistics and selectivity estimation for ranges  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: Statistics and selectivity estimation for ranges
Список pgsql-hackers
On Mon, Aug 13, 2012 at 1:11 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Aug 9, 2012 at 12:44 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
My conclusion is so, that current errors are probably ok for selectivity estimation. But taking into attention that generated datasets ideally fits assumptions of estimation, there could be room for improvement. Especially, it's unclear why estimate for "<@" and "@>" have much greater error than estimate for "&&". Possibly, it's caused by some bugs.

ITSM, I found reason of inaccuracy. Implementation of linear interpolation was wrong. Fixed version is attached. Now, need to rerun tests, possible refactoring and comments rework.

After fixing few more bugs, I've a version with much more reasonable accuracy.

Statistics target = 100.

Relatively large result sets (>= 10)

test=# select operator, avg(estimate_count::float8/actual_count::float8) as avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) - 1.0 as avg_error from datasets d join test_results tr on tr.test_id = d.id where d.stat_target = 100 and actual_count >= 10 group by operator; 
 operator |    avg_ratio     |     avg_error      
----------+------------------+--------------------
 <@       | 1.00404179116863 | 0.0504415454560903
 @>       | 1.06364108531688 |  0.105646077989812
 &&       | 1.00757984721409 | 0.0420984234933233
(3 rows)

Small result sets (1 - 9)

test=# select operator, avg(estimate_count::float8/actual_count::float8) as avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) - 1.0 as avg_error from datasets d join test_results tr on tr.test_id = d.id where d.stat_target = 100 and actual_count between 1 and 9 group by operator;
 operator |    avg_ratio     |     avg_error     
----------+------------------+-------------------
 <@       | 1.31530838062865 | 0.654886592410495
 @>       | 2.78708078320147 |  1.94124123003433
 &&       | 1.93268112525538 |  1.09904919063335
(3 rows)

Empty result sets

test=# select operator, avg(estimate_count) as avg_estimate, count(*) as tests_count from datasets d join test_results tr on tr.test_id = d.id where d.stat_target = 100 and actual_count = 0 group by operator;
 operator |    avg_estimate    | tests_count 
----------+--------------------+-------------
 <@       | 1.1437670609645132 |        1099
 @>       | 1.0479430126460701 |       87458
(2 rows)

Statistics target = 1000.

Relatively large result sets (>= 10)

test=# select operator, avg(estimate_count::float8/actual_count::float8) as avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) - 1.0 as avg_error from datasets d join test_results tr on tr.test_id = d.id where d.stat_target = 1000 and actual_count >= 10 group by operator;
 operator |    avg_ratio     |     avg_error      
----------+------------------+--------------------
 <@       | 1.00073999445381 |  0.045099762607524
 @>       | 1.05296320350853 | 0.0907489633452971
 &&       | 1.00217602359039 | 0.0353421159150165
(3 rows)

Small result sets (1 - 9)

test=# select operator, avg(estimate_count::float8/actual_count::float8) as avg_ratio, avg(exp(abs(ln(estimate_count::float8/actual_count::float8)))) - 1.0 as avg_error from datasets d join test_results tr on tr.test_id = d.id where d.stat_target = 1000 and actual_count between 1 and 9 group by operator;
 operator |    avg_ratio     |     avg_error     
----------+------------------+-------------------
 <@       | 1.26946358795998 | 0.577803898836364
 @>       | 2.69000633430211 |  1.83165424646645
 &&       | 1.48715184186882 | 0.577998652291105
(3 rows)

Empty result sets

test=# select operator, avg(estimate_count) as avg_estimate, count(*) as tests_count from datasets d join test_results tr on tr.test_id = d.id where d.stat_target = 1000 and actual_count = 0 group by operator;
 operator |    avg_estimate    | tests_count 
----------+--------------------+-------------
 <@       | 1.0887096774193548 |        1364
 @>       | 1.0423876983771183 |       89224
 &&       | 5.0000000000000000 |           1
(3 rows)

------
With best regards,
Alexander Korotkov.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: default_isolation_level='serializable' crashes on Windows
Следующее
От: Pavel Stehule
Дата:
Сообщение: patch: shared session variables