Re: a wrong index choose when statistics is out of date
От | Andrei Lepikhov |
---|---|
Тема | Re: a wrong index choose when statistics is out of date |
Дата | |
Msg-id | 701d2097-2c5b-41e2-8629-734e3c8ba613@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: a wrong index choose when statistics is out of date (Andy Fan <zhihuifan1213@163.com>) |
Ответы |
Re: a wrong index choose when statistics is out of date
Re: a wrong index choose when statistics is out of date |
Список | pgsql-hackers |
On 5/3/2024 19:56, Andy Fan wrote: > I think it is OK for a design review, for the implementaion side, the > known issue includes: > > 1. Support grap such infromation from its parent for partitioned table > if the child doesn't have such information. > 2. builtin document and testing. > > Any feedback is welcome. Thanks for your efforts. I was confused when you showed the problem connected to clauses like "Var op Const" and "Var op Param". As far as I know, the estimation logic of such clauses uses MCV and number-distinct statistics. So, being out of MCV values, it becomes totally insensitive to any internal skew in data and any data outside the statistics boundaries. Having studied the example you provided with the patch, I think it is not a correct example: Difference between var_eq_const and var_eq_non_const quite obvious: In the second routine, you don't have information about the const value and can't use MCV for estimation. Also, you can't exclude MCV values from the estimation. And it is just luck that you've got the right answer. I think if you increased the weight of the unknown part, you would get a bad result, too. I would like to ask David why the var_eq_const estimator doesn't have an option for estimation with a histogram. Having that would relieve a problem with skewed data. Detecting the situation with incoming const that is out of the covered area would allow us to fall back to ndistinct estimation or something else. At least, histogram usage can be restricted by the reltuples value and ratio between the total number of MCV values and the total number of distinct values in the table. Just for demo: demonstration of data skew issue: CREATE EXTENSION tablefunc; CREATE TABLE norm_test AS SELECT abs(r::integer) AS val FROM normal_rand(1E7::integer, 5.::float8, 300.::float8) AS r; ANALYZE norm_test; -- First query is estimated with MCV quite precisely: EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 100; -- result: planned rows=25669, actual rows=25139 -- Here we have numdistinct estimation, mostly arbitrary: EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 200; -- result: planned rows=8604, actual rows=21239 EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 500; -- result: planned rows=8604, actual rows=6748 EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 600; -- result: planned rows=8604, actual rows=3501 EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 700; -- result: planned rows=8604, actual rows=1705 EXPLAIN ANALYZE SELECT * FROM norm_test WHERE val = 1000; -- result: planned rows=8604, actual rows=91 -- regards, Andrei Lepikhov Postgres Professional
В списке pgsql-hackers по дате отправления: