Add support for (Var op Var) clause in extended MCV statistics

Поиск
Список
Период
Сортировка
От Ilia Evdokimov
Тема Add support for (Var op Var) clause in extended MCV statistics
Дата
Msg-id 46d09d66-1692-4a12-a106-06981a47cb28@tantorlabs.com
обсуждение исходный текст
Ответы Re: Add support for (Var op Var) clause in extended MCV statistics
Re: Add support for (Var op Var) clause in extended MCV statistics
Список pgsql-hackers

Hi hackers,

I'd like to submit a patch that improves the estimated rows for queries containing (Var op Var) clauses by applying extended MCV statistics.

New functions:

  • mcv_clauselist_selectivity_var_op_var() - calculates the selectivity for (Var op Var) clauses.
  • is_opclause_var_op_var() - Checks whether a clause is of the (Var op Var) form.

Implementation Details:

  • A new 'if' statement was added to the 'clause_selectivity_ext()' function to handle (Var op Var) clauses. This allows the process to locate matching MCV extended statistics and calculate selectivity using the newly introduced function.
  • Additionally, I added 'if' statement in statext_is_compatible_clause_internal() function to determine which columns are included in the clause, find matching extended statistics, and then calculate selectivity through the new function. I did the same in mcv_get_match_bitmap() to check what values are true for (Var op Var).
  • To support this, I created a new enum type to differentiate between OR/AND and (Var op Var) clauses.

Examples:

create table t (a int, b int);
insert into t select mod(i,10), mod(i,10)+1 from generate_series(1,100000) s(i);
analyze t;
explain select * from t where a < b;
`
    Estimated:   33333
    Actual:       100000

explain select * from t where a > b;
`
    Estimated:   33333
    Actual:       100000

create statistics s (mcv) on a,b from t;
analyze t;
explain select * from t where a < b;
`
    Estimated without patch:  33333
    Estimated with patch:     100000  
    Actual:                             100000

explain select * from t where a > b;
`
    Estimated without patch:  33333
    Estimated with patch:     100000  
    Actual:                             100000


If you want to see more examples, see regress tests in the patch.

Previous thread:

This feature was originally developed two years ago in [1], and at that time, the approach was almost the same. My implementation uses dedicated functions and 'if' statements directly for better readability and maintainability. Additionally, there was a bug in the previous approach that has been resolved with my patch. Here’s an example of the bug and its fix:

CREATE TABLE foo (a int, b int);
INSERT INTO foo SELECT x/10+1, x FROM generate_series(1,10000) g(x);
ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE a = 1 OR (b > 0 AND b < 10);
`
    Estimated:   18
    Actual:           9

CREATE STATISTICS foo_s (mcv) ON a,b FROM foo;
ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE a = 1 OR (b > 0 AND b < 10);
`
    Estimated previous patch:  18
    Estimated current patch:      9
    Actual:                                  9


[1]: https://www.postgresql.org/message-id/flat/9e0a12e0-c05f-b193-ed3d-fe88bc1e5fe1%40enterprisedb.com

I look forward to any feedback or suggestions from the community.

Best regars,
Ilia Evdokimov
Tantor Labs LLC.

Вложения

В списке pgsql-hackers по дате отправления: