Re: Allow to collect statistics on virtual generated columns
От | Yugo Nagata |
---|---|
Тема | Re: Allow to collect statistics on virtual generated columns |
Дата | |
Msg-id | 20250801002830.143b25971fb9594b89d96aee@sraoss.co.jp обсуждение исходный текст |
Ответ на | Re: Allow to collect statistics on virtual generated columns (Yugo Nagata <nagata@sraoss.co.jp>) |
Список | pgsql-hackers |
Hi, On Tue, 24 Jun 2025 17:05:33 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > Instead, I'm thinking of an alternative approach: expanding the expression > at the time statistics are collected. I've attached a new patch in this approache. This allows to collect statistics on virtual generated columns. During ANALYZE, generation expressions are expanded, and statistics are computed using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData are now exported from extended_stats.c. However, since they are no longer specific to extended statistics, it might be better to move them to analyze.c and vacuum.h. To enable the optimizer to make use of these statistics, a new field named virtual_gencols is added to RelOptInfo. This field holds the expressions of virtual generated columns in the table. In examine_variable(), if an expression in a WHERE clause matches a virtual generated column, the corresponding statistics are used for that expression. Example: - Before applying the patch, the cardinality estimate is erroneous. test=# create table t (i int, j int generated always as (i*10) virtual); CREATE TABLE test=# insert into t select generate_series(1,1000); INSERT 0 1000 test=# insert into t select 1 from generate_series(1,1000); INSERT 0 1000 test=# analyze t; ANALYZE test=# explain analyze select * from t where j = 10; QUERY PLAN ------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1) Filter: ((i * 10) = 10) Rows Removed by Filter: 999 Buffers: shared hit=9 Planning: Buffers: shared hit=10 Planning Time: 0.299 ms Execution Time: 0.948 ms (8 rows) - After applying the patch, the cardinality estimate is correct. test=# analyze t; ANALYZE test=# explain analyze select * from t where j = 10; QUERY PLAN ----------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1) Filter: ((i * 10) = 10) Rows Removed by Filter: 999 Buffers: shared hit=9 Planning: Buffers: shared hit=6 Planning Time: 0.374 ms Execution Time: 1.028 ms (8 rows) Note that the patch is still a work in progress, so documentation and tests are not included. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Вложения
В списке pgsql-hackers по дате отправления: