Обсуждение: support create index on virtual generated column.
hi. attached patch for implementing $subject feature. * internally such index will be transformed into expression index. for example, an index on (b int GENERATED ALWAYS AS (a * 2) VIRTUAL) will be converted into an expression index on ((a * 2)). * in pageinspect module, add some test to check the index content of virtual generated column. * primary key, unique index over virtual generated column are not supported. not sure they make sense or not. * expression index and predicate index over virtual generated columns are currently not supported. * virtual generated column can not be in "include column" * all types of indexes are supported, and a hash index, gist test has been added. * To support ALTER TABLE SET EXPRESSION, in pg_index, we need to track the original virtual generated column attribute number, so ALTER TABLE SET EXPRESSION can identify which index needs to be rebuilt. * ALTER COLUMN SET DATA TYPE will also cause table rewrite, so we really need to track the virtual generated column attribute number that index was built on.
Вложения
On Wed, 26 Mar 2025 at 12:15, jian he <jian.universality@gmail.com> wrote:
>
> hi.
> attached patch for implementing $subject feature.
>
> * internally such index will be transformed into expression index.
> for example, an index on (b int GENERATED ALWAYS AS (a * 2) VIRTUAL) will be
> converted into an expression index on ((a * 2)).
> * in pageinspect module, add some test to check the index content of
> virtual generated column.
> * primary key, unique index over virtual generated column are not supported.
> not sure they make sense or not.
> * expression index and predicate index over virtual generated columns are
> currently not supported.
> * virtual generated column can not be in "include column"
> * all types of indexes are supported, and a hash index, gist test has
> been added.
> * To support ALTER TABLE SET EXPRESSION, in pg_index, we need to track
> the original
> virtual generated column attribute number, so ALTER TABLE SET EXPRESSION can
> identify which index needs to be rebuilt.
> * ALTER COLUMN SET DATA TYPE will also cause table rewrite, so we really
> need to track the virtual generated column attribute number that
> index was built on.
Hi!
patch applies with warns
```
Applying: support create index on virtual generated column.
.git/rebase-apply/patch:250: trailing whitespace.
* updated correctly, and they don't seem useful anyway.
.git/rebase-apply/patch:271: trailing whitespace.
* Also check for system used in expressions or predicates.
warning: 2 lines add whitespace errors.
```
consider this case:
```
reshke=# CREATE TABLE xx (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) ;
CREATE TABLE
reshke=# create index on xx (b);
CREATE INDEX
reshke=#
reshke=# \d+ xx
Table "public.xx"
Column | Type | Collation | Nullable | Default
| Storage | Compression | Stats target | Description
--------+---------+-----------+----------+-----------------------------+---------+-------------+--------------+-------------
a | integer | | |
| plain | | |
b | integer | | | generated always as (a * 2)
| plain | | |
Indexes:
"xx_b_idx" btree (b)
Access method: heap
reshke=# alter table xx drop column b;
ALTER TABLE
reshke=# \d+ xx
Table "public.xx"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
a | integer | | | | plain |
| |
Indexes:
"xx_b_idx" btree ("........pg.dropped.2........" int4_ops)
Access method: heap
reshke=#
```
with regular columns we have different behaviour - with drop column we
drop the index
--
Best regards,
Kirill Reshke
On Wed, Mar 26, 2025 at 5:36 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> reshke=# CREATE TABLE xx (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) ;
> CREATE TABLE
> reshke=# create index on xx (b);
> CREATE INDEX
> reshke=#
> reshke=# \d+ xx
> Table "public.xx"
> Column | Type | Collation | Nullable | Default
> | Storage | Compression | Stats target | Description
>
--------+---------+-----------+----------+-----------------------------+---------+-------------+--------------+-------------
> a | integer | | |
> | plain | | |
> b | integer | | | generated always as (a * 2)
> | plain | | |
> Indexes:
> "xx_b_idx" btree (b)
> Access method: heap
>
> reshke=# alter table xx drop column b;
> ALTER TABLE
> reshke=# \d+ xx
> Table "public.xx"
> Column | Type | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
> a | integer | | | | plain |
> | |
> Indexes:
> "xx_b_idx" btree ("........pg.dropped.2........" int4_ops)
> Access method: heap
>
> reshke=#
> ```
>
> with regular columns we have different behaviour - with drop column we
> drop the index
>
I was wrong about dependency.
when creating an index on a virtual generated column, it will have
dependency with
virtual generated column attribute and the generation expression
associated attribute.
new patch attached. Now,
ALTER TABLE DROP COLUMN works fine.
ALTER INDEX ATTACH PARTITION works fine.
creating such an index on a partitioned table works just fine.
for table inheritance: create index on parent table will not cascade
to child table,
so we don't need to worry about this.
Вложения
On Mon, 14 Apr 2025 at 16:10, jian he <jian.universality@gmail.com> wrote: > > new patch attached. Now, > ALTER TABLE DROP COLUMN works fine. > ALTER INDEX ATTACH PARTITION works fine. > creating such an index on a partitioned table works just fine. > for table inheritance: create index on parent table will not cascade > to child table, > so we don't need to worry about this. Hi! I reviewed v2, and it seems to be working now. But there are tests that are comment-out, what is their purpose? I note that commit 83ea6c5 also included some commented tests, so perhaps there's a reason I'm not aware of. ``` ALTER TABLE gtest22c DROP COLUMN e; \d gtest22c -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6; -- SELECT * FROM gtest22c WHERE b * 3 = 6; -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0; -- SELECT * FROM gtest22c WHERE a = 1 AND b > 0; ``` -- Best regards, Kirill Reshke
On Mon, Apr 14, 2025 at 8:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote: > > On Mon, 14 Apr 2025 at 16:10, jian he <jian.universality@gmail.com> wrote: > > > > new patch attached. Now, > > ALTER TABLE DROP COLUMN works fine. > > ALTER INDEX ATTACH PARTITION works fine. > > creating such an index on a partitioned table works just fine. > > for table inheritance: create index on parent table will not cascade > > to child table, > > so we don't need to worry about this. > > Hi! I reviewed v2, and it seems to be working now. > > But there are tests that are comment-out, what is their purpose? I > note that commit 83ea6c5 also included some commented tests, so > perhaps there's a reason I'm not aware of. > > ``` > ALTER TABLE gtest22c DROP COLUMN e; > \d gtest22c > > -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6; > -- SELECT * FROM gtest22c WHERE b * 3 = 6; > -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0; > -- SELECT * FROM gtest22c WHERE a = 1 AND b > 0; > ``` comment out tests are for to be implemented feature. There are some test changes that are indeed not necessary, I restored it back, please check attached.
Вложения
On Tue, Apr 15, 2025 at 4:36 PM jian he <jian.universality@gmail.com> wrote: > > comment out tests are for to be implemented feature. > There are some test changes that are indeed not necessary, I restored it back, > please check attached. hi. refactor and rebase.
Вложения
On Tue, Jul 8, 2025 at 2:37 PM jian he <jian.universality@gmail.com> wrote: > > On Tue, Apr 15, 2025 at 4:36 PM jian he <jian.universality@gmail.com> wrote: > > > > comment out tests are for to be implemented feature. > > There are some test changes that are indeed not necessary, I restored it back, > > please check attached. > > hi. > refactor and rebase. fix the regress tests failure in v4.
Вложения
> hi.
> refactor and rebase.
fix the regress tests failure in v4.
This may need another rebase, as it doesn't apply to master.
I'm interested in this feature, specifically whether the optimizer uses the index in situations where the expression is used rather than the virtual column name.
For example:
CREATE TABLE example (
regular_name text,
lowecase_name text GENERATED ALWAYS AS lower(regular_name) VIRTUAL
);
);
CREATE INDEX example_b ON example(b);
EXPLAIN SELECT regular_name FROM example WHERE lowercase_name = 'john q smith';
EXPLAIN SELECT regular_name FROM example WHERE lower(regular_name) = 'john q smith';
Corey Huinker <corey.huinker@gmail.com> writes:
> I'm interested in this feature, specifically whether the optimizer uses the
> index in situations where the expression is used rather than the virtual
> column name.
Hmm, I kinda think we should not do this. The entire point of a
virtual column is that its values are not stored and so you can
(for example) change the generation expression "for free".
If it's referenced in an index that advantage goes out the window
because we'll have to rebuild the index.
Besides, this does nothing you haven't been able to do for
decades with expression indexes.
regards, tom lane
On Wed, Jul 23, 2025 at 4:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Corey Huinker <corey.huinker@gmail.com> writes: > > I'm interested in this feature, specifically whether the optimizer uses the > > index in situations where the expression is used rather than the virtual > > column name. > > Hmm, I kinda think we should not do this. The entire point of a > virtual column is that its values are not stored and so you can > (for example) change the generation expression "for free". > If it's referenced in an index that advantage goes out the window > because we'll have to rebuild the index. > > Besides, this does nothing you haven't been able to do for > decades with expression indexes. > hi. CREATE TABLE example (regular_name text, lowecase_name text GENERATED ALWAYS AS (lower(regular_name)) VIRTUAL); CREATE INDEX example_b ON example(lowecase_name); CREATE INDEX example_c ON example(lower(regular_name)); select distinct indnatts,indnkeyatts,indisunique, indnullsnotdistinct,indisprimary,indisexclusion,indimmediate,indisclustered,indisvalid, indcheckxmin,indisready,indislive,indisreplident,indkey,indcollation,indclass,indoption, indexprs from pg_index where indrelid ='example'::regclass; will return one row, meaning catalog table pg_index stored almost all the same information. For indexes example_b and example_c, the only difference lies in the new column indattrgenerated. In example_b, indattrgenerated is not null, whereas in example_c, it is null. This column (indattrgenerated) is needed to track dependencies on generated columns, which is important for index rebuild. obviously, get_relation_info will collect the same information for example_b, example_c. which means the optimizer will use the same information to make the decision. --------------------------------- set enable_seqscan to off; set enable_bitmapscan to off; CREATE TABLE example (regular_name text, lowecase_name text GENERATED ALWAYS AS (lower(regular_name)) VIRTUAL); CREATE INDEX example_b ON example(lowecase_name); EXPLAIN(COSTS OFF) SELECT regular_name FROM example WHERE lowecase_name = 'john q smith'; EXPLAIN(COSTS OFF) SELECT regular_name FROM example WHERE lower(regular_name) = 'john q smith'; So current implementation, the above two query plans will produce the same query plan. the generation expression or virtual generated column data type changes will cause the index to rebuild. Is this we want? Or should changing the generation expression or data type of a virtual generated column mark the associated index as invalid, without triggering a rebuild?
Вложения
On Tue, Jul 22, 2025 at 4:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
> I'm interested in this feature, specifically whether the optimizer uses the
> index in situations where the expression is used rather than the virtual
> column name.
Hmm, I kinda think we should not do this. The entire point of a
virtual column is that its values are not stored and so you can
(for example) change the generation expression "for free".
If it's referenced in an index that advantage goes out the window
because we'll have to rebuild the index.
Don't we already have dependencies on altering a column if it's indexed? Wouldn't that be all the barrier we'd need?
Besides, this does nothing you haven't been able to do for
decades with expression indexes.
What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If there was already an index on that expression, and some queries start using the new virtual column (with the same expression), will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll out the query changes at will and not need to do some cut-over from using the expression to using the virtual column.