Обсуждение: support create index on virtual generated column.
hi. attached patch for implementing $subject feature. * internally such index will be transformed into expression index. for example, an index on (b int GENERATED ALWAYS AS (a * 2) VIRTUAL) will be converted into an expression index on ((a * 2)). * in pageinspect module, add some test to check the index content of virtual generated column. * primary key, unique index over virtual generated column are not supported. not sure they make sense or not. * expression index and predicate index over virtual generated columns are currently not supported. * virtual generated column can not be in "include column" * all types of indexes are supported, and a hash index, gist test has been added. * To support ALTER TABLE SET EXPRESSION, in pg_index, we need to track the original virtual generated column attribute number, so ALTER TABLE SET EXPRESSION can identify which index needs to be rebuilt. * ALTER COLUMN SET DATA TYPE will also cause table rewrite, so we really need to track the virtual generated column attribute number that index was built on.
Вложения
On Wed, 26 Mar 2025 at 12:15, jian he <jian.universality@gmail.com> wrote:
>
> hi.
> attached patch for implementing $subject feature.
>
> * internally such index will be transformed into expression index.
> for example, an index on (b int GENERATED ALWAYS AS (a * 2) VIRTUAL) will be
> converted into an expression index on ((a * 2)).
> * in pageinspect module, add some test to check the index content of
> virtual generated column.
> * primary key, unique index over virtual generated column are not supported.
> not sure they make sense or not.
> * expression index and predicate index over virtual generated columns are
> currently not supported.
> * virtual generated column can not be in "include column"
> * all types of indexes are supported, and a hash index, gist test has
> been added.
> * To support ALTER TABLE SET EXPRESSION, in pg_index, we need to track
> the original
> virtual generated column attribute number, so ALTER TABLE SET EXPRESSION can
> identify which index needs to be rebuilt.
> * ALTER COLUMN SET DATA TYPE will also cause table rewrite, so we really
> need to track the virtual generated column attribute number that
> index was built on.
Hi!
patch applies with warns
```
Applying: support create index on virtual generated column.
.git/rebase-apply/patch:250: trailing whitespace.
* updated correctly, and they don't seem useful anyway.
.git/rebase-apply/patch:271: trailing whitespace.
* Also check for system used in expressions or predicates.
warning: 2 lines add whitespace errors.
```
consider this case:
```
reshke=# CREATE TABLE xx (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) ;
CREATE TABLE
reshke=# create index on xx (b);
CREATE INDEX
reshke=#
reshke=# \d+ xx
Table "public.xx"
Column | Type | Collation | Nullable | Default
| Storage | Compression | Stats target | Description
--------+---------+-----------+----------+-----------------------------+---------+-------------+--------------+-------------
a | integer | | |
| plain | | |
b | integer | | | generated always as (a * 2)
| plain | | |
Indexes:
"xx_b_idx" btree (b)
Access method: heap
reshke=# alter table xx drop column b;
ALTER TABLE
reshke=# \d+ xx
Table "public.xx"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
a | integer | | | | plain |
| |
Indexes:
"xx_b_idx" btree ("........pg.dropped.2........" int4_ops)
Access method: heap
reshke=#
```
with regular columns we have different behaviour - with drop column we
drop the index
--
Best regards,
Kirill Reshke
On Wed, Mar 26, 2025 at 5:36 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> reshke=# CREATE TABLE xx (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) ;
> CREATE TABLE
> reshke=# create index on xx (b);
> CREATE INDEX
> reshke=#
> reshke=# \d+ xx
> Table "public.xx"
> Column | Type | Collation | Nullable | Default
> | Storage | Compression | Stats target | Description
>
--------+---------+-----------+----------+-----------------------------+---------+-------------+--------------+-------------
> a | integer | | |
> | plain | | |
> b | integer | | | generated always as (a * 2)
> | plain | | |
> Indexes:
> "xx_b_idx" btree (b)
> Access method: heap
>
> reshke=# alter table xx drop column b;
> ALTER TABLE
> reshke=# \d+ xx
> Table "public.xx"
> Column | Type | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
> a | integer | | | | plain |
> | |
> Indexes:
> "xx_b_idx" btree ("........pg.dropped.2........" int4_ops)
> Access method: heap
>
> reshke=#
> ```
>
> with regular columns we have different behaviour - with drop column we
> drop the index
>
I was wrong about dependency.
when creating an index on a virtual generated column, it will have
dependency with
virtual generated column attribute and the generation expression
associated attribute.
new patch attached. Now,
ALTER TABLE DROP COLUMN works fine.
ALTER INDEX ATTACH PARTITION works fine.
creating such an index on a partitioned table works just fine.
for table inheritance: create index on parent table will not cascade
to child table,
so we don't need to worry about this.
Вложения
On Mon, 14 Apr 2025 at 16:10, jian he <jian.universality@gmail.com> wrote: > > new patch attached. Now, > ALTER TABLE DROP COLUMN works fine. > ALTER INDEX ATTACH PARTITION works fine. > creating such an index on a partitioned table works just fine. > for table inheritance: create index on parent table will not cascade > to child table, > so we don't need to worry about this. Hi! I reviewed v2, and it seems to be working now. But there are tests that are comment-out, what is their purpose? I note that commit 83ea6c5 also included some commented tests, so perhaps there's a reason I'm not aware of. ``` ALTER TABLE gtest22c DROP COLUMN e; \d gtest22c -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6; -- SELECT * FROM gtest22c WHERE b * 3 = 6; -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0; -- SELECT * FROM gtest22c WHERE a = 1 AND b > 0; ``` -- Best regards, Kirill Reshke
On Mon, Apr 14, 2025 at 8:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote: > > On Mon, 14 Apr 2025 at 16:10, jian he <jian.universality@gmail.com> wrote: > > > > new patch attached. Now, > > ALTER TABLE DROP COLUMN works fine. > > ALTER INDEX ATTACH PARTITION works fine. > > creating such an index on a partitioned table works just fine. > > for table inheritance: create index on parent table will not cascade > > to child table, > > so we don't need to worry about this. > > Hi! I reviewed v2, and it seems to be working now. > > But there are tests that are comment-out, what is their purpose? I > note that commit 83ea6c5 also included some commented tests, so > perhaps there's a reason I'm not aware of. > > ``` > ALTER TABLE gtest22c DROP COLUMN e; > \d gtest22c > > -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6; > -- SELECT * FROM gtest22c WHERE b * 3 = 6; > -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0; > -- SELECT * FROM gtest22c WHERE a = 1 AND b > 0; > ``` comment out tests are for to be implemented feature. There are some test changes that are indeed not necessary, I restored it back, please check attached.
Вложения
On Tue, Apr 15, 2025 at 4:36 PM jian he <jian.universality@gmail.com> wrote: > > comment out tests are for to be implemented feature. > There are some test changes that are indeed not necessary, I restored it back, > please check attached. hi. refactor and rebase.
Вложения
On Tue, Jul 8, 2025 at 2:37 PM jian he <jian.universality@gmail.com> wrote: > > On Tue, Apr 15, 2025 at 4:36 PM jian he <jian.universality@gmail.com> wrote: > > > > comment out tests are for to be implemented feature. > > There are some test changes that are indeed not necessary, I restored it back, > > please check attached. > > hi. > refactor and rebase. fix the regress tests failure in v4.
Вложения
> hi.
> refactor and rebase.
fix the regress tests failure in v4.
This may need another rebase, as it doesn't apply to master.
I'm interested in this feature, specifically whether the optimizer uses the index in situations where the expression is used rather than the virtual column name.
For example:
CREATE TABLE example (
regular_name text,
lowecase_name text GENERATED ALWAYS AS lower(regular_name) VIRTUAL
);
);
CREATE INDEX example_b ON example(b);
EXPLAIN SELECT regular_name FROM example WHERE lowercase_name = 'john q smith';
EXPLAIN SELECT regular_name FROM example WHERE lower(regular_name) = 'john q smith';
Corey Huinker <corey.huinker@gmail.com> writes:
> I'm interested in this feature, specifically whether the optimizer uses the
> index in situations where the expression is used rather than the virtual
> column name.
Hmm, I kinda think we should not do this. The entire point of a
virtual column is that its values are not stored and so you can
(for example) change the generation expression "for free".
If it's referenced in an index that advantage goes out the window
because we'll have to rebuild the index.
Besides, this does nothing you haven't been able to do for
decades with expression indexes.
regards, tom lane
On Wed, Jul 23, 2025 at 4:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Corey Huinker <corey.huinker@gmail.com> writes: > > I'm interested in this feature, specifically whether the optimizer uses the > > index in situations where the expression is used rather than the virtual > > column name. > > Hmm, I kinda think we should not do this. The entire point of a > virtual column is that its values are not stored and so you can > (for example) change the generation expression "for free". > If it's referenced in an index that advantage goes out the window > because we'll have to rebuild the index. > > Besides, this does nothing you haven't been able to do for > decades with expression indexes. > hi. CREATE TABLE example (regular_name text, lowecase_name text GENERATED ALWAYS AS (lower(regular_name)) VIRTUAL); CREATE INDEX example_b ON example(lowecase_name); CREATE INDEX example_c ON example(lower(regular_name)); select distinct indnatts,indnkeyatts,indisunique, indnullsnotdistinct,indisprimary,indisexclusion,indimmediate,indisclustered,indisvalid, indcheckxmin,indisready,indislive,indisreplident,indkey,indcollation,indclass,indoption, indexprs from pg_index where indrelid ='example'::regclass; will return one row, meaning catalog table pg_index stored almost all the same information. For indexes example_b and example_c, the only difference lies in the new column indattrgenerated. In example_b, indattrgenerated is not null, whereas in example_c, it is null. This column (indattrgenerated) is needed to track dependencies on generated columns, which is important for index rebuild. obviously, get_relation_info will collect the same information for example_b, example_c. which means the optimizer will use the same information to make the decision. --------------------------------- set enable_seqscan to off; set enable_bitmapscan to off; CREATE TABLE example (regular_name text, lowecase_name text GENERATED ALWAYS AS (lower(regular_name)) VIRTUAL); CREATE INDEX example_b ON example(lowecase_name); EXPLAIN(COSTS OFF) SELECT regular_name FROM example WHERE lowecase_name = 'john q smith'; EXPLAIN(COSTS OFF) SELECT regular_name FROM example WHERE lower(regular_name) = 'john q smith'; So current implementation, the above two query plans will produce the same query plan. the generation expression or virtual generated column data type changes will cause the index to rebuild. Is this we want? Or should changing the generation expression or data type of a virtual generated column mark the associated index as invalid, without triggering a rebuild?
Вложения
On Tue, Jul 22, 2025 at 4:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
> I'm interested in this feature, specifically whether the optimizer uses the
> index in situations where the expression is used rather than the virtual
> column name.
Hmm, I kinda think we should not do this. The entire point of a
virtual column is that its values are not stored and so you can
(for example) change the generation expression "for free".
If it's referenced in an index that advantage goes out the window
because we'll have to rebuild the index.
Don't we already have dependencies on altering a column if it's indexed? Wouldn't that be all the barrier we'd need?
Besides, this does nothing you haven't been able to do for
decades with expression indexes.
What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If there was already an index on that expression, and some queries start using the new virtual column (with the same expression), will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll out the query changes at will and not need to do some cut-over from using the expression to using the virtual column.
On Thu, Jul 31, 2025 at 2:24 AM Corey Huinker <corey.huinker@gmail.com> wrote: > >> >> Besides, this does nothing you haven't been able to do for >> decades with expression indexes. > > > What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If therewas already an index on that expression, and some queries start using the new virtual column (with the same expression),will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll outthe query changes at will and not need to do some cut-over from using the expression to using the virtual column. > hi. I am not sure whether this concern has been addressed, as I am still somewhat confused by the above paragraph. As noted earlier, creating an index on a virtual generated column results in a new index, and that index behaves the same as a regular expression index. An updated and polished patch is attached. The regress tests are quite verbose at the moment, since I make it covered all index types (btree, gist, spgist, hash, gin, and brin). -- jian https://www.enterprisedb.com/
Вложения
Hi all, On Thu, Feb 19, 2026 at 5:18 PM jian he <jian.universality@gmail.com> wrote: > > On Thu, Jul 31, 2025 at 2:24 AM Corey Huinker <corey.huinker@gmail.com> wrote: > > > >> > >> Besides, this does nothing you haven't been able to do for > >> decades with expression indexes. > > > > > > What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If therewas already an index on that expression, and some queries start using the new virtual column (with the same expression),will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll outthe query changes at will and not need to do some cut-over from using the expression to using the virtual column. > > > > hi. > I am not sure whether this concern has been addressed, as I am still somewhat > confused by the above paragraph. > > As noted earlier, creating an index on a virtual generated column results in a > new index, and that index behaves the same as a regular expression index. > > An updated and polished patch is attached. The regress tests are quite verbose > at the moment, since I make it covered all index types (btree, gist, spgist, > hash, gin, and brin). > I went through the discussions and have reviewed the patch. During testing, I encountered a segmentation fault during initdb which occurred due to a crash in bootstrap mode when ComputeIndexAttrs() was invoked with pstate == NULL while system catalog indexes were being created. The virtual generated column handling logic was executing unconditionally, which is unsafe during bootstrap because catalog and syscache state are not fully initialized. I fixed this by guarding the virtual generated column logic to skip the logic when IsBootstrapProcessingMode() is true so that it can prevent the generated-column rewrite from executing during bootstrap. After the fix, initdb no longer crashes in bootstrap mode, the full regression test suite passes cleanly, the server starts normally and key behaviors like index creation on virtual generated columns, planner rewrite, dependency tracking, partitioned table support, and dump/restore roundtrips are all correct. Also I performed some additional validations to ensure consistent behaviours. With the bootstrap guard in place, the patch seems functionally correct, catalog-safe, and dump/restore safe. Kindly review the patch. Looking forward to more feedback. Regards, Soumya
Вложения
Hi all, While validating the bootstrap crash fix for the virtual generated column index patch, I reviewed the current regression coverage around this area. During this process, I noticed that the existing regression suite verifies the functional restrictions on virtual generated columns, but there is no dedicated regression test specifically covering index-related restrictions. As this patch touches index creation logic and also exposes a bootstrap-time crash during testing, it may be useful to add explicit regression coverage for these cases. Having a focused test would help ensure that the current restrictions remain stable and that any future changes affecting this area are detected early by the regression framework and may strengthen long-term test coverage. As a small enhancement in this direction, I prepared a regression test that verifies the expected errors for the following scenarios: 1. CREATE INDEX on a virtual generated column 2. Partial index on a virtual generated column 3. PRIMARY KEY on a virtual generated column 4. Expression index referencing a virtual generated column This ensures the current restrictions remain protected by the regression framework and helps detect unintended changes in the future. The attached patch only adds the regression test and expected output. If this finds useful, kindly verify the patch attached herewith and please let me know the thoughts on this. I would be happy to refine or extend the test further. Looking forward to more feedback. Regards, Soumya
Вложения
On 08.01.26 07:16, jian he wrote:
> On Thu, Jul 31, 2025 at 2:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
>>
>>>
>>> Besides, this does nothing you haven't been able to do for
>>> decades with expression indexes.
>>
>>
>> What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If
therewas already an index on that expression, and some queries start using the new virtual column (with the same
expression),will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll
outthe query changes at will and not need to do some cut-over from using the expression to using the virtual column.
>>
>
> hi.
> I am not sure whether this concern has been addressed, as I am still somewhat
> confused by the above paragraph.
>
> As noted earlier, creating an index on a virtual generated column results in a
> new index, and that index behaves the same as a regular expression index.
>
> An updated and polished patch is attached. The regress tests are quite verbose
> at the moment, since I make it covered all index types (btree, gist, spgist,
> hash, gin, and brin).
I think you could do a much simpler initial version of this if you just
supported virtual generated columns in expression indexes. And then
prohibit SET EXPRESSION if the column is used in an index. Then you
don't need to worry about index rebuilding, ALTER TABLE recursion, new
catalog columns, and all that.
But there is a comment in DefineIndex():
/*
* XXX Virtual generated columns in index expressions or predicates
* could be supported, but it needs support in
* RelationGetIndexExpressions() and RelationGetIndexPredicate().
*/
which you delete, but you don't make any changes to those mentioned
functions. Maybe the comment is wrong, in which case, let's discuss
that and fix it. (If the comment is indeed wrong, then the feature
might even be very easy.)
On Fri, Mar 13, 2026 at 10:01 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > I think you could do a much simpler initial version of this if you just > supported virtual generated columns in expression indexes. And then > prohibit SET EXPRESSION if the column is used in an index. Then you > don't need to worry about index rebuilding, ALTER TABLE recursion, new > catalog columns, and all that. > CREATE TABLE gtest22c (a int, b int GENERATED ALWAYS AS (a) VIRTUAL); CREATE INDEX gtest22c_a_idx ON gtest22c (a); CREATE INDEX gtest22c_b_idx ON gtest22c (b); If we don't add a new catalog column (just a single boolean indisvirtual is not enough, i think), how can we distinguish between the gtest22c_a_idx and gtest22c_b_idx indexes in the example above? If CREATE INDEX simply expands the virtual generated column expression without dependency tracking, that would be quite easy, see the attached v8. If so, we need to explicitly document that SET EXPRESSION has no effect on existing indexes that originally referenced the virtual generated column when CREATE INDEX was used. > But there is a comment in DefineIndex(): > > /* > * XXX Virtual generated columns in index expressions or predicates > * could be supported, but it needs support in > * RelationGetIndexExpressions() and RelationGetIndexPredicate(). > */ > > which you delete, but you don't make any changes to those mentioned > functions. Maybe the comment is wrong, in which case, let's discuss > that and fix it. (If the comment is indeed wrong, then the feature > might even be very easy.) > I don't think it's a good idea to store the virtual generated columns as is in Form_pg_index->indkey because IndexInfo->ii_IndexAttrNumbers and Form->pg_index->indkey are referenced in too many places (see BuildIndexInfo). For every single occurrence of ndkey.values[i], we need to consider whether it's ok for it be a virtual generated column. Instead, Anum_pg_index_indexprs and Anum_pg_index_indpred store the expressions after the virtual generated columns expansion, then we don't need to worry about Form_pg_index->indkey.values[i] is virtual generated column or not. Therefore, i think RelationGetIndexExpressions and RelationGetIndexPredicate don't need to deal with virtual generated column expressions at all. Overall, I think the comment above is wrong. -- jian https://www.enterprisedb.com/