Обсуждение: support create index on virtual generated column.

Поиск
Список
Период
Сортировка

support create index on virtual generated column.

От
jian he
Дата:
hi.
attached patch for implementing $subject feature.

* internally such index will be transformed into expression index.
  for example, an index on (b int GENERATED ALWAYS AS (a * 2) VIRTUAL) will be
  converted into an expression index on ((a * 2)).
* in pageinspect module, add some test to check the index content of
virtual generated column.
* primary key, unique index over virtual generated column are not supported.
  not sure they make sense or not.
* expression index and predicate index over virtual generated columns are
  currently not supported.
* virtual generated column can not be in "include column"
* all types of indexes are supported, and a hash index, gist test has
been added.
* To support ALTER TABLE SET EXPRESSION, in pg_index, we need to track
the original
  virtual generated column attribute number, so ALTER TABLE SET EXPRESSION can
  identify which index needs to be rebuilt.
* ALTER COLUMN SET DATA TYPE will also cause table rewrite, so we really
  need to track the virtual generated column attribute number that
index was built on.

Вложения

Re: support create index on virtual generated column.

От
Kirill Reshke
Дата:
On Wed, 26 Mar 2025 at 12:15, jian he <jian.universality@gmail.com> wrote:
>
> hi.
> attached patch for implementing $subject feature.
>
> * internally such index will be transformed into expression index.
>   for example, an index on (b int GENERATED ALWAYS AS (a * 2) VIRTUAL) will be
>   converted into an expression index on ((a * 2)).
> * in pageinspect module, add some test to check the index content of
> virtual generated column.
> * primary key, unique index over virtual generated column are not supported.
>   not sure they make sense or not.
> * expression index and predicate index over virtual generated columns are
>   currently not supported.
> * virtual generated column can not be in "include column"
> * all types of indexes are supported, and a hash index, gist test has
> been added.
> * To support ALTER TABLE SET EXPRESSION, in pg_index, we need to track
> the original
>   virtual generated column attribute number, so ALTER TABLE SET EXPRESSION can
>   identify which index needs to be rebuilt.
> * ALTER COLUMN SET DATA TYPE will also cause table rewrite, so we really
>   need to track the virtual generated column attribute number that
> index was built on.


Hi!
patch applies with warns
```
Applying: support create index on virtual generated column.
.git/rebase-apply/patch:250: trailing whitespace.
     * updated correctly, and they don't seem useful anyway.
.git/rebase-apply/patch:271: trailing whitespace.
     * Also check for system used in expressions or predicates.
warning: 2 lines add whitespace errors.
```



consider this case:


```

reshke=# CREATE TABLE xx (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) ;
CREATE TABLE
reshke=# create index on xx (b);
CREATE INDEX
reshke=#
reshke=# \d+ xx
                                                     Table "public.xx"
 Column |  Type   | Collation | Nullable |           Default
| Storage | Compression | Stats target | Description

--------+---------+-----------+----------+-----------------------------+---------+-------------+--------------+-------------
 a      | integer |           |          |
| plain   |             |              |
 b      | integer |           |          | generated always as (a * 2)
| plain   |             |              |
Indexes:
    "xx_b_idx" btree (b)
Access method: heap

reshke=# alter table xx drop column b;
ALTER TABLE
reshke=# \d+ xx
                                           Table "public.xx"
 Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           |          |         | plain   |
    |              |
Indexes:
    "xx_b_idx" btree ("........pg.dropped.2........" int4_ops)
Access method: heap

reshke=#
```

with regular columns we have different behaviour - with drop column we
drop the index


-- 
Best regards,
Kirill Reshke



Re: support create index on virtual generated column.

От
jian he
Дата:
On Wed, Mar 26, 2025 at 5:36 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
> reshke=# CREATE TABLE xx (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) ;
> CREATE TABLE
> reshke=# create index on xx (b);
> CREATE INDEX
> reshke=#
> reshke=# \d+ xx
>                                                      Table "public.xx"
>  Column |  Type   | Collation | Nullable |           Default
> | Storage | Compression | Stats target | Description
>
--------+---------+-----------+----------+-----------------------------+---------+-------------+--------------+-------------
>  a      | integer |           |          |
> | plain   |             |              |
>  b      | integer |           |          | generated always as (a * 2)
> | plain   |             |              |
> Indexes:
>     "xx_b_idx" btree (b)
> Access method: heap
>
> reshke=# alter table xx drop column b;
> ALTER TABLE
> reshke=# \d+ xx
>                                            Table "public.xx"
>  Column |  Type   | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
>  a      | integer |           |          |         | plain   |
>     |              |
> Indexes:
>     "xx_b_idx" btree ("........pg.dropped.2........" int4_ops)
> Access method: heap
>
> reshke=#
> ```
>
> with regular columns we have different behaviour - with drop column we
> drop the index
>

I was wrong about dependency.
when creating an index on a virtual generated column, it will have
dependency with
virtual generated column attribute and the generation expression
associated attribute.

new patch attached. Now,
ALTER TABLE DROP COLUMN works fine.
ALTER INDEX ATTACH PARTITION works fine.
creating such an index on a partitioned table works just fine.
for table inheritance: create index on parent table will not cascade
to child table,
so we don't need to worry about this.

Вложения

Re: support create index on virtual generated column.

От
Kirill Reshke
Дата:
On Mon, 14 Apr 2025 at 16:10, jian he <jian.universality@gmail.com> wrote:
>
> new patch attached. Now,
> ALTER TABLE DROP COLUMN works fine.
> ALTER INDEX ATTACH PARTITION works fine.
> creating such an index on a partitioned table works just fine.
> for table inheritance: create index on parent table will not cascade
> to child table,
> so we don't need to worry about this.

Hi! I reviewed v2, and it seems to be working now.

But there are tests that are comment-out, what is their purpose? I
note that commit 83ea6c5 also included some commented tests, so
perhaps there's a reason I'm not aware of.

```
ALTER TABLE gtest22c DROP COLUMN e;
\d gtest22c

-- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6;
-- SELECT * FROM gtest22c WHERE b * 3 = 6;
-- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
-- SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
```


-- 
Best regards,
Kirill Reshke



Re: support create index on virtual generated column.

От
jian he
Дата:
On Mon, Apr 14, 2025 at 8:05 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Mon, 14 Apr 2025 at 16:10, jian he <jian.universality@gmail.com> wrote:
> >
> > new patch attached. Now,
> > ALTER TABLE DROP COLUMN works fine.
> > ALTER INDEX ATTACH PARTITION works fine.
> > creating such an index on a partitioned table works just fine.
> > for table inheritance: create index on parent table will not cascade
> > to child table,
> > so we don't need to worry about this.
>
> Hi! I reviewed v2, and it seems to be working now.
>
> But there are tests that are comment-out, what is their purpose? I
> note that commit 83ea6c5 also included some commented tests, so
> perhaps there's a reason I'm not aware of.
>

> ```
> ALTER TABLE gtest22c DROP COLUMN e;
> \d gtest22c
>
> -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE b * 3 = 6;
> -- SELECT * FROM gtest22c WHERE b * 3 = 6;
> -- EXPLAIN (COSTS OFF) SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
> -- SELECT * FROM gtest22c WHERE a = 1 AND b > 0;
> ```

comment out tests are for to be implemented feature.
There are some test changes that are indeed not necessary, I restored it back,
please check attached.

Вложения

Re: support create index on virtual generated column.

От
jian he
Дата:
On Tue, Apr 15, 2025 at 4:36 PM jian he <jian.universality@gmail.com> wrote:
>
> comment out tests are for to be implemented feature.
> There are some test changes that are indeed not necessary, I restored it back,
> please check attached.

hi.
refactor and rebase.

Вложения

Re: support create index on virtual generated column.

От
jian he
Дата:
On Tue, Jul 8, 2025 at 2:37 PM jian he <jian.universality@gmail.com> wrote:
>
> On Tue, Apr 15, 2025 at 4:36 PM jian he <jian.universality@gmail.com> wrote:
> >
> > comment out tests are for to be implemented feature.
> > There are some test changes that are indeed not necessary, I restored it back,
> > please check attached.
>
> hi.
> refactor and rebase.

fix the regress tests failure in v4.

Вложения

Re: support create index on virtual generated column.

От
Corey Huinker
Дата:
> hi.
> refactor and rebase.

fix the regress tests failure in v4.

This may need another rebase, as it doesn't apply to master.

I'm interested in this feature, specifically whether the optimizer uses the index in situations where the expression is used rather than the virtual column name.

For example:

CREATE TABLE example (
   regular_name text,
   lowecase_name text GENERATED ALWAYS AS lower(regular_name) VIRTUAL
);

CREATE INDEX example_b ON example(b);

EXPLAIN SELECT regular_name FROM example WHERE lowercase_name = 'john q smith';

EXPLAIN SELECT regular_name FROM example WHERE lower(regular_name) = 'john q smith';

Re: support create index on virtual generated column.

От
Tom Lane
Дата:
Corey Huinker <corey.huinker@gmail.com> writes:
> I'm interested in this feature, specifically whether the optimizer uses the
> index in situations where the expression is used rather than the virtual
> column name.

Hmm, I kinda think we should not do this.  The entire point of a
virtual column is that its values are not stored and so you can
(for example) change the generation expression "for free".
If it's referenced in an index that advantage goes out the window
because we'll have to rebuild the index.

Besides, this does nothing you haven't been able to do for
decades with expression indexes.

            regards, tom lane



Re: support create index on virtual generated column.

От
jian he
Дата:
On Wed, Jul 23, 2025 at 4:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Corey Huinker <corey.huinker@gmail.com> writes:
> > I'm interested in this feature, specifically whether the optimizer uses the
> > index in situations where the expression is used rather than the virtual
> > column name.
>
> Hmm, I kinda think we should not do this.  The entire point of a
> virtual column is that its values are not stored and so you can
> (for example) change the generation expression "for free".
> If it's referenced in an index that advantage goes out the window
> because we'll have to rebuild the index.
>
> Besides, this does nothing you haven't been able to do for
> decades with expression indexes.
>

hi.

CREATE TABLE example (regular_name text, lowecase_name text GENERATED
ALWAYS AS (lower(regular_name)) VIRTUAL);
CREATE INDEX example_b ON example(lowecase_name);
CREATE INDEX example_c ON example(lower(regular_name));

select distinct indnatts,indnkeyatts,indisunique,
indnullsnotdistinct,indisprimary,indisexclusion,indimmediate,indisclustered,indisvalid,
indcheckxmin,indisready,indislive,indisreplident,indkey,indcollation,indclass,indoption,
indexprs
from pg_index
where indrelid ='example'::regclass;

will return one row, meaning catalog table pg_index stored almost all
the same information.

For indexes example_b and example_c, the only difference lies in the new column
indattrgenerated. In example_b, indattrgenerated is not null, whereas in
example_c, it is null. This column (indattrgenerated) is needed to track
dependencies on generated columns, which is important for index
rebuild.

obviously, get_relation_info will collect the same information for
example_b, example_c.
which means the optimizer will use the same information to make the decision.
---------------------------------
set enable_seqscan to off;
set enable_bitmapscan to off;
CREATE TABLE example (regular_name text, lowecase_name text GENERATED
ALWAYS AS (lower(regular_name)) VIRTUAL);
CREATE INDEX example_b ON example(lowecase_name);

EXPLAIN(COSTS OFF) SELECT regular_name FROM example WHERE
lowecase_name = 'john q smith';
EXPLAIN(COSTS OFF) SELECT regular_name FROM example WHERE
lower(regular_name) = 'john q smith';

So current implementation, the above two query plans will produce the same query
plan. the generation expression or virtual generated column data type changes
will cause the index to rebuild.

Is this we want?
Or should changing the generation expression or data type of a virtual generated
column mark the associated index as invalid, without triggering a rebuild?

Вложения

Re: support create index on virtual generated column.

От
Corey Huinker
Дата:

On Tue, Jul 22, 2025 at 4:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
> I'm interested in this feature, specifically whether the optimizer uses the
> index in situations where the expression is used rather than the virtual
> column name.

Hmm, I kinda think we should not do this.  The entire point of a
virtual column is that its values are not stored and so you can
(for example) change the generation expression "for free".
If it's referenced in an index that advantage goes out the window
because we'll have to rebuild the index.

Don't we already have dependencies on altering a column if it's indexed? Wouldn't that be all the barrier we'd need?
 

Besides, this does nothing you haven't been able to do for
decades with expression indexes.

What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If there was already an index on that expression, and some queries start using the new virtual column (with the same expression), will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll out the query changes at will and not need to do some cut-over from using the expression to using the virtual column.
 

Re: support create index on virtual generated column.

От
jian he
Дата:
On Thu, Jul 31, 2025 at 2:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
>
>>
>> Besides, this does nothing you haven't been able to do for
>> decades with expression indexes.
>
>
> What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If
therewas already an index on that expression, and some queries start using the new virtual column (with the same
expression),will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll
outthe query changes at will and not need to do some cut-over from using the expression to using the virtual column. 
>

hi.
I am not sure whether this concern has been addressed, as I am still somewhat
confused by the above paragraph.

As noted earlier, creating an index on a virtual generated column results in a
new index, and that index behaves the same as a regular expression index.

An updated and polished patch is attached. The regress tests are quite verbose
at the moment, since I make it covered all index types (btree, gist, spgist,
hash, gin, and brin).


--
jian
https://www.enterprisedb.com/

Вложения

Re: support create index on virtual generated column.

От
Soumya S Murali
Дата:
Hi all,

On Thu, Feb 19, 2026 at 5:18 PM jian he <jian.universality@gmail.com> wrote:
>
> On Thu, Jul 31, 2025 at 2:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
> >
> >>
> >> Besides, this does nothing you haven't been able to do for
> >> decades with expression indexes.
> >
> >
> > What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If
therewas already an index on that expression, and some queries start using the new virtual column (with the same
expression),will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll
outthe query changes at will and not need to do some cut-over from using the expression to using the virtual column. 
> >
>
> hi.
> I am not sure whether this concern has been addressed, as I am still somewhat
> confused by the above paragraph.
>
> As noted earlier, creating an index on a virtual generated column results in a
> new index, and that index behaves the same as a regular expression index.
>
> An updated and polished patch is attached. The regress tests are quite verbose
> at the moment, since I make it covered all index types (btree, gist, spgist,
> hash, gin, and brin).
>

I went through the discussions and have reviewed the patch.
During testing, I encountered a segmentation fault during initdb which
occurred due to a crash in bootstrap mode when ComputeIndexAttrs() was
invoked with pstate == NULL while system catalog indexes were being
created. The virtual generated column handling logic was executing
unconditionally, which is unsafe during bootstrap because catalog and
syscache state are not fully initialized. I fixed this by guarding the
virtual generated column logic to skip the logic when
IsBootstrapProcessingMode() is true so that it can prevent the
generated-column rewrite from executing during bootstrap. After the
fix, initdb no longer crashes in bootstrap mode, the full regression
test suite passes cleanly, the server starts normally and key
behaviors like index creation on virtual generated columns, planner
rewrite, dependency tracking, partitioned table support, and
dump/restore roundtrips are all correct. Also I performed some
additional validations to ensure consistent behaviours. With the
bootstrap guard in place, the patch seems functionally correct,
catalog-safe, and dump/restore safe.
Kindly review the patch. Looking forward to more feedback.

Regards,
Soumya

Вложения

Re: support create index on virtual generated column.

От
Soumya S Murali
Дата:
Hi all,

While validating the bootstrap crash fix for the virtual generated
column index patch, I reviewed the current regression coverage around
this area. During this process, I noticed that the existing regression
suite verifies the functional restrictions on virtual generated
columns, but there is no dedicated regression test specifically
covering index-related restrictions. As this patch touches index
creation logic and also exposes a bootstrap-time crash during testing,
it may be useful to add explicit regression coverage for these cases.
Having a focused test would help ensure that the current restrictions
remain stable and that any future changes affecting this area are
detected early by the regression framework and may strengthen
long-term test coverage.
As a small enhancement in this direction, I prepared a regression test
that verifies the expected errors for the following scenarios:

1. CREATE INDEX on a virtual generated column
2. Partial index on a virtual generated column
3. PRIMARY KEY on a virtual generated column
4. Expression index referencing a virtual generated column

This ensures the current restrictions remain protected by the
regression framework and helps detect unintended changes in the
future. The attached patch only adds the regression test and expected
output.
If this finds useful, kindly verify the patch attached herewith and
please let me know the thoughts on this. I would be happy to refine or
extend the test further.
Looking forward to more feedback.

Regards,
Soumya

Вложения

Re: support create index on virtual generated column.

От
Peter Eisentraut
Дата:
On 08.01.26 07:16, jian he wrote:
> On Thu, Jul 31, 2025 at 2:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
>>
>>>
>>> Besides, this does nothing you haven't been able to do for
>>> decades with expression indexes.
>>
>>
>> What I'm hoping for with this feature is a smoother transition when people start introducing virtual columns. If
therewas already an index on that expression, and some queries start using the new virtual column (with the same
expression),will that cause those queries to miss the index we already have? If it doesn't, then a customer can roll
outthe query changes at will and not need to do some cut-over from using the expression to using the virtual column.
 
>>
> 
> hi.
> I am not sure whether this concern has been addressed, as I am still somewhat
> confused by the above paragraph.
> 
> As noted earlier, creating an index on a virtual generated column results in a
> new index, and that index behaves the same as a regular expression index.
> 
> An updated and polished patch is attached. The regress tests are quite verbose
> at the moment, since I make it covered all index types (btree, gist, spgist,
> hash, gin, and brin).

I think you could do a much simpler initial version of this if you just 
supported virtual generated columns in expression indexes.  And then 
prohibit SET EXPRESSION if the column is used in an index.  Then you 
don't need to worry about index rebuilding, ALTER TABLE recursion, new 
catalog columns, and all that.

But there is a comment in DefineIndex():

     /*
      * XXX Virtual generated columns in index expressions or predicates
      * could be supported, but it needs support in
      * RelationGetIndexExpressions() and RelationGetIndexPredicate().
      */

which you delete, but you don't make any changes to those mentioned 
functions.  Maybe the comment is wrong, in which case, let's discuss 
that and fix it.  (If the comment is indeed wrong, then the feature 
might even be very easy.)





Re: support create index on virtual generated column.

От
jian he
Дата:
On Fri, Mar 13, 2026 at 10:01 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I think you could do a much simpler initial version of this if you just
> supported virtual generated columns in expression indexes.  And then
> prohibit SET EXPRESSION if the column is used in an index.  Then you
> don't need to worry about index rebuilding, ALTER TABLE recursion, new
> catalog columns, and all that.
>

CREATE TABLE gtest22c (a int, b int GENERATED ALWAYS AS (a) VIRTUAL);
CREATE INDEX gtest22c_a_idx ON gtest22c (a);
CREATE INDEX gtest22c_b_idx ON gtest22c (b);

If we don't add a new catalog column (just a single boolean
indisvirtual is not enough, i think),
how can we distinguish between the gtest22c_a_idx and gtest22c_b_idx
indexes in the example above?

If CREATE INDEX simply expands the virtual generated column expression
without dependency tracking, that would be quite easy, see the
attached v8.
If so, we need to explicitly document that SET EXPRESSION has no
effect on existing indexes that originally referenced the virtual
generated column when CREATE INDEX was used.

> But there is a comment in DefineIndex():
>
>      /*
>       * XXX Virtual generated columns in index expressions or predicates
>       * could be supported, but it needs support in
>       * RelationGetIndexExpressions() and RelationGetIndexPredicate().
>       */
>
> which you delete, but you don't make any changes to those mentioned
> functions.  Maybe the comment is wrong, in which case, let's discuss
> that and fix it.  (If the comment is indeed wrong, then the feature
> might even be very easy.)
>

I don't think it's a good idea to store the virtual generated columns
as is in Form_pg_index->indkey
because IndexInfo->ii_IndexAttrNumbers and Form->pg_index->indkey are
referenced in too many places (see BuildIndexInfo).
For every single occurrence of ndkey.values[i], we need to consider
whether it's ok for it be a virtual generated column.
Instead, Anum_pg_index_indexprs and Anum_pg_index_indpred store the
expressions after the virtual generated columns expansion,
then we don't need to worry about Form_pg_index->indkey.values[i] is
virtual generated column or not.

Therefore, i think RelationGetIndexExpressions and
RelationGetIndexPredicate don't need to
deal with virtual generated column expressions at all.
Overall, I think the comment above is wrong.



--
jian
https://www.enterprisedb.com/

Вложения