Обсуждение: Non-Stored Generated Columns

Поиск
Список
Период
Сортировка

Non-Stored Generated Columns

От
Dominique Devienne
Дата:
Hi.

From https://www.postgresql.org/docs/16/ddl-generated-columns.html:
> PostgreSQL currently implements only stored generated columns

We use generated columns extensively.
And we have foreign-keys attached to those generated columns.
The fact they are always Stored thus wastes space in our case.
Any chance PostgreSQL might gain actual virtual / non-stored generated columns soon? Ever?

For reference, both Oracle and SQLite have virtual / non-stored columns.
And both support FKs and indexes on those too.
Would be great to have feature parity on this particular point, eventually.

Thanks, --DD

Re: Non-Stored Generated Columns

От
Laurenz Albe
Дата:
On Wed, 2024-02-28 at 16:09 +0100, Dominique Devienne wrote:
> From https://www.postgresql.org/docs/16/ddl-generated-columns.html:
> > PostgreSQL currently implements only stored generated columns
>
> We use generated columns extensively.
> And we have foreign-keys attached to those generated columns.
> The fact they are always Stored thus wastes space in our case.
> Any chance PostgreSQL might gain actual virtual / non-stored generated columns soon? Ever?
>
> For reference, both Oracle and SQLite have virtual / non-stored columns.
> And both support FKs and indexes on those too.
> Would be great to have feature parity on this particular point, eventually.

Isn't almost all of that functionality covered by a view?

Yours,
Laurenz Albe



Re: Non-Stored Generated Columns

От
Dominique Devienne
Дата:
On Wed, Feb 28, 2024 at 5:59 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2024-02-28 at 16:09 +0100, Dominique Devienne wrote:
> From https://www.postgresql.org/docs/16/ddl-generated-columns.html:
> > PostgreSQL currently implements only stored generated columns
>
> We use generated columns extensively.
> And we have foreign-keys attached to those generated columns.
> The fact they are always Stored thus wastes space in our case.
> Any chance PostgreSQL might gain actual virtual / non-stored generated columns soon? Ever?
>
> For reference, both Oracle and SQLite have virtual / non-stored columns.
> And both support FKs and indexes on those too.
> Would be great to have feature parity on this particular point, eventually.

Isn't almost all of that functionality covered by a view?

Views can have foreign-keys?
Generated view columns be indexed? (for efficient CASCADE of the FKs)

AFAIK, no. Plus that would double the relations in schemas too. --DD

Re: Non-Stored Generated Columns

От
Tom Lane
Дата:
Dominique Devienne <ddevienne@gmail.com> writes:
> Views can have foreign-keys?

Surely you'd put the FK on the underlying table.

> Generated view columns be indexed?

You want an index on a virtual column?  Sure, just build an expression
index (on the underlying table) that matches it.

I agree with Laurenz that it's hard to see much use-case here
that's not sufficiently covered already.

            regards, tom lane



Re: Non-Stored Generated Columns

От
Ron Johnson
Дата:
On Wed, Feb 28, 2024 at 2:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dominique Devienne <ddevienne@gmail.com> writes:
> Views can have foreign-keys?

Surely you'd put the FK on the underlying table.

> Generated view columns be indexed?

You want an index on a virtual column?  Sure, just build an expression
index (on the underlying table) that matches it.

I agree with Laurenz that it's hard to see much use-case here
that's not sufficiently covered already.

Oracle and SQLite have them, so, by definition, *many* more than one person (you don't get a feature added to Oracle that easily...) has seen sufficient reasons for doing it instead of modifying the underlying table, or building an expression index.

My guess as to why they do it is "clarity of design".

Re: Non-Stored Generated Columns

От
Dominique Devienne
Дата:
On Wed, Feb 28, 2024 at 8:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dominique Devienne <ddevienne@gmail.com> writes:
> Views can have foreign-keys?

Surely you'd put the FK on the underlying table.

Again, the FKs are on the *generated* columns. So 
 
> Generated view columns be indexed?

[...[ it's hard to see much use-case here

The use-case is Polymorphic Foreign Key (aka PFKs).
I've already outlined it on this list, in the past.

For NULL'able PFKs, you have two read-write concrete columns, the ID and the CODE (aka TYPE) of the relation.
Then you have 1 generated column (that I call XArc) per possible CODE/TYPE value (for that specific relation), which is either ID or NULL, depending on the CODE.
And the "regular" FK is on that generated (and ideally Non-Stored, the whole point of this thread), since it points to a single table now.
And since these FKs are CASCADE'ing, you want them INDEX'ed of course.

For NOT NULL PFKs, that more of a PITA, because you cannot SET NULL a generated column (there's no INSTEAD OF trigger on generated columns).
So instead we need a BEFORE INSERT/UPDATE trigger on the ID column, to dispatch to the proper per-TYPE-value column, which must now be concrete.
And make sure those now-non-generated XArc columns are always in sync with ID/CODE.

That's a PITA to implement, I really wish this was built-in to PostgreSQL (albeit non-standard SQL).
Especially given that PostgreSQL has table inheritance, but which don't work with FKs.
(in our case, some PFKs are not inheritance based, even though we do have inheritance in our logical models).
But at least it's fully automated in our case, since the physical schemas are machine generated from logical ones.

In one of our smaller schemas, 25 tables and 293 columns, we have 18 PFK virtual columns, i.e. 6% of the columns. So small, but not "that small".
We have PFK cardinalities (number of XArcs) that can go up to 8, in that one small schema.
When the cardinality goes too high, we explicitly choose to disable referential integrity for specific PFKs, sometimes.

So has Ron says. If SQLite and Oracle has them, that's not an accident.
And there's a real concrete use-case being it. Albeit an usual one.
Any OO language writer (Java, C++, etc...) can recognize the polymorphism pattern (yes, often abused).
And in our case, it's a long established pattern (20 years old) in our data models, transcribed to SQL and relational.
When this was started with SQLite, it wasn't enforced at the relational model, but the mid-tier C++ level.
But now that we move to PostgreSQL with direct SQL access (two tier), it must be PostgreSQL server-side enforced.

So, to conclude, it works with PostgreSQL now. But it's more wasteful that it could/should be, because of the STORED only current limitation.

Thanks, --DD

PS: Since we are on this subject, I'd like to take an opportunity to ask a question I've had for a long time, but didn't have time to research.
For a given PFK, all its XArc columns are mutually-exclusive (by design, see above). So their indexes are full of NULLs.
Are the NULLs stored in these indexes, consuming extra space? In our use case, we don't care about NULLs.
So should be use an expression index to explicitly not index the NULLs?
The sum of indexes for a given PFK should in theory be the same as a single index, whatever the cardinality of the PFK, "logically".

Re: Non-Stored Generated Columns

От
Laurenz Albe
Дата:
On Thu, 2024-02-29 at 08:55 +0100, Dominique Devienne wrote:
> On Wed, Feb 28, 2024 at 8:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Dominique Devienne <ddevienne@gmail.com> writes:
> > > Views can have foreign-keys?
> >
> > Surely you'd put the FK on the underlying table.
>
> Again, the FKs are on the *generated* columns. So 
>  
> > > Generated view columns be indexed?
> >
> > [...[ it's hard to see much use-case here
>
> The use-case is Polymorphic Foreign Key (aka PFKs).
>
> For NULL'able PFKs, you have two read-write concrete columns, the ID
> and the CODE (aka TYPE) of the relation.
> Then you have 1 generated column (that I call XArc) per possible CODE/TYPE
> value (for that specific relation), which is either ID or NULL, depending
> on the CODE.
> And the "regular" FK is on that generated (and ideally Non-Stored, the
> whole point of this thread), since it points to a single table now.
> And since these FKs are CASCADE'ing, you want them INDEX'ed of course.

I concede that that is a borderline use case.

But you may as well have the foreign key columns as actual columns
(or as stored generated columns).
If that's a handful or so, it shouldn't be a big problem.
If you have hundred types (hundred referenced tables), you'd end up
with hundreds of indexes, and that already looks like a very bad idea
(both DML and query planning time will be affected).

Polymorphic Foreign Keys are nigh impossible to model well in SQL,
and I doubt that non-stored generated columns will solve that.

> For NOT NULL PFKs, that more of a PITA [...]

See?

> So has Ron says. If SQLite and Oracle has them, that's not an accident.

Oracle has a lot of things that are not enviable...

I am sure there are some use cases for "virtual" generated columns, and
I don't think that a patch that tries to implement them will be rejected
out of hand.  It's just that I don't think it is a very important feature.

> PS: Since we are on this subject, I'd like to take an opportunity to
> ask a question I've had for a long time, but didn't have time to research.
> For a given PFK, all its XArc columns are mutually-exclusive (by design,
> see above). So their indexes are full of NULLs.
> Are the NULLs stored in these indexes, consuming extra space? In our use
> case, we don't care about NULLs.
> So should be use an expression index to explicitly not index the NULLs?
> The sum of indexes for a given PFK should in theory be the same as a
> single index, whatever the cardinality of the PFK, "logically".

Yes, NULLs are stored in indexes, just like everything else.

You could use conditional indexes, but then you have to make sure that
the optimizer knows it can use these indexes.

The sum of the sizes of these indexes shouldn't exceed the size of an
unconditional index by much, but they would still be more expensive:
each statement would have to look through all the indexes to decide
which ones it can use and which ones not.

Yours,
Laurenz Albe



Re: Non-Stored Generated Columns

От
Dominique Devienne
Дата:
On Thu, Feb 29, 2024 at 10:03 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-02-29 at 08:55 +0100, Dominique Devienne wrote:
Polymorphic Foreign Keys are nigh impossible to model well in SQL,
and I doubt that non-stored generated columns will solve that.

It is modelled. It works. As I already wrote above.
It's not for everyone, no doubt about it. it has its costs. But we definitely must use it.

Non-stored generated columns simply makes that implementation more costly in space that it could/should be.
Honestly, I'm not sure why supporting the non-stored variant of generated columns is so controversial...
 
> For NOT NULL PFKs, that more of a PITA [...]

See?

And?  I'm not sure I see you point Laurenz.
The alternative to NOT implementing PFKs is NOT having referential integrity.
Which is a big no-no for us. The "costs" of PFKs are outweighed by lack of RI.
 
> So has Ron says. If SQLite and Oracle has them, that's not an accident.

Oracle has a lot of things that are not enviable...

The same can't be said for SQLite :)
And yet it has them too!
 
I am sure there are some use cases for "virtual" generated columns, and
I don't think that a patch that tries to implement them will be rejected
out of hand.  It's just that I don't think it is a very important feature.

Fair enough. And a reaction I expected when I first posted.
The outright rejection of it ever being useful, that's what surprised me.
I'm not a PostgreSQL server developer. So a patch won't be coming from me though...
I'm "only" a PostgreSQL / LIBPQ user, albeit maybe an "above average" one I dare say.
 
> Are the NULLs stored in these indexes, consuming extra space?

Yes, NULLs are stored in indexes, just like everything else.

OK, that confirms my worry. Thanks.
 
You could use conditional indexes, but then you have to make sure that
the optimizer knows it can use these indexes.

I'm not following. Are you saying the planner is not good at that on its own?
I need to do something from my end???
 
The sum of the sizes of these indexes shouldn't exceed the size of an
unconditional index by much, but they would still be more expensive:
each statement would have to look through all the indexes to decide
which ones it can use and which ones not.

Something I maybe didn't make clear. The XArc virtual columns are never accessed.
Only the ID and CODE concrete columns are read and written.
The XArcs are an implementation detail solely to ensure RI of PFKs.
The only reason we index them is for efficient CASCADE'ing of the FKs (on XArcs).

Surely the fact XArc "FK" indexes are partial/expression-based,
shouldn't prevent the PostgreSQL-internal CASCADE from using them, no?

Otherwise that would defeat having indexed FK columns, no?
Or did I misunderstand you last point?

Thanks again for taking time on this particular thread. Appreciated. --DD

Re: Non-Stored Generated Columns

От
Alvaro Herrera
Дата:
On 2024-Feb-29, Dominique Devienne wrote:

> Honestly, I'm not sure why supporting the non-stored variant of generated
> columns is so controversial...

I don't think there's anything controversial about virtual generated
columns, really ... it's just that it's tricky to implement and we don't
have it yet.  I have interest in working on something related that might
allow us to implement virtual generated columns later, but it's not for
the near future.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
 sources, so let's move on."                               (Nathaniel Smith)
      https://mail.gnu.org/archive/html/monotone-devel/2007-01/msg00080.html



Re: Non-Stored Generated Columns

От
Peter Eisentraut
Дата:
On 28.02.24 16:09, Dominique Devienne wrote:
> We use generated columns extensively.
> And we have foreign-keys attached to those generated columns.
> The fact they are always Stored thus wastes space in our case.
> Any chance PostgreSQL might gain actual virtual / non-stored generated 
> columns soon? Ever?

I plan to work on this for PG18.



Re: Non-Stored Generated Columns

От
Dominique Devienne
Дата:
On Thu, Feb 29, 2024 at 11:26 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 28.02.24 16:09, Dominique Devienne wrote:
> Any chance PostgreSQL might gain actual virtual / non-stored generated
> columns soon? Ever?

I plan to work on this for PG18.

Thanks for the update, Peter and Alvaro. --DD 

Re: Non-Stored Generated Columns

От
Laurenz Albe
Дата:
On Thu, 2024-02-29 at 10:55 +0100, Dominique Devienne wrote:
> On Thu, Feb 29, 2024 at 10:03 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> Honestly, I'm not sure why supporting the non-stored variant of generated
> columns is so controversial...
>  
> > I am sure there are some use cases for "virtual" generated columns, and
> > I don't think that a patch that tries to implement them will be rejected
> > out of hand.  It's just that I don't think it is a very important feature.
>
> Fair enough. And a reaction I expected when I first posted.
> The outright rejection of it ever being useful, that's what surprised me.


I don't think anybody rejected the idea unconditionally.

But there is always a trade-off between the maintenance burden and the
usefulness of a feature.  All that was said is that the usefulness is low.

> > You could use conditional indexes, but then you have to make sure that
> > the optimizer knows it can use these indexes.
>
> I'm not following. Are you saying the planner is not good at that on its own?
> I need to do something from my end???

I wasn't sure, but now I tested: a conditional index can also be used
by a cascading delete or update.  So that's not a worry.

> > The sum of the sizes of these indexes shouldn't exceed the size of an
> > unconditional index by much, but they would still be more expensive:
> > each statement would have to look through all the indexes to decide
> > which ones it can use and which ones not.
>
> Something I maybe didn't make clear. The XArc virtual columns are never accessed.

Yes, they are.  The query planner considers all indexes.

Yours,
Laurenz Albe



Re: Non-Stored Generated Columns

От
Dominique Devienne
Дата:
On Thu, Feb 29, 2024 at 11:58 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-02-29 at 10:55 +0100, Dominique Devienne wrote:
> On Thu, Feb 29, 2024 at 10:03 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > You could use conditional indexes, but then you have to make sure that
> > the optimizer knows it can use these indexes.
>
> I'm not following. Are you saying the planner is not good at that on its own?
> I need to do something from my end???

I wasn't sure, but now I tested: a conditional index can also be used
by a cascading delete or update.  So that's not a worry.

Great. Thanks a bunch for confirming for me!
 
> Something I maybe didn't make clear. The XArc virtual columns are never accessed.

Yes, they are.  The query planner considers all indexes.

Not sure if I'm missing some terminology or something, to understand your point.
The XArc columns are never explicitly SELECT'd by our code.
Nor are they used in any of our WHERE clauses.

So yes, the additional indexes our PFK pattern introduces make the pool of indexes considered larger.
But I'm sure indexes on columns "not used at all in a statement" are eliminated early and easily/cheaply,
w/o even getting into more complex consideration like statistics and co. Aren't they? Thanks, --DD

Re: Non-Stored Generated Columns

От
Laurenz Albe
Дата:
On Thu, 2024-02-29 at 12:07 +0100, Dominique Devienne wrote:
> But I'm sure indexes on columns "not used at all in a statement" are eliminated early and easily/cheaply,
> w/o even getting into more complex consideration like statistics and co. Aren't they?

You may want a "SELECT count(*) FROM tab" to consider an index-only
scan on a small index, right?  I'm not saying that it is a large
overhead, but if you actually have dozens of indexes, it can
make processing slower.

Yours,
Laurenz Albe