Обсуждение: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

Поиск
Список
Период
Сортировка

Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

От
Victor Nordam Suadicani
Дата:
Hi,

Is there any nice way to handle sum types (aka tagged unions) in a PostgreSQL database? I've searched far and wide and have not reached any satisfying answer.

As a (somewhat contrived) example, say I have the following enum in Rust:

enum TaggedUnion {
    Variant1(String),
    Variant2(i32),
    Variant3(f64),
}

How might I best save this data in a PostgreSQL database? There is to my knowledge unfortunately no way to "natively" handle sum types like this.

One method would be to have 3 different tables, one for each variant. This is not a great solution as you can't (as far as I know) easily query for all variants at once (for instance, to serialize into a Vec<TaggedUnion> on the Rust side).

Another method would be to use PostgreSQL table inheritance. This has the same problem as the above workaround but also has the issue that you could put something into the supertable without having a corresponding entry in any of the subtables (basically a value not inhabiting any of the variants, which is nonsense).

A third method would be to save all fields of all variants into a single table, with all fields being nullable. So you'd have a nullable text field, nullable integer and nullable double precision field. You'd then need an additional tag field to indicate which variant of the union is used and you'd have to write check constraints for each variant to ensure that all the fields in that variant are not null and all the fields not in that variant are null. This almost works, but has two major problems:

1. It wastes space. In Rust, an enum is only as big as its largest variant. Using this method, a table row would be as big as the sum of all the variants.

2. Querying the data is very cumbersome, as there is no way to indicate (beyond check constraints) that, given a specific tag, certain other fields must be filled while certain other fields must not be. For instance, the nullability of fields can be used to serialize into the Option type in Rust. There is no "nice" way to tell the host language that the nullability of the variant fields is hinged on the value of the extra tag field.

Both of these problems get bigger and bigger as you add more variants - it doesn't scale well.

Does anyone know of better methods? I realize the example enum is contrived but this kind of thing comes up a lot in all kinds of places. For instance different kinds of messages that all have various different associated data. Or different events that record different additional data. Sum types are ubiquitous in languages like Rust and Haskell.

If there are no good methods of handling this, is there any way PostgreSQL could be extended with capabilities for this? I have no idea how this would be done in practice though. Perhaps SQL itself is just unsuited for data of this kind? I don't really see why it should be though.

Thanks,
Victor Nordam Suadicani

Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

От
Adrian Klaver
Дата:
On 5/18/23 05:27, Victor Nordam Suadicani wrote:
> Hi,
> 
> Is there any nice way to handle sum types (aka tagged unions) in a 
> PostgreSQL database? I've searched far and wide and have not reached any 
> satisfying answer.
> 
> As a (somewhat contrived) example, say I have the following enum in Rust:
> 
> enum TaggedUnion {
>      Variant1(String),
>      Variant2(i32),
>      Variant3(f64),
> }
> 

> If there are no good methods of handling this, is there any way 
> PostgreSQL could be extended with capabilities for this? I have no idea 
> how this would be done in practice though. Perhaps SQL itself is just 
> unsuited for data of this kind? I don't really see why it should be though.

Composite type?:

https://www.postgresql.org/docs/current/rowtypes.html
> 
> Thanks,
> Victor Nordam Suadicani

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

От
Victor Nordam Suadicani
Дата:
A composite type is a product type, not a sum type. PostgreSQL currently has great support for product types, but basically no support for sum types. From the perspective of algebraic data types, this feels like a "missing link" in the type system. I'm not sure why SQL or the underlying relational model has never addressed this deficiency. Would greatly appreciate any insight anyone may have.

On Thu, 18 May 2023 at 16:35, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/18/23 05:27, Victor Nordam Suadicani wrote:
> Hi,
>
> Is there any nice way to handle sum types (aka tagged unions) in a
> PostgreSQL database? I've searched far and wide and have not reached any
> satisfying answer.
>
> As a (somewhat contrived) example, say I have the following enum in Rust:
>
> enum TaggedUnion {
>      Variant1(String),
>      Variant2(i32),
>      Variant3(f64),
> }
>

> If there are no good methods of handling this, is there any way
> PostgreSQL could be extended with capabilities for this? I have no idea
> how this would be done in practice though. Perhaps SQL itself is just
> unsuited for data of this kind? I don't really see why it should be though.

Composite type?:

https://www.postgresql.org/docs/current/rowtypes.html
>
> Thanks,
> Victor Nordam Suadicani

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

От
Dominique Devienne
Дата:
On Thu, May 18, 2023 at 2:28 PM Victor Nordam Suadicani <v.n.suadicani@gmail.com> wrote:
Is there any nice way to handle sum types (aka tagged unions) in a PostgreSQL database? [...]
A third method would be to save all fields of all variants into a single table, with all fields being nullable.
So you'd have a nullable text field, nullable integer and nullable double precision field.
 
Yes, we do that. That's IMHO the only sane way to do it.
And if some of those alternatives are FKs (relations), that's the only choice.

You'd then need an additional tag field to indicate which variant of the union is used

No, you don't need it. That's implicit from the NULL'ability of the alternative columns.
If you want, you can have it as a generated column, thus read-only.
Worse, having it as an explicit column would make it denormalized, and possibly out of sync.
 
and you'd have to write check constraints for each variant to ensure that all the fields in that variant are not null and all the fields not in that variant are null.

Yes indeed. 
 
This almost works, but has two major problems:

1. It wastes space. In Rust, an enum is only as big as its largest variant. Using this method, a table row would be as big as the sum of all the variants.

Not really, or not to a point it matters that much.
I don't know about the actual physical bytes on disk for PostgreSQL, but as an example in SQLite,
all columns have *at least* 1 "header" byte per value, and NULL values (and 0 and 1) have no "payload" bytes.
In PostgreSQL (which is more "rigidly typed" as DRH would say :)) you may waste space for primitive types,
but not for text and bytea, which is where it really matters IMHO.
 
2. Querying the data is very cumbersome, [...].

Sure, it's cumbersome. But I don't get your point here. NULL handling is part of SQL.
And sum-types (variants) implemented via exclusive NULL'ness is just a special case.
You "dispatch" to the proper column on writes. You read all alternatives and assign the one (if any) NOT NULL to the variant.
 
Both of these problems get bigger and bigger as you add more variants - it doesn't scale well.

ORMs cannot magically resolve the impedence mismatch between SQL and OO-based or sum-type based type systems a la Rust (and co).
If you need SQL, you need to design for SQL for the get go. Not shoehorn your Rust data model into SQL.

My $0.02.

Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

От
Victor Nordam Suadicani
Дата:
On Fri, 19 May 2023 at 12:44, Dominique Devienne <ddevienne@gmail.com> wrote:
On Thu, May 18, 2023 at 2:28 PM Victor Nordam Suadicani <v.n.suadicani@gmail.com> wrote:
Is there any nice way to handle sum types (aka tagged unions) in a PostgreSQL database? [...]
A third method would be to save all fields of all variants into a single table, with all fields being nullable.
So you'd have a nullable text field, nullable integer and nullable double precision field.
 
Yes, we do that. That's IMHO the only sane way to do it.
And if some of those alternatives are FKs (relations), that's the only choice.

You'd then need an additional tag field to indicate which variant of the union is used

No, you don't need it. That's implicit from the NULL'ability of the alternative columns.
If you want, you can have it as a generated column, thus read-only.
Worse, having it as an explicit column would make it denormalized, and possibly out of sync.
 
and you'd have to write check constraints for each variant to ensure that all the fields in that variant are not null and all the fields not in that variant are null.

Yes indeed. 
 
This almost works, but has two major problems:

1. It wastes space. In Rust, an enum is only as big as its largest variant. Using this method, a table row would be as big as the sum of all the variants.

Not really, or not to a point it matters that much.
I don't know about the actual physical bytes on disk for PostgreSQL, but as an example in SQLite,
all columns have *at least* 1 "header" byte per value, and NULL values (and 0 and 1) have no "payload" bytes.
In PostgreSQL (which is more "rigidly typed" as DRH would say :)) you may waste space for primitive types,
but not for text and bytea, which is where it really matters IMHO.
 
2. Querying the data is very cumbersome, [...].

Sure, it's cumbersome. But I don't get your point here. NULL handling is part of SQL.
And sum-types (variants) implemented via exclusive NULL'ness is just a special case.
You "dispatch" to the proper column on writes. You read all alternatives and assign the one (if any) NOT NULL to the variant.
 
Both of these problems get bigger and bigger as you add more variants - it doesn't scale well.

ORMs cannot magically resolve the impedence mismatch between SQL and OO-based or sum-type based type systems a la Rust (and co).
If you need SQL, you need to design for SQL for the get go. Not shoehorn your Rust data model into SQL.

My $0.02.

Thanks for the perspective :)

> If you need SQL, you need to design for SQL for the get go. Not shoehorn your Rust data model into SQL.

Sometimes the data in the domain really does fit a sum type and then a sum type is the right tool to use (whether you use Rust or Haskell or whatever language). Trying to shoehorn your domain data model into a data format that doesn't fit isn't the way to go either. I feel like it's a deficiency in SQL that there is no support for sum types. I would guess this is influenced by the fact that SQL was developed in a time when there were no major programming languages with sum type support either.

But really it's not that I "need" SQL per se, it's just that SQL databases are the most developed and used at this time. Do you know of any other production-grade databases that actually support sum types in a better way than SQL? I'd be very curious cause I haven't really found any.

Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

От
"Peter J. Holzer"
Дата:
On 2023-05-19 16:55:00 +0200, Victor Nordam Suadicani wrote:
> Thanks for the perspective :)
>
> > If you need SQL, you need to design for SQL for the get go. Not shoehorn your
> >  Rust data model into SQL.
>
> Sometimes the data in the domain really does fit a sum type and then a sum type
> is the right tool to use (whether you use Rust or Haskell or whatever
> language). Trying to shoehorn your domain data model into a data format that
> doesn't fit isn't the way to go either. I feel like it's a deficiency in SQL
> that there is no support for sum types. I would guess this is influenced by the
> fact that SQL was developed in a time when there were no major programming
> languages with sum type support either.

That may have been a reason. It is worth noting, however that (at least
some versions of) entity-relationship diagrams do have graphical
symbols for "exactly/at most one of these attributes must be set". This
is very similar to the example you gave, where each of the three
variants had a name (and very different to e.g. TypeScript, where you
can just define a variable to be of type «string | number | Date»)

The way this is implemented is quite straightforward, as Dominique
wrote: You have several nullable columns and a constraint which ensures
that only one is not null.

For example

    create table alternatetest (
        id serial primary key,
        name text,
        i int, f float, t text, -- these three form a union/sum type
        check (
            (i is not null)::int + (f is not null)::int + (t is not null)::int = 1
        )
    );

The space overhead is very small (in PostgreSQL it's a single bit per
nullable column).

An ORM for Rust should be able to read the three columns and stuff them
into a single variable of sum type. I don't know Rust well enough if the
other direction is possible automatically (does the compiler and/or the
run-time system keep track which variant is currently valid?), but even
if it isn't, surely the application must have that information and be
able to pass it into the ORM.

If you aren't using an ORM but using SQL directly (as I prefer to do as
I find that ORMs are just another leaky abstraction layer which makes
stuff harder instead of easier) then you have to code that yourself, but
you can probably centralize that somewhere and the rest of your code
will be blissfully unaware.

(Of course you can stuff those values in a single column of JSONB type.
But I don't think this is better.)

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения