Re: WIP: System Versioned Temporal Table

Поиск
Список
Период
Сортировка
От Trevor Gross
Тема Re: WIP: System Versioned Temporal Table
Дата
Msg-id CACEJhhUZN_QgQUHSWGjSuW-RcL3exDeFxwBCHWk5dpuTL4YNDg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP: System Versioned Temporal Table  (Daniel Gustafsson <daniel@yesql.se>)
Ответы Re: WIP: System Versioned Temporal Table  (Corey Huinker <corey.huinker@gmail.com>)
Список pgsql-hackers
Chiming in as a user, not so much a developer - I've been using system
versioned tables in MariaDB for about half a year now, would just like
to add some feedback about what they did right and wrong and how PG
could learn from their mistakes & successes.

> 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
> fairly trivial, but it complicates many DDL commands (please make a
> list?) and requires the optimizer to know about this and cater to it,
> possibly complicating plans. Neither issue is insurmountable, but it
> becomes more intrusive.

I'd vouch for this being the way to go; you completely sidestep issues
like partitioning, unique constraints, optimization, etc. Especially
true when 90% of the time, SELECTs will only be looking at
currently-active data. MDB seems to have gone with the single-table
approach (unless you partition) and I've run into a bug where I can't
add a unique constraint because historical data fails.

#### System versioning & Application versioning
I saw that there is an intent to harmonize system versioning with
application versioning. Haven't read the AV thread so not positive if
that meant intending to split tables by application versioning and
system versioning both: to me it seems like maybe it would be good to
use a separate table for SV, but keep AV in the same table. Reasons
include:

- ISO states only one AV config per table, but there's no reason this
always has to be the case; maybe you're storing products that are
active for a period of time, EOL for a period of time, and obsolete
for a period of time. If ISO sometime decides >1 AV config is OK,
there would be a mess trying to split that into tables.
- DB users who are allowed to change AV items likely won't be allowed
to rewrite history by changing SV items. My proposed schema would keep
these separate.
- Table schemas change, and all (SV active) AV items would logically
need to fit the active schema or be updated to do so. Different story
for SV, nothing there should ever need to be changed.
- Partitioning for AV tables isn't as clear as with SV and is likely
better to be user-defined

Sorry for acronyms, SV=system versioning, AV=application versioning

In general, I think AV should be treated literally as extra rows in
the main DB, plus the extra PK element and shortcut functions. SV
though, needs to have a lot more nuance.

#### ALTER TABLE
On to ideas about how ALTER TABLE could work. I don't think the
question was ever answered "Do schema changes need to be tracked?" I'm
generally in favor of saying that it should be possible to recreate
the table exactly as it was, schema and all, at a specific period of
time (perhaps for a view) using a fancy combination of SELECT ... AS
and such - but it doesn't need to be straightforward. In any case, no
data should ever be deleted by ALTER TABLE. As someone pointed out
earlier, speed and storage space of ALTER TABLE are likely low
considerations for system versioned tables.

- ADD COLUMN easy, add the column to both the current and historical
table, all null in historical
- DROP COLUMN delete the column from the current table. Historical is
difficult, because what happens if a new column with the same name is
added? Maybe `DROP COLUMN col1` would rename col1 to _col1_1642929683
(epoch time) in the historical table or something like that.
- RENAME COLUMN is a bit tricky too - from a usability standpoint, the
historical table should be renamed as well. A quick thought is maybe
`RENAME col1 TO new_name` would perform the rename in the historical
table, but also create _col1_1642929683 as an alias to new_name to
track that there was a change. I don't think there would be any name
violations in the history table because there would never be a column
name in history that isn't in current (because of the rename described
with DROP).
- Changing column data type: ouch. This needs to be mainly planned for
cases where data types are incompatible, possibly optimized for times
when they are compatible. Seems like another _col1_1642929683 rename
would be in order, and a new col1 created with the new datatype, and a
historical SELECT would automatically merge the two. Possible
optimization: if the old type fits into the new type, just change the
data type in history and make _col1_1642929683 an alias to it.
- Change defaults, nullability, constraints, etc: I think these can
safely be done for the current table only. Realistically, historical
tables could probably skip all checks, always (except their tuple PK),
since trying to enforce them would just be opening the door to bugs.
Trying to think of any times this isn't true.
- FKs: I'm generally in the same boat as above, thinking that these
don't need to affect historical tables. Section 2.5 in the paper I
link below discusses period joins, but I don't think any special
behavior is needed for now. Perhaps references could be kept in
history but not enforced
- Changing PK / adding/removing more columns to PK: Annoying and not
easily dealt with. Maybe just disallow
- Triggers: no affect on historical
- DROP TABLE bye bye, history & all

Things like row level security add extra complication but can probably
be disregarded. Maybe just have a `select history` permission or
similar.

An interesting idea could be to automatically add system versioning to
information_schema whenever it is added to a table. This would provide
a way to easily query historical DDL. It would also help solve how to
keep historical FKs. This would make it possible to perfectly recreate
system versioned parts of your database at any period of time, schema
and data both.

#### Partitioning
Allowing for partitioning and automatic rotation seems like a good
idea, should be possible with current syntax but maybe worth adding
some shortcuts like maria has.

#### Permissions
- MDB has the new 'delete history' schema privilege that defines who
can delete historical data before a certain time or drop system
versioning, seems like a good idea to implement. They also require
`@@system_versioning_alter_history=keep;` to be set before doing
anything ALTER TABLE; doesn't do much outside of serving as a reminder
that changing system versioned tables can be dangerous.¯\_(ツ)_/¯
- This part sucks and goes against everything ISO is going for, but
IMO there needs to be a way to insert/update/delete historical data.
Maybe there needs to be a new superduperuser role to do it and you
need to type the table name backwards to verify you want to insert,
but situations like data migration, fixing incorrectly stored data, or
removing accidental sensitive information demand it. This isn't a
priority though, and basic system versioning can be shipped without
it.

#### Misc
- Seems like a good idea to include MDB's option to exclude columns
from versioning (`WITHOUT SYSTEM VERSIONING` as a column argument).
This is relatively nuanced and I'm not sure if it's officially part of
ISO, but probably helpful for frequently updating small data in rows
with BLOBs. Easy enough to implement, just forget the column in the
historical table.
- I thought I saw somewhere that somebody was discussing adding both
row_start and row_end to the PK. Why would this be? Row_end should be
all that's needed to keep unique, but maybe I misread.

#### Links
- I haven't seen it linked here yet but this paper does a phenomenal
deep dive into SV and AV
https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
- It's not perfect, but MDB's system versioning is pretty well thought
out. You get a good idea of their thought process going through this
page, worth a read
https://mariadb.com/kb/en/system-versioned-tables/#excluding-columns-from-versioning

#### Finally, the end
There's a heck of a lot of thought that could go into this thing,
probably worth making sure there's a formal agreement on what to be
done before coding starts (PGEP for postgres enhancement proposal,
like PEP? Not sure if something like that exists but it probably
should.). Large parts of the existing patch could likely be reused for
whatever is decided.

Best,
Trevor


On Sun, Jan 23, 2022 at 2:47 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 15 Nov 2021, at 11:50, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> > I have no plans on taking this patch further, but will give some help
> > to anyone that wishes to do that.
> >
> > I suggest we Return with Feedback.
>
> Fair enough, done that way.
>
> --
> Daniel Gustafsson               https://vmware.com/
>
>
>
>
>



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrey Borodin
Дата:
Сообщение: Re: [PATCH] reduce page overlap of GiST indexes built using sorted method
Следующее
От: Yura Sokolov
Дата:
Сообщение: Re: Fix BUG #17335: Duplicate result rows in Gather node