Обсуждение: WIP: System Versioned Temporal Table
Hi all ,
Temporal table is one of the main new features added in sql standard 2011. From that I will like to implement system versioned temporal table which allows to keep past and present data so old data can be queried. Am propose to implement it like below
CREATE
In create table only one table is create and both historical and current data will be store in it. In order to make history and current data co-exist row end time column will be added implicitly to primary key. Regarding performance one can partition the table by row end time column order to make history data didn't slowed performance.
INSERT
In insert row start time column and row end time column behave like a kind of generated stored column except they store current transaction time and highest value supported by the data type which is +infinity respectively.
DELETE and UPDATE
The old data is inserted with row end time column seated to current transaction time
SELECT
If the query didn’t contain a filter condition that include system time column, a filter condition will be added in early optimization that filter history data.
Attached is WIP patch that implemented just the above and done on top of commit b8e19b932a99a7eb5a. Temporal clause didn’t implemented yet so one can use regular filter condition for the time being
NOTE: I implement sql standard syntax except it is PERIOD FOR SYSTEM TIME rather than PERIOD FOR SYSTEM_TIME in CREATE TABLE statement and system time is not selected unless explicitly asked
Any enlightenment?
regards
Surafel
Вложения
On 23/10/2019 17:56, Surafel Temesgen wrote: > > Hi all , > > Temporal table is one of the main new features added in sql standard > 2011. From that I will like to implement system versioned temporal > table which allows to keep past and present data so old data can be > queried. > Excellent! I've been wanting this feature for a long time now. We're the last major database to not have it. I tried my hand at doing it in core, but ended up having better success at an extension: https://github.com/xocolatl/periods/ > Am propose to implement it like below > > CREATE > > In create table only one table is create and both historical and > current data will be store in it. In order to make history and current > data co-exist row end time column will be added implicitly to primary > key. Regarding performance one can partition the table by row end time > column order to make history data didn't slowed performance. > If we're going to be implicitly adding stuff to the PK, we also need to add that stuff to the other unique constraints, no? And I think it would be better to add both the start and the end column to these keys. Most of the temporal queries will be accessing both. > INSERT > > In insert row start time column and row end time column behave like a > kind of generated stored column except they store current transaction > time and highest value supported by the data type which is +infinity > respectively. > You're forcing these columns to be timestamp without time zone. If you're going to force a datatype here, it should absolutely be timestamp with time zone. However, I would like to see it handle both kinds of timestamps as well as a simple date. > DELETE and UPDATE > > The old data is inserted with row end time column seated to current > transaction time > I don't see any error handling for transaction anomalies. In READ COMMITTED, you can easily end up with a case where the end time comes before the start time. I don't even see anything constraining start time to be strictly inferior to the end time. Such a constraint will be necessary for application-time periods (which your patch doesn't address at all but that's okay). > SELECT > > If the query didn’t contain a filter condition that include system > time column, a filter condition will be added in early optimization > that filter history data. > > Attached is WIP patch that implemented just the above and done on top > of commit b8e19b932a99a7eb5a. Temporal clause didn’t implemented yet > so one can use regular filter condition for the time being > > NOTE: I implement sql standard syntax except it is PERIOD FOR SYSTEM > TIME rather than PERIOD FOR SYSTEM_TIME in CREATE TABLE statement and > system time is not selected unless explicitly asked > Why aren't you following the standard syntax here? > Any enlightenment? > There are quite a lot of typos and other things that aren't written "the Postgres way". But before I comment on any of that, I'd like to see the features be implemented correctly according to the SQL standard.
If we're going to be implicitly adding stuff to the PK, we also need to
add that stuff to the other unique constraints, no? And I think it
would be better to add both the start and the end column to these keys.
Most of the temporal queries will be accessing both.
Why aren't you following the standard syntax here?
> Any enlightenment?
>
There are quite a lot of typos and other things that aren't written "the
Postgres way". But before I comment on any of that, I'd like to see the
features be implemented correctly according to the SQL standard.
On 24/10/2019 16:54, Surafel Temesgen wrote: > > hi Vik, > On Wed, Oct 23, 2019 at 9:02 PM Vik Fearing > <vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote: > > > > If we're going to be implicitly adding stuff to the PK, we also > need to > add that stuff to the other unique constraints, no? And I think it > would be better to add both the start and the end column to these > keys. > Most of the temporal queries will be accessing both. > > > yes it have to be added to other constraint too but adding both system > time > to PK will violate constraint because it allow multiple data in > current data I don't understand what you mean by this. > > > > Why aren't you following the standard syntax here? > > > > because we do have TIME and SYSTEM_P as a key word and am not sure of > whether > its a right thing to add other keyword that contain those two word > concatenated Yes, we have to do that. > > > > > Any enlightenment? > > > > There are quite a lot of typos and other things that aren't > written "the > Postgres way". But before I comment on any of that, I'd like to > see the > features be implemented correctly according to the SQL standard. > > > it is almost in sql standard syntax except the above small difference. > i can correct it > and post more complete patch soon. I don't mean just the SQL syntax, I also mean the behavior.
On 24/10/2019 16:54, Surafel Temesgen wrote:
>
> hi Vik,
> On Wed, Oct 23, 2019 at 9:02 PM Vik Fearing
> <vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote:
>
>
>
> If we're going to be implicitly adding stuff to the PK, we also
> need to
> add that stuff to the other unique constraints, no? And I think it
> would be better to add both the start and the end column to these
> keys.
> Most of the temporal queries will be accessing both.
>
>
> yes it have to be added to other constraint too but adding both system
> time
> to PK will violate constraint because it allow multiple data in
> current data
I don't understand what you mean by this.
On 25/10/2019 11:56, Surafel Temesgen wrote: > > > On Thu, Oct 24, 2019 at 6:49 PM Vik Fearing > <vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote: > > On 24/10/2019 16:54, Surafel Temesgen wrote: > > > > hi Vik, > > On Wed, Oct 23, 2019 at 9:02 PM Vik Fearing > > <vik.fearing@2ndquadrant.com > <mailto:vik.fearing@2ndquadrant.com> > <mailto:vik.fearing@2ndquadrant.com > <mailto:vik.fearing@2ndquadrant.com>>> wrote: > > > > > > > > If we're going to be implicitly adding stuff to the PK, we also > > need to > > add that stuff to the other unique constraints, no? And I > think it > > would be better to add both the start and the end column to > these > > keys. > > Most of the temporal queries will be accessing both. > > > > > > yes it have to be added to other constraint too but adding both > system > > time > > to PK will violate constraint because it allow multiple data in > > current data > > > I don't understand what you mean by this. > > > > The primary purpose of adding row end time to primary key is to allow > duplicate value to be inserted into a table with keeping constraint in > current data but it can be duplicated in history data. Adding row > start time column to primary key will eliminate this uniqueness for > current data which is not correct How? The primary/unique keys must always be unique at every point in time.
>
> I don't understand what you mean by this.
>
>
>
> The primary purpose of adding row end time to primary key is to allow
> duplicate value to be inserted into a table with keeping constraint in
> current data but it can be duplicated in history data. Adding row
> start time column to primary key will eliminate this uniqueness for
> current data which is not correct
How? The primary/unique keys must always be unique at every point in time.
From user prospect it is acceptable to delete and reinsert a record with the same key value multiple time which means there will be multiple record with the same key value in a history data but there is only one values in current data as a table without system versioning do .I add row end time column to primary key to allow user supplied primary key values to be duplicated in history data which is acceptable
On 28/10/2019 13:48, Surafel Temesgen wrote: > > > On Fri, Oct 25, 2019 at 10:45 PM Vik Fearing > <vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote: > > > > > I don't understand what you mean by this. > > > > > > > > The primary purpose of adding row end time to primary key is to > allow > > duplicate value to be inserted into a table with keeping > constraint in > > current data but it can be duplicated in history data. Adding row > > start time column to primary key will eliminate this uniqueness for > > current data which is not correct > > > How? The primary/unique keys must always be unique at every point > in time. > > > From user prospect it is acceptable to delete and reinsert a record > with the same key value multiple time which means there will be > multiple record with the same key value in a history data but there is > only one values in current data as a table without system versioning > do .I add row end time column to primary key to allow user supplied > primary key values to be duplicated in history data which is acceptable > Yes, I understand that. I'm saying you should also add the row start time column.
Вложения
On 01/01/2020 11:50, Surafel Temesgen wrote: > > > Hi, > Attached is a complete patch and also contain a fix for your comments > This does not compile against current head (0ce38730ac). gram.y: error: shift/reduce conflicts: 6 found, 0 expected -- Vik Fearing
This does not compile against current head (0ce38730ac).
gram.y: error: shift/reduce conflicts: 6 found, 0 expected
Вложения
On 03/01/2020 11:57, Surafel Temesgen wrote: > > > On Thu, Jan 2, 2020 at 12:12 AM Vik Fearing > <vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote: > > This does not compile against current head (0ce38730ac). > > > gram.y: error: shift/reduce conflicts: 6 found, 0 expected > > > Rebased and conflict resolved i hope it build clean this time > It does but you haven't included your tests file so `make check` fails. It seems clear to me that you haven't tested it at all anyway. The temporal conditions do not return the correct results, and the syntax is wrong, too. Also, none of my previous comments have been addressed except for "system versioning" instead of "system_versioning". Why? -- Vik Fearing
>
> Rebased and conflict resolved i hope it build clean this time
>
It does but you haven't included your tests file so `make check` fails.
It seems clear to me that you haven't tested it at all anyway. The
temporal conditions do not return the correct results, and the syntax is
wrong, too. Also, none of my previous comments have been addressed
except for "system versioning" instead of "system_versioning". Why?
On 28/10/2019 13:48, Surafel Temesgen wrote:
>
>
> On Fri, Oct 25, 2019 at 10:45 PM Vik Fearing
> <vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote:
>
> >
> > I don't understand what you mean by this.
> >
> >
> >
> > The primary purpose of adding row end time to primary key is to
> allow
> > duplicate value to be inserted into a table with keeping
> constraint in
> > current data but it can be duplicated in history data. Adding row
> > start time column to primary key will eliminate this uniqueness for
> > current data which is not correct
>
>
> How? The primary/unique keys must always be unique at every point
> in time.
>
>
> From user prospect it is acceptable to delete and reinsert a record
> with the same key value multiple time which means there will be
> multiple record with the same key value in a history data but there is
> only one values in current data as a table without system versioning
> do .I add row end time column to primary key to allow user supplied
> primary key values to be duplicated in history data which is acceptable
>
Yes, I understand that. I'm saying you should also add the row start
time column.
On 05/01/2020 11:16, Surafel Temesgen wrote: > > > On Fri, Jan 3, 2020 at 4:22 PM Vik Fearing > <vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote: > > > > > Rebased and conflict resolved i hope it build clean this time > > > > It does but you haven't included your tests file so `make check` > fails. > > > > what tests file? Exactly. > i add system_versioned_table.sql and system_versioned_table.out > test files Those are not included in the patch. <checks again> Okay, that was user error on my side. I apologize. > > > It seems clear to me that you haven't tested it at all anyway. The > temporal conditions do not return the correct results, and the > syntax is > wrong, too. Also, none of my previous comments have been addressed > except for "system versioning" instead of "system_versioning". Why? > > > I also correct typo and add row end column time to unique > key that make it unique for current data. As you mentioned > other comment is concerning about application-time periods > which the patch not addressing . - For performance, you must put the start column in the indexes also. - You only handle timestamp when you should also handle timestamptz and date. - You don't throw 2201H for anomalies > i refer sql 2011 standard for > syntax can you tell me which syntax you find it wrong? Okay, now that I see your tests, I understand why everything is broken. You only test FROM-TO and with a really wide interval. There are no tests for AS OF and no tests for BETWEEN-AND. As for the syntax, you have: select a from for stest0 system_time from '2000-01-01 00:00:00.00000' to 'infinity' ORDER BY a; when you should have: select a from stest0 for system_time from '2000-01-01 00:00:00.00000' to 'infinity' ORDER BY a; That is, the FOR should be on the other side of the table name. In addition, there are many rules in the standard that are not respected here. For example, this query works and should not: CREATE TABLE t (system_time integer) WITH SYSTEM VERSIONING; This syntax is not supported: ALTER TABLE t ADD PERIOD FOR SYSTEM_TIME (s, e) ADD COLUMN s timestamp ADD COLUMN e timestamp; psql's \d does not show that the table is system versioned, and doesn't show the columns of the system_time period. I can drop columns used in the period. Please don't hesitate to take inspiration from my extension that does this. The extension is under the PostgreSQL license for that reason. Take from it whatever you need. https://github.com/xocolatl/periods/ -- Vik Fearing
Vik Fearing-4 wrote > On 05/01/2020 11:16, Surafel Temesgen wrote: >> >> >> On Fri, Jan 3, 2020 at 4:22 PM Vik Fearing >> < > vik.fearing@ > <mailto: > vik.fearing@ > >> wrote: >> > > [...] > > You only test FROM-TO and with a really wide interval. There are no > tests for AS OF and no tests for BETWEEN-AND. > > > As for the syntax, you have: > > > select a from for stest0 system_time from '2000-01-01 00:00:00.00000' to > 'infinity' ORDER BY a; > > > when you should have: > > > select a from stest0 for system_time from '2000-01-01 00:00:00.00000' to > 'infinity' ORDER BY a; > > > That is, the FOR should be on the other side of the table name. > > [...] > > Vik Fearing Hello, I though that standard syntax was "AS OF SYSTEM TIME" as discussed here https://www.postgresql.org/message-id/flat/A254CDC3-D308-4822-8928-8CC584E0CC71%40elusive.cx#06c5dbffd5cfb9a20cdeec7a54dc657f , also explaining how to parse such a syntax . Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On 05/01/2020 16:01, legrand legrand wrote: > >> As for the syntax, you have: >> >> >> select a from for stest0 system_time from '2000-01-01 00:00:00.00000' to >> 'infinity' ORDER BY a; >> >> >> when you should have: >> >> >> select a from stest0 for system_time from '2000-01-01 00:00:00.00000' to >> 'infinity' ORDER BY a; >> >> >> That is, the FOR should be on the other side of the table name. >> >> [...] >> >> Vik Fearing > Hello, > > I though that standard syntax was "AS OF SYSTEM TIME" > as discussed here > https://www.postgresql.org/message-id/flat/A254CDC3-D308-4822-8928-8CC584E0CC71%40elusive.cx#06c5dbffd5cfb9a20cdeec7a54dc657f > , also explaining how to parse such a syntax . No, that is incorrect. The standard syntax is: FROM tablename FOR SYSTEM_TIME AS OF '...' FROM tablename FOR SYSTEM_TIME BETWEEN '...' AND '...' FROM tablename FOR SYSTEM_TIME FROM '...' TO '...' -- Vik Fearing
Vik Fearing-4 wrote > On 05/01/2020 16:01, legrand legrand wrote: > > > No, that is incorrect. The standard syntax is: > > > FROM tablename FOR SYSTEM_TIME AS OF '...' > > FROM tablename FOR SYSTEM_TIME BETWEEN '...' AND '...' > > FROM tablename FOR SYSTEM_TIME FROM '...' TO '...' > > -- > > Vik Fearing oups, I re-read links and docs and I'm wrong. Sorry for the noise Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hello. Isn't this patch somehow broken? At Mon, 28 Oct 2019 16:36:09 +0100, Vik Fearing <vik.fearing@2ndquadrant.com> wrote in > On 28/10/2019 13:48, Surafel Temesgen wrote: > > > > > > On Fri, Oct 25, 2019 at 10:45 PM Vik Fearing > > <vik.fearing@2ndquadrant.com <mailto:vik.fearing@2ndquadrant.com>> wrote: > > > > > > > > I don't understand what you mean by this. > > > > > > > > > > > > The primary purpose of adding row end time to primary key is to > > allow > > > duplicate value to be inserted into a table with keeping > > constraint in > > > current data but it can be duplicated in history data. Adding row > > > start time column to primary key will eliminate this uniqueness for > > > current data which is not correct > > > > > > How? The primary/unique keys must always be unique at every point > > in time. > > > > > > From user prospect it is acceptable to delete and reinsert a record > > with the same key value multiple time which means there will be > > multiple record with the same key value in a history data but there is > > only one values in current data as a table without system versioning > > do .I add row end time column to primary key to allow user supplied > > primary key values to be duplicated in history data which is acceptable > > > > Yes, I understand that. I'm saying you should also add the row start > time column. I think that the start and end timestamps represent the period where that version of the row was active. So UPDATE should modify the start timestamp of the new version to the same value with the end timestamp of the old version to the updated time. Thus, I don't think adding start timestamp to PK doesn't work as expected. That hinders us from rejecting rows with the same user-defined unique key because start timestams is different each time of inserts. I think what Surafel is doing in the current patch is correct. Only end_timestamp = +infinity rejects another non-historical (= live) version with the same user-defined unique key. I'm not sure why the patch starts from "0002, but anyway it applied on e369f37086. Then I ran make distclean, ./configure then make all, make install, initdb and started server after all of them. First, I tried to create a temporal table. When I used timestamptz as the type of versioning columns, ALTER, CREATE commands ended with server crash. "CREATE TABLE t (a int, s timestamptz GENERATED ALWAYS AS ROW START, e timestamptz GENERATED ALWAYS AS ROW END);" (CREATE TABLE t (a int);) "ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START" "ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamptz GENERATED ALWAYS AS ROWEND" If I added the start/end timestamp columns to an existing table, it returns uncertain error. ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START; ERROR: column "s" contains null values ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamp(6) GENERATED ALWAYS AS ROWEND; ERROR: column "s" contains null values When I defined only start column, SELECT on the table crashed. "CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START);" "SELECT * from t;" (crashed) The following command ended with ERROR which I cannot understand the cause, but I expected the command to be accepted. ALTER TABLE t ADD COLUMN start timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN end timestamp(6) GENERATED ALWAYSAS ROW END; ERROR: syntax error at or near "end" I didin't examined further but the syntax part doesn't seem designed well, and the body part seems vulnerable to unexpected input. I ran a few queries: SELECT * shows the timestamp columns, don't we need to hide the period timestamp columns from this query? I think UPDATE needs to update the start timestamp, but it doesn't. As the result the timestamps doesn't represent the correct lifetime of the row version and we wouldn't be able to pick up correct versions of a row that exprerienced updates. (I didn't confirmed that because I couldn't do "FOR SYSTEM_TIME AS OF" query due to syntax error..) (Sorry in advance for possible pointless comments due to my lack of access to the SQL11 standard.) If we have the period-timestamp columns before the last two columns, INSERT in a common way on the table fails, which doesn't seem to me to be expected behavior: CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START, e timestamp(6) GENERATED ALWAYS AS ROW END, a int) WITH SYSTEMVERSIONING; INSERT INTO t (SELECT a FROM generate_series(0, 99) a); ERROR: column "s" is of type timestamp without time zone but expression is of type integer Some queries using SYSTEM_TIME which I think should be accepted ends with error. Is the grammar part missing something? SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55'; ERROR: syntax error at or near "system_time" LINE 1: SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55'; SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00' AND '2020-01-08 0:00:00'; ERROR: syntax error at or near "system_time" LINE 1: SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00'... Other random comments (sorry for it not being comprehensive): The patch at worst loops over all columns at every parse time. It is quite ineffecient if there are many columns. We can store the column indexes in relcache. If I'm not missing anything, alter table doesn't properly modify existing data in the target table. AddSystemVersioning should fill in start/end_timestamp with proper values and DropSystemVersioning shuold remove rows no longer useful. +makeAndExpr(Node *lexpr, Node *rexpr, int location) I believe that planner flattenes nested AND/ORs in eval_const_expressions(). Shouldn't we leave the work to the planner? +makeConstraint(ConstrType type) We didn't use such a function to make that kind of nodes. Maybe we should use makeNode directly, or we need to change similar coding into that using the function. Addition to that, setting .location to 1 is wrong. "Unknown" location is -1. Separately from that, temporal clauses is not restriction of a table. So it seems wrong to me to use constraint mechamism for this purpose. +makeSystemColumnDef(char *name) "system column (or attribute)" is a column specially treated outside of tuple descriptor. The temporal-period columns are not system columns in that sense. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 05/01/2020 13:50, Vik Fearing wrote: > Okay, now that I see your tests, I understand why everything is broken. > You only test FROM-TO and with a really wide interval. There are no > tests for AS OF and no tests for BETWEEN-AND. I have started working on some better test cases for you. The attached .sql and .out tests should pass, and they are some of the tests that I'll be putting your next version through. There are many more tests that need to be added. Once all the desired functionality is there, I'll start reviewing the code itself. Keep up the good work, and let me know if I can do anything to help you. -- Vik Fearing
Вложения
Hi Surafel, On 1/3/20 5:57 AM, Surafel Temesgen wrote: > Rebased and conflict resolved i hope it build clean this time This patch no longer applies according to cfbot and there are a number of review comments that don't seem to have been addressed yet. The patch is not exactly new for this CF but since the first version was posted 2020-01-01 and there have been no updates (except a rebase) since then it comes pretty close. Were you planning to work on this for PG13? If so we'll need to see a rebased and updated patch pretty soon. My recommendation is that we move this patch to PG14. Regards, -- -David david@pgmasters.net
On 03/03/2020 19:33, David Steele wrote: > Hi Surafel, > > On 1/3/20 5:57 AM, Surafel Temesgen wrote: >> Rebased and conflict resolved i hope it build clean this time > > This patch no longer applies according to cfbot and there are a number > of review comments that don't seem to have been addressed yet. > > The patch is not exactly new for this CF but since the first version was > posted 2020-01-01 and there have been no updates (except a rebase) since > then it comes pretty close. > > Were you planning to work on this for PG13? If so we'll need to see a > rebased and updated patch pretty soon. My recommendation is that we > move this patch to PG14. I strongly second that motion. -- Vik Fearing
Hello.
Isn't this patch somehow broken?
First, I tried to create a temporal table.
When I used timestamptz as the type of versioning columns, ALTER,
CREATE commands ended with server crash.
"CREATE TABLE t (a int, s timestamptz GENERATED ALWAYS AS ROW START, e timestamptz GENERATED ALWAYS AS ROW END);"
(CREATE TABLE t (a int);)
"ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START"
"ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamptz GENERATED ALWAYS AS ROW END"
If I added the start/end timestamp columns to an existing table, it
returns uncertain error.
ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START;
ERROR: column "s" contains null values
ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamp(6) GENERATED ALWAYS AS ROW END;
ERROR: column "s" contains null values
When I defined only start column, SELECT on the table crashed.
"CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START);"
"SELECT * from t;"
(crashed)
The following command ended with ERROR which I cannot understand the
cause, but I expected the command to be accepted.
ALTER TABLE t ADD COLUMN start timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN end timestamp(6) GENERATED ALWAYS AS ROW END;
ERROR: syntax error at or near "end"
I didin't examined further but the syntax part doesn't seem designed
well, and the body part seems vulnerable to unexpected input.
I ran a few queries:
SELECT * shows the timestamp columns, don't we need to hide the period
timestamp columns from this query?
I think UPDATE needs to update the start timestamp, but it doesn't. As
the result the timestamps doesn't represent the correct lifetime of
the row version and we wouldn't be able to pick up correct versions of
a row that exprerienced updates. (I didn't confirmed that because I
couldn't do "FOR SYSTEM_TIME AS OF" query due to syntax error..)
(Sorry in advance for possible pointless comments due to my lack of
access to the SQL11 standard.) If we have the period-timestamp
columns before the last two columns, INSERT in a common way on the
table fails, which doesn't seem to me to be expected behavior:
CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START, e timestamp(6) GENERATED ALWAYS AS ROW END, a int) WITH SYSTEM VERSIONING;
INSERT INTO t (SELECT a FROM generate_series(0, 99) a);
ERROR: column "s" is of type timestamp without time zone but expression is of type integer
ERROR: column "s" is of type timestamp with time zone but expression is of type integer
LINE 1: INSERT INTO t (SELECT a FROM generate_series(0, 99) a);
Some queries using SYSTEM_TIME which I think should be accepted ends
with error. Is the grammar part missing something?
SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55';
ERROR: syntax error at or near "system_time"
LINE 1: SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55';
SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00' AND '2020-01-08 0:00:00';
ERROR: syntax error at or near "system_time"
LINE 1: SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00'...
Other random comments (sorry for it not being comprehensive):
The patch at worst loops over all columns at every parse time. It is
quite ineffecient if there are many columns. We can store the column
indexes in relcache.
If I'm not missing anything, alter table doesn't properly modify
existing data in the target table. AddSystemVersioning should fill in
start/end_timestamp with proper values and DropSystemVersioning shuold
remove rows no longer useful.
+makeAndExpr(Node *lexpr, Node *rexpr, int location)
I believe that planner flattenes nested AND/ORs in
eval_const_expressions(). Shouldn't we leave the work to the planner?
+makeConstraint(ConstrType type)
We didn't use such a function to make that kind of nodes. Maybe we
should use makeNode directly, or we need to change similar coding into
that using the function. Addition to that, setting .location to 1 is
wrong. "Unknown" location is -1.
Separately from that, temporal clauses is not restriction of a
table. So it seems wrong to me to use constraint mechamism for this
purpose.
+makeSystemColumnDef(char *name)
"system column (or attribute)" is a column specially treated outside
of tuple descriptor. The temporal-period columns are not system
columns in that sense.
Вложения
Hi Surafel,
On 1/3/20 5:57 AM, Surafel Temesgen wrote:
> Rebased and conflict resolved i hope it build clean this time
This patch no longer applies according to cfbot and there are a number
of review comments that don't seem to have been addressed yet.
The patch is not exactly new for this CF but since the first version was
posted 2020-01-01 and there have been no updates (except a rebase) since
then it comes pretty close.
Were you planning to work on this for PG13? If so we'll need to see a
rebased and updated patch pretty soon. My recommendation is that we
move this patch to PG14.
On 3/10/20 9:00 AM, Surafel Temesgen wrote: > On Tue, Mar 3, 2020 at 9:33 PM David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > The patch is not exactly new for this CF but since the first version > was > posted 2020-01-01 and there have been no updates (except a rebase) > since > then it comes pretty close. > > Were you planning to work on this for PG13? If so we'll need to see a > rebased and updated patch pretty soon. My recommendation is that we > move this patch to PG14. > > I agree with moving to PG14 . Its hard to make it to PG13. The target version is now PG14. Regards, -- -David david@pgmasters.net
Hi,On Tue, Mar 3, 2020 at 9:33 PM David Steele <david@pgmasters.net> wrote:Hi Surafel,
On 1/3/20 5:57 AM, Surafel Temesgen wrote:
> Rebased and conflict resolved i hope it build clean this time
This patch no longer applies according to cfbot and there are a number
of review comments that don't seem to have been addressed yet.
The patch is not exactly new for this CF but since the first version was
posted 2020-01-01 and there have been no updates (except a rebase) since
then it comes pretty close.
Were you planning to work on this for PG13? If so we'll need to see a
rebased and updated patch pretty soon. My recommendation is that we
move this patch to PG14.I agree with moving to PG14 . Its hard to make it to PG13.regardsSurafel
Hi Surafel and the rest,I'm the owner of the Israeli meetup group of PostgreSQL, and I'm interested in Temporality and have been trying for several years a few ways to add it to PostgreSQL(all of them through extensions and external ways).I'm happy that this is done by you internally (and a little bit disappointed that it's delayed again and again, but that's life...).I'll be happy to join this effort.I can't promise that I'll succeed to contribute anything, but first I want to play with it a little.To save me several hours, can you advise me what is the best way to install it?Which exact version of PG should I apply this patch to?Thanks in advance, and thanks for your great work!Eli
Вложения
Hi, thanks for working on this. I had planned to work on it and I’m looking forward to this natively in Postgres.
The patch builds with the following warnings:
plancat.c:2368:18: warning: variable 'name' is used uninitialized whenever 'for' loop exits because its condition is
false[-Wsometimes-uninitialized] 
        for (int i = 0; i < natts; i++)
                        ^~~~~~~~~
plancat.c:2379:9: note: uninitialized use occurs here
        return name;
               ^~~~
plancat.c:2368:18: note: remove the condition if it is always true
        for (int i = 0; i < natts; i++)
                        ^~~~~~~~~
plancat.c:2363:15: note: initialize the variable 'name' to silence this warning
        char       *name;
                        ^
                         = NULL
plancat.c:2396:18: warning: variable 'name' is used uninitialized whenever 'for' loop exits because its condition is
false[-Wsometimes-uninitialized] 
        for (int i = 0; i < natts; i++)
                        ^~~~~~~~~
plancat.c:2407:9: note: uninitialized use occurs here
        return name;
               ^~~~
plancat.c:2396:18: note: remove the condition if it is always true
        for (int i = 0; i < natts; i++)
                        ^~~~~~~~~
plancat.c:2391:15: note: initialize the variable 'name' to silence this warning
        char       *name;
                        ^
                         = NULL
2 warnings generated.
make check pass without issues, but make check-world fails for postgres_fdw, the diff is attached.
Before going further in the review, I’m a bit surprised by the quantity of code needed here. In
https://github.com/xocolatl/periodsthere is far less code and I would have expected the same here. For example, are the
changesto copy necessary or would it be possible to have a first patch the only implement the minimal changes required
forthis feature? 
Thanks a lot!
Rémi
			
		Вложения
Hi, thanks for working on this. I had planned to work on it and I’m looking forward to this natively in Postgres.
The patch builds with the following warnings:
plancat.c:2368:18: warning: variable 'name' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized]
for (int i = 0; i < natts; i++)
^~~~~~~~~
plancat.c:2379:9: note: uninitialized use occurs here
return name;
^~~~
plancat.c:2368:18: note: remove the condition if it is always true
for (int i = 0; i < natts; i++)
^~~~~~~~~
plancat.c:2363:15: note: initialize the variable 'name' to silence this warning
char *name;
^
= NULL
plancat.c:2396:18: warning: variable 'name' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized]
for (int i = 0; i < natts; i++)
^~~~~~~~~
plancat.c:2407:9: note: uninitialized use occurs here
return name;
^~~~
plancat.c:2396:18: note: remove the condition if it is always true
for (int i = 0; i < natts; i++)
^~~~~~~~~
plancat.c:2391:15: note: initialize the variable 'name' to silence this warning
char *name;
^
= NULL
2 warnings generated.
make check pass without issues, but make check-world fails for postgres_fdw, the diff is attached.
Before going further in the review, I’m a bit surprised by the quantity of code needed here. In https://github.com/xocolatl/periods there is far less code and I would have expected the same here. For example, are the changes to copy necessary or would it be possible to have a first patch the only implement the minimal changes required for this feature?
Вложения
On Tue, Jul 21, 2020 at 05:32:44PM +0300, Surafel Temesgen wrote: > Thank you for looking at it The patch is failing to apply. Could you send a rebase please? -- Michael
Вложения
The patch is failing to apply. Could you send a rebase please?
Вложения
Hi, just a quick comment that this patch fails on the cfbot. Cheers, //Georgios
Attached is a rebased one.regardsSurafel
gram.y:16695:1: error: static declaration of ‘makeAndExpr’ follows non-static declarationmakeAndExpr(Node *lexpr, Node *rexpr, int location)^~~~~~~~~~~In file included from gram.y:58:0:../../../src/include/nodes/makefuncs.h:108:14: note: previous declaration of ‘makeAndExpr’ was hereextern Node *makeAndExpr(Node *lexpr, Node *rexpr, int location);^~~~~~~~~~~gram.y:16695:1: warning: ‘makeAndExpr’ defined but not used [-Wunused-function]makeAndExpr(Node *lexpr, Node *rexpr, int location)^~~~~~~~~~~
On Thu, Nov 19, 2020 at 11:04 AM Surafel Temesgen <surafel3000@gmail.com> wrote:The docs have two instances of "EndtTime" that should be "EndTime".
Hi Surafel, On Tue, Dec 22, 2020 at 3:01 AM Surafel Temesgen <surafel3000@gmail.com> wrote: > > Hi Ryan, > > On Fri, Dec 18, 2020 at 10:28 PM Ryan Lambert <ryan@rustprooflabs.com> wrote: >> >> On Thu, Nov 19, 2020 at 11:04 AM Surafel Temesgen <surafel3000@gmail.com> wrote: >> >> The docs have two instances of "EndtTime" that should be "EndTime". > > > Since my first language is not english i'm glad you find only this error > on doc. I will send rebased pach soon > The patch is not submitted yet. Are you planning to submit the updated patch? Please also note the v7 patch cannot be applied to the current HEAD. I'm switching the patch as Waiting on Author. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
On Mon, Jan 4, 2021 at 2:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Please also note the v7 patch cannot be applied to the current HEAD. I'm switching the patch as Waiting on Author. Surafel, please say whether you are working on this or not. If you need help, let us know. On Tue, 7 Jan 2020 at 10:33, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > I think that the start and end timestamps represent the period where > that version of the row was active. So UPDATE should modify the start > timestamp of the new version to the same value with the end timestamp > of the old version to the updated time. Thus, I don't think adding > start timestamp to PK doesn't work as expected. That hinders us from > rejecting rows with the same user-defined unique key because start > timestamps is different each time of inserts. I think what Surafel is > doing in the current patch is correct. Only end_timestamp = +infinity > rejects another non-historical (= live) version with the same > user-defined unique key. The end_time needs to be updated when a row is updated, so it cannot form part of the PK. If you try to force that to happen, then logical replication will not work with system versioned tables, which would be a bad thing. So *only* start_time should be added to the PK to make this work. (A later comment also says the start_time needs to be updated, which makes no sense!) On Wed, 23 Oct 2019 at 21:03, Vik Fearing <vik.fearing@2ndquadrant.com> wrote: > I don't see any error handling for transaction anomalies. In READ > COMMITTED, you can easily end up with a case where the end time comes > before the start time. I don't even see anything constraining start > time to be strictly inferior to the end time. Such a constraint will be > necessary for application-time periods (which your patch doesn't address > at all but that's okay). I don't see how it can have meaning to have an end_time earlier than a start_time, so yes that should be checked. Having said that, if we use a statement timestamp on row insertion then, yes, the end_time could be earlier than start time, so that is just wrong. Ideally we would use commit timestamp and fill the values in later. So the only thing that makes sense for me is to use the dynamic time of insertion while we hold the buffer lock, otherwise we will get anomalies. The work looks interesting and I will be doing a longer review. -- Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jan 7, 2021 at 5:59 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Mon, Jan 4, 2021 at 2:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Please also note the v7 patch cannot be applied to the current HEAD. I'm switching the patch as Waiting on Author. > > Surafel, please say whether you are working on this or not. If you > need help, let us know. I've minimally rebased the patch to current head so that it compiles and passes current make check. From here, I will add further docs and tests to enhance review and discover issues. -- Simon Riggs http://www.EnterpriseDB.com/
Вложения
On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > I've minimally rebased the patch to current head so that it compiles > and passes current make check. Full version attached -- Simon Riggs http://www.EnterpriseDB.com/
Вложения
On Fri, Jan 8, 2021 at 7:34 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > I've minimally rebased the patch to current head so that it compiles > > and passes current make check. > > Full version attached New version attached with improved error messages, some additional docs and a review of tests. * UPDATE doesn't set EndTime correctly, so err... the patch doesn't work on this aspect. Everything else does actually work, AFAICS, so we "just" need a way to update the END_TIME column in place... So input from other Hackers/Committers needed on this point to see what is acceptable. * Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved * No discussion, comments or tests around freezing and whether that causes issues here * What happens if you ask for a future time? It will give an inconsistent result as it scans, so we should refuse a query for time > current_timestamp. * ALTER TABLE needs some work, it's a bit klugey at the moment and needs extra tests. Should refuse DROP COLUMN on a system time column * Do StartTime and EndTime show in SELECT *? Currently, yes. Would guess we wouldn't want them to, not sure what standard says. * The syntax changes in gram.y probably need some coralling Overall, it's a pretty good patch and worth working on more. I will consider a recommendation on what to do with this. -- Simon Riggs http://www.EnterpriseDB.com/
Вложения
On 1/8/21 7:33 AM, Simon Riggs wrote: > > * What happens if you ask for a future time? > It will give an inconsistent result as it scans, so we should refuse a > query for time > current_timestamp. That seems like a significant limitation. Can we fix it instead of refusing the query? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, Jan 8, 2021 at 7:34 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> > I've minimally rebased the patch to current head so that it compiles
> > and passes current make check.
>
> Full version attached
New version attached with improved error messages, some additional
docs and a review of tests.
make[2]: Entering directory '/var/lib/postgresql/git/postgresql/src/backend/parser''/usr/bin/perl' ./check_keywords.pl gram.y ../../../src/include/parser/kwlist.h/usr/bin/bison -Wno-deprecated -d -o gram.c gram.ygram.y:3685.55-56: error: $4 of ‘ColConstraintElem’ has no declared typen->contype = ($4)->contype;^^gram.y:3687.56-57: error: $4 of ‘ColConstraintElem’ has no declared typen->raw_expr = ($4)->raw_expr;^^gram.y:3734.41-42: error: $$ of ‘generated_type’ has no declared type$$ = n;^^gram.y:3741.41-42: error: $$ of ‘generated_type’ has no declared type$$ = n;^^gram.y:3748.41-42: error: $$ of ‘generated_type’ has no declared type$$ = n;^^../../../src/Makefile.global:750: recipe for target 'gram.c' failedmake[2]: *** [gram.c] Error 1make[2]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend/parser'Makefile:137: recipe for target 'parser/gram.h' failedmake[1]: *** [parser/gram.h] Error 2make[1]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend'src/Makefile.global:389: recipe for target 'submake-generated-headers' failedmake: *** [submake-generated-headers] Error 2
* UPDATE doesn't set EndTime correctly, so err... the patch doesn't
work on this aspect.
Everything else does actually work, AFAICS, so we "just" need a way to
update the END_TIME column in place...
So input from other Hackers/Committers needed on this point to see
what is acceptable.
* Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved
* No discussion, comments or tests around freezing and whether that
causes issues here
* What happens if you ask for a future time?
It will give an inconsistent result as it scans, so we should refuse a
query for time > current_timestamp.
* ALTER TABLE needs some work, it's a bit klugey at the moment and
needs extra tests.
Should refuse DROP COLUMN on a system time column
* Do StartTime and EndTime show in SELECT *? Currently, yes. Would
guess we wouldn't want them to, not sure what standard says.
> SELECT * shows the timestamp columns, don't we need to hide the period
> timestamp columns from this query?
>
>
The sql standard didn't dictate hiding the column but i agree hiding it by
default is good thing because this columns are used by the system
to classified the data and not needed in user side frequently. I can
change to that if we have consensus
* The syntax changes in gram.y probably need some coralling
Overall, it's a pretty good patch and worth working on more. I will
consider a recommendation on what to do with this.
--
Simon Riggs http://www.EnterpriseDB.com/
On Fri, Jan 8, 2021 at 4:50 PM Ryan Lambert <ryan@rustprooflabs.com> wrote: > > On Fri, Jan 8, 2021 at 5:34 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: >> >> On Fri, Jan 8, 2021 at 7:34 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: >> > >> > On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: >> > >> > > I've minimally rebased the patch to current head so that it compiles >> > > and passes current make check. >> > >> > Full version attached >> >> New version attached with improved error messages, some additional >> docs and a review of tests. >> > > The v10 patch fails to make on the current master branch (15b824da). Error: Updated v11 with additional docs and some rewording of messages/tests to use "system versioning" correctly. No changes on the points previously raised. -- Simon Riggs http://www.EnterpriseDB.com/
Вложения
On Fri, Jan 8, 2021 at 4:50 PM Ryan Lambert <ryan@rustprooflabs.com> wrote:
>
> On Fri, Jan 8, 2021 at 5:34 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>>
>> On Fri, Jan 8, 2021 at 7:34 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>> >
>> > On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>> >
>> > > I've minimally rebased the patch to current head so that it compiles
>> > > and passes current make check.
>> >
>> > Full version attached
>>
>> New version attached with improved error messages, some additional
>> docs and a review of tests.
>>
>
> The v10 patch fails to make on the current master branch (15b824da). Error:
Updated v11 with additional docs and some rewording of messages/tests
to use "system versioning" correctly.
No changes on the points previously raised.
--
Simon Riggs http://www.EnterpriseDB.com/
Thank you! The v11 applies and installs. I tried a simple test, unfortunately it appears the versioning is not working. The initial value is not preserved through an update and a new row does not appear to be created.
CREATE TABLE t
(
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
v BIGINT NOT NULL
)
WITH SYSTEM VERSIONING
;
Verify start/end time columns created.
t=# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
-----------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
v | bigint | | not null |
StartTime | timestamp with time zone | | not null | generated always as row start
EndTime | timestamp with time zone | | not null | generated always as row end
Indexes:
"t_pkey" PRIMARY KEY, btree (id, "EndTime")
Add a row and check the timestamps set as expected.
INSERT INTO t (v) VALUES (1);
SELECT * FROM t;
id | v | StartTime | EndTime
----+---+-------------------------------+----------
1 | 1 | 2021-01-08 20:56:20.848097+00 | infinity
Update the row.
UPDATE t SET v = -1;
The value for v updated but StartTime is the same.
SELECT * FROM t;
id | v | StartTime | EndTime
----+----+-------------------------------+----------
1 | -1 | 2021-01-08 20:56:20.848097+00 | infinity
Querying the table for all versions only returns the single updated row (v = -1) with the original row StartTime. The original value has disappeared entirely it seems.
SELECT * FROM t
FOR SYSTEM_TIME FROM '-infinity' TO 'infinity';
I also created a non-versioned table and later added the columns using ALTER TABLE and encountered the same behavior.
Ryan Lambert
On Fri, Jan 8, 2021 at 9:19 PM Ryan Lambert <ryan@rustprooflabs.com> wrote: >> Updated v11 with additional docs and some rewording of messages/tests >> to use "system versioning" correctly. >> >> No changes on the points previously raised. >> > Thank you! The v11 applies and installs. I tried a simple test, unfortunately it appears the versioning is not working.The initial value is not preserved through an update and a new row does not appear to be created. Agreed. I already noted this in my earlier review comments. I will send in a new version with additional tests so we can see more clearly that the tests fail on the present patch. I will post more on this next week. -- Simon Riggs http://www.EnterpriseDB.com/
On Sat, Jan 9, 2021 at 10:39 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Fri, Jan 8, 2021 at 9:19 PM Ryan Lambert <ryan@rustprooflabs.com> wrote: > > >> Updated v11 with additional docs and some rewording of messages/tests > >> to use "system versioning" correctly. > >> > >> No changes on the points previously raised. > >> > > Thank you! The v11 applies and installs. I tried a simple test, unfortunately it appears the versioning is not working.The initial value is not preserved through an update and a new row does not appear to be created. > > Agreed. I already noted this in my earlier review comments. I'm pleased to note that UPDATE-not-working was a glitch, possibly in an earlier patch merge. That now works as advertised. I've added fairly clear SGML docs to explain how the current patch works, which should assist wider review. Also moved test SQL around a bit, renamed some things in code for readability, but not done any structural changes. This is looking much better now... with the TODO/issues list now looking like this... * Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved. Probably need to add a test that end_timestamp > start_timestamp or ERROR, which effectively enforces serializability. * No discussion, comments or tests around freezing and whether that causes issues here * What happens if you ask for a future time? It will give an inconsistent result as it scans, so we should refuse a query for time > current_timestamp. * ALTER TABLE needs some work, it's a bit klugey at the moment and needs extra tests. Should refuse DROP COLUMN on a system time column, but currently doesn't * UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't * Do StartTime and EndTime show in SELECT *? Currently, yes. Would guess we wouldn't want them to, not sure what standard says. From here, the plan would be to set this to "Ready For Committer" in about a week. That is not the same thing as me saying it is ready-for-commit, but we need some more eyes on this patch to decide if it is something we want and, if so, are the code changes cool. -- Simon Riggs http://www.EnterpriseDB.com/
Вложения
On Sat, Jan 9, 2021 at 10:39 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Fri, Jan 8, 2021 at 9:19 PM Ryan Lambert <ryan@rustprooflabs.com> wrote:
>
> >> Updated v11 with additional docs and some rewording of messages/tests
> >> to use "system versioning" correctly.
> >>
> >> No changes on the points previously raised.
> >>
> > Thank you! The v11 applies and installs. I tried a simple test, unfortunately it appears the versioning is not working. The initial value is not preserved through an update and a new row does not appear to be created.
>
> Agreed. I already noted this in my earlier review comments.
I'm pleased to note that UPDATE-not-working was a glitch, possibly in
an earlier patch merge. That now works as advertised.
I've added fairly clear SGML docs to explain how the current patch
works, which should assist wider review.
Also moved test SQL around a bit, renamed some things in code for
readability, but not done any structural changes.
This is looking much better now... with the TODO/issues list now
looking like this...
* Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved.
Probably need to add a test that end_timestamp > start_timestamp or ERROR,
which effectively enforces serializability.
* No discussion, comments or tests around freezing and whether that
causes issues here
* What happens if you ask for a future time?
It will give an inconsistent result as it scans, so we should refuse a
query for time > current_timestamp.
* ALTER TABLE needs some work, it's a bit klugey at the moment and
needs extra tests.
Should refuse DROP COLUMN on a system time column, but currently doesn't
* UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't
* Do StartTime and EndTime show in SELECT *? Currently, yes. Would
guess we wouldn't want them to, not sure what standard says.
From here, the plan would be to set this to "Ready For Committer" in
about a week. That is not the same thing as me saying it is
ready-for-commit, but we need some more eyes on this patch to decide
if it is something we want and, if so, are the code changes cool.
On 1/8/21 7:33 AM, Simon Riggs wrote:
>
> * What happens if you ask for a future time?
> It will give an inconsistent result as it scans, so we should refuse a
> query for time > current_timestamp.
That seems like a significant limitation. Can we fix it instead of
refusing the query?
* Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved.
Probably need to add a test that end_timestamp > start_timestamp or ERROR,
which effectively enforces serializability.
* No discussion, comments or tests around freezing and whether that
causes issues here
* ALTER TABLE needs some work, it's a bit klugey at the moment and
needs extra tests.
Should refuse DROP COLUMN on a system time column, but currently doesn't
* UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't
I prefer to have them hidden by default. This was mentioned up-thread with no decision, it seems the standard is ambiguous. MS SQL appears to have flip-flopped on this decision [1].
On Thu, Jan 14, 2021 at 5:46 PM Surafel Temesgen <surafel3000@gmail.com> wrote: > On Fri, Jan 8, 2021 at 7:50 PM Ryan Lambert <ryan@rustprooflabs.com> wrote: >> >> I prefer to have them hidden by default. This was mentioned up-thread with no decision, it seems the standard is ambiguous. MS SQL appears to have flip-flopped on this decision [1]. I think the default should be like this: SELECT * FROM foo FOR SYSTEM_TIME AS OF ... should NOT include the Start and End timestamp columns because this acts like a normal query just with a different snapshot timestamp SELECT * FROM foo FOR SYSTEM_TIME BETWEEN x AND y SHOULD include the Start and End timestamp columns since this form of query can include multiple row versions for the same row, so it makes sense to see the validity times -- Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jan 14, 2021 at 5:42 PM Surafel Temesgen <surafel3000@gmail.com> wrote: > > Hi Simon, > Thank you for all the work you does No problem. > On Mon, Jan 11, 2021 at 5:02 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: >> >> >> >> * Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved. >> Probably need to add a test that end_timestamp > start_timestamp or ERROR, >> which effectively enforces serializability. >> > > > This scenario doesn't happen. Yes, I think it can. The current situation is that the Start or End is set to the Transaction Start Timestamp. So if t2 starts before t1, then if t1 creates a row and t2 deletes it then we will have start=t1 end=t2, but t2<t1 Your tests don't show that because it must happen concurrently. We need to add an isolation test to show this, or to prove it doesn't happen. > There are no possibility of a record being deleted or updated before inserting Agreed, but that was not the point. -- Simon Riggs http://www.EnterpriseDB.com/
On Thu, Jan 14, 2021 at 5:46 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
> On Fri, Jan 8, 2021 at 7:50 PM Ryan Lambert <ryan@rustprooflabs.com> wrote:
>>
>> I prefer to have them hidden by default. This was mentioned up-thread with no decision, it seems the standard is ambiguous. MS SQL appears to have flip-flopped on this decision [1].
I think the default should be like this:
SELECT * FROM foo FOR SYSTEM_TIME AS OF ...
should NOT include the Start and End timestamp columns
because this acts like a normal query just with a different snapshot timestamp
SELECT * FROM foo FOR SYSTEM_TIME BETWEEN x AND y
SHOULD include the Start and End timestamp columns
since this form of query can include multiple row versions for the
same row, so it makes sense to see the validity times
Yes, I think it can. The current situation is that the Start or End is
set to the Transaction Start Timestamp.
So if t2 starts before t1, then if t1 creates a row and t2 deletes it
then we will have start=t1 end=t2, but t2<t1
Your tests don't show that because it must happen concurrently.
We need to add an isolation test to show this, or to prove it doesn't happen.
On Fri, Jan 15, 2021 at 4:46 PM Surafel Temesgen <surafel3000@gmail.com> wrote: > > > > On Fri, Jan 15, 2021 at 12:27 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: >> >> >> Yes, I think it can. The current situation is that the Start or End is >> set to the Transaction Start Timestamp. >> So if t2 starts before t1, then if t1 creates a row and t2 deletes it >> then we will have start=t1 end=t2, but t2<t1 >> Your tests don't show that because it must happen concurrently. >> We need to add an isolation test to show this, or to prove it doesn't happen. >> > > > Does MVCC allow that? i am not expert on MVCC but i don't > think t2 can see the row create by translation started before > itself Yeh. Read Committed mode can see anything committed prior to the start of the current statement, but UPDATEs always update the latest version even if they can't see it. Anyway, isolationtester spec file needed to test this. See src/test/isolation and examples in specs/ directory -- Simon Riggs http://www.EnterpriseDB.com/
SELECT * FROM foo FOR SYSTEM_TIME AS OF ...
should NOT include the Start and End timestamp columns
because this acts like a normal query just with a different snapshot timestamp
SELECT * FROM foo FOR SYSTEM_TIME BETWEEN x AND y
SHOULD include the Start and End timestamp columns
since this form of query can include multiple row versions for the
same row, so it makes sense to see the validity times
On Fri, Jan 15, 2021 at 4:56 PM Surafel Temesgen <surafel3000@gmail.com> wrote: > > > > On Fri, Jan 15, 2021 at 12:22 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: >> >> SELECT * FROM foo FOR SYSTEM_TIME AS OF ... >> should NOT include the Start and End timestamp columns >> because this acts like a normal query just with a different snapshot timestamp >> >> SELECT * FROM foo FOR SYSTEM_TIME BETWEEN x AND y >> SHOULD include the Start and End timestamp columns >> since this form of query can include multiple row versions for the >> same row, so it makes sense to see the validity times >> > > One disadvantage of returning system time columns is it > breaks upward compatibility. if an existing application wants to > switch to system versioning it will break. There are no existing applications, so for PostgreSQL, it wouldn't be an issue. If you mean compatibility with other databases, that might be an argument to do what others have done. What have other databases done for SELECT * ? -- Simon Riggs http://www.EnterpriseDB.com/
Hello, it seems that Oracle (11R2) doesn't add the Start and End timestamp columns and permit statement like select * from tt union select * from tt AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '6' SECOND) minus select * from tt VERSIONS BETWEEN TIMESTAMP (SYSTIMESTAMP - INTERVAL '6' second) and SYSTIMESTAMP; Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On 1/14/21 10:22 PM, Simon Riggs wrote: > On Thu, Jan 14, 2021 at 5:46 PM Surafel Temesgen <surafel3000@gmail.com> wrote: > >> On Fri, Jan 8, 2021 at 7:50 PM Ryan Lambert <ryan@rustprooflabs.com> wrote: >>> >>> I prefer to have them hidden by default. This was mentioned up-thread with no decision, it seems the standard is ambiguous. MS SQL appears to have flip-flopped on this decision [1]. > > I think the default should be like this: > > SELECT * FROM foo FOR SYSTEM_TIME AS OF ... > should NOT include the Start and End timestamp columns > because this acts like a normal query just with a different snapshot timestamp > > SELECT * FROM foo FOR SYSTEM_TIME BETWEEN x AND y > SHOULD include the Start and End timestamp columns > since this form of query can include multiple row versions for the > same row, so it makes sense to see the validity times I don't read the standard as being ambiguous about this at all. The columns should be shown just like any other column of the table. I am not opposed to being able to set an attribute on columns allowing them to be excluded from "*" but that is irrelevant to this patch. -- Vik Fearing
On 1/14/21 6:42 PM, Surafel Temesgen wrote: > Hi Simon, > Thank you for all the work you does > > On Mon, Jan 11, 2021 at 5:02 PM Simon Riggs <simon.riggs@enterprisedb.com> > wrote: > >> >> >> * Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved. >> Probably need to add a test that end_timestamp > start_timestamp or ERROR, >> which effectively enforces serializability. >> >> > > This scenario doesn't happen. It *does* happen and the standard even provides a specific error code for it (2201H). Please look at my extension for this feature which implements all the requirements of the standard (except syntax grammar, of course). https://github.com/xocolatl/periods/ -- Vik Fearing
There are no existing applications, so for PostgreSQL, it wouldn't be an issue.
On 1/16/21 7:39 PM, Surafel Temesgen wrote: > On Fri, Jan 15, 2021 at 8:02 PM Simon Riggs <simon.riggs@enterprisedb.com> > wrote: > >> >> There are no existing applications, so for PostgreSQL, it wouldn't be an >> issue. >> >> > Yes we don't have but the main function of ALTER TABLE foo ADD SYSTEM > VERSIONING > is to add system versioning functionality to existing application I haven't looked at this patch in a while, but I hope that ALTER TABLE t ADD SYSTEM VERSIONING is not adding any columns. That is a bug if it does. -- Vik Fearing
I haven't looked at this patch in a while, but I hope that ALTER TABLE t
ADD SYSTEM VERSIONING is not adding any columns. That is a bug if it does.
On 1/17/21 5:46 PM, Surafel Temesgen wrote: > On Sat, Jan 16, 2021 at 10:12 PM Vik Fearing <vik@postgresfriends.org> > wrote: > >> >> I haven't looked at this patch in a while, but I hope that ALTER TABLE t >> ADD SYSTEM VERSIONING is not adding any columns. That is a bug if it does. >> >> > Yes, that is how I implement it. I don't understand how it became a bug? This is not good, and I see that DROP SYSTEM VERSIONING also removes these columns which is even worse. Please read the standard that you are trying to implement! I will do a more thorough review of the functionalities in this patch (not necessarily the code) this week. -- Vik Fearing
This is not good, and I see that DROP SYSTEM VERSIONING also removes
these columns which is even worse. Please read the standard that you
are trying to implement!
I will do a more thorough review of the functionalities in this patch
(not necessarily the code) this week.
On 1/11/21 3:02 PM, Simon Riggs wrote: > * UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't I'm still in the weeds of reviewing this patch, but why should this fail? It should not fail. -- Vik Fearing
On Tue, Jan 26, 2021 at 11:33 AM Vik Fearing <vik@postgresfriends.org> wrote: > > On 1/11/21 3:02 PM, Simon Riggs wrote: > > * UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't > > I'm still in the weeds of reviewing this patch, but why should this > fail? It should not fail. It should not be possible for the user to change the start or end timestamp of a system_time time range, by definition. -- Simon Riggs http://www.EnterpriseDB.com/
On 1/26/21 1:16 PM, Simon Riggs wrote: > On Tue, Jan 26, 2021 at 11:33 AM Vik Fearing <vik@postgresfriends.org> wrote: >> >> On 1/11/21 3:02 PM, Simon Riggs wrote: >>> * UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't >> >> I'm still in the weeds of reviewing this patch, but why should this >> fail? It should not fail. > > It should not be possible for the user to change the start or end > timestamp of a system_time time range, by definition. Correct, but setting it to DEFAULT is not changing it. See also SQL:2016 11.5 <default clause> General Rule 3.a. -- Vik Fearing
On Tue, Jan 26, 2021 at 12:51 PM Vik Fearing <vik@postgresfriends.org> wrote: > > On 1/26/21 1:16 PM, Simon Riggs wrote: > > On Tue, Jan 26, 2021 at 11:33 AM Vik Fearing <vik@postgresfriends.org> wrote: > >> > >> On 1/11/21 3:02 PM, Simon Riggs wrote: > >>> * UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't > >> > >> I'm still in the weeds of reviewing this patch, but why should this > >> fail? It should not fail. > > > > It should not be possible for the user to change the start or end > > timestamp of a system_time time range, by definition. > > Correct, but setting it to DEFAULT is not changing it. > > See also SQL:2016 11.5 <default clause> General Rule 3.a. Thanks for pointing this out. Identity columns don't currently work that way. -- Simon Riggs http://www.EnterpriseDB.com/
I'm still in the weeds of reviewing this patch, but why should this
fail? It should not fail.
Вложения
hi, i tested the temporal patch ( https://commitfest.postgresql.org/26/2316/ ) with the current 14devel applied ontop of ef3d461without any conflicts. i build with no special options passed to ./configure and noticed, that the postgresql-client-13 from the debian repositoriescrashes with the \d command to reproduce the issue: CREATE TABLE test ( id int PRIMARY KEY generated ALWAYS AS IDENTITY, name text NOT NULL, start_timestamp timestamp with time zone GENERATED ALWAYS AS ROW START, end_timestamp timestamp with time zone GENERATED ALWAYS AS ROW END, PERIOD FOR SYSTEM_TIME (start_timestamp, end_timestamp) ); \d test it failes after outputting the table informations with this backtrace: free(): invalid pointer [1] 587783 abort (core dumped) psql -X -U easteregg -h localhost postgres (gdb) bt 50 #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f21a62e0537 in __GI_abort () at abort.c:79 #2 0x00007f21a6339768 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f21a6447e31 "%s\n") at ../sysdeps/posix/libc_fatal.c:155 #3 0x00007f21a6340a5a in malloc_printerr (str=str@entry=0x7f21a644605e "free(): invalid pointer") at malloc.c:5347 #4 0x00007f21a6341c14 in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173 #5 0x000055c9fa47b602 in printTableCleanup (content=content@entry=0x7ffece7e41c0) at ./build/../src/fe_utils/print.c:3250 #6 0x000055c9fa444aa3 in describeOneTableDetails (schemaname=<optimized out>, schemaname@entry=0x55c9fbebfee6 "public",relationname=<optimized out>, oid=oid@entry=0x55c9fbebfee0 "16436", verbose=verbose@entry=false) at ./build/../src/bin/psql/describe.c:3337 #7 0x000055c9fa4490c9 in describeTableDetails (pattern=pattern@entry=0x55c9fbebf540 "abk", verbose=verbose@entry=false,showSystem=<optimized out>) at ./build/../src/bin/psql/describe.c:1421 #8 0x000055c9fa4372ff in exec_command_d (scan_state=scan_state@entry=0x55c9fbebd130, active_branch=active_branch@entry=true,cmd=cmd@entry=0x55c9fbebf430 "d") at ./build/../src/bin/psql/command.c:722 #9 0x000055c9fa43ae2b in exec_command (previous_buf=0x55c9fbebd3a0, query_buf=0x55c9fbebd270, cstack=0x55c9fbebd250, scan_state=0x55c9fbebd130,cmd=0x55c9fbebf430 "d") at ./build/../src/bin/psql/command.c:317 #10 HandleSlashCmds (scan_state=scan_state@entry=0x55c9fbebd130, cstack=cstack@entry=0x55c9fbebd250, query_buf=0x55c9fbebd270,previous_buf=0x55c9fbebd3a0) at ./build/../src/bin/psql/command.c:220 #11 0x000055c9fa4539e0 in MainLoop (source=0x7f21a6479980 <_IO_2_1_stdin_>) at ./build/../src/bin/psql/mainloop.c:502 #12 0x000055c9fa433d64 in main (argc=<optimized out>, argv=0x7ffece7e47f8) at ./build/../src/bin/psql/startup.c:441 the client is this version: apt-cache policy postgresql-client-13 postgresql-client-13: Installed: 13.1-1.pgdg+2+b3 Candidate: 13.1-1.pgdg+2+b3 Version table: *** 13.1-1.pgdg+2+b3 100 100 http://apt.postgresql.org/pub/repos/apt sid-pgdg-testing/main amd64 Packages 100 /var/lib/dpkg/status the the 14devel version from my build or a selfcompiled REL_13_STABLE client will not crash. i was wondering if this might pose a security concern. i am a bit out of my depths here, but would be glad to help, if any informations are missing with kind regards, richard
On Mon, Mar 8, 2021 at 11:04 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote: > > > > On Thu, Feb 25, 2021 at 3:28 PM Li Japin <japinli@hotmail.com> wrote: >> >> >> On Jan 27, 2021, at 12:39 AM, Surafel Temesgen <surafel3000@gmail.com> wrote: >> >> >> >> On Tue, Jan 26, 2021 at 2:33 PM Vik Fearing <vik@postgresfriends.org> wrote: >>> >>> I'm still in the weeds of reviewing this patch, but why should this >>> fail? It should not fail. >> >> >> Attached is rebased patch that include isolation test >> >> >> Thanks for updating the patch. However it cannot apply to master (e5d8a9990). >> >> Here are some comments on system-versioning-temporal-table_2021_v13.patch. >> >> +</programlisting> >> + When system versioning is specified two columns are added which >> + record the start timestamp and end timestamp of each row verson. >> >> verson -> version >> >> + By default, the column names will be StartTime and EndTime, though >> + you can specify different names if you choose. >> >> In fact, it is start_time and end_time, not StartTime and EndTime. >> I think it's better to use <literal> label around start_time and end_time. >> >> + column will be automatically added to the Primary Key of the >> + table. >> >> Should we mention the unique constraints? >> >> + The system versioning period end column will be added to the >> + Primary Key of the table as a way of ensuring that concurrent >> + INSERTs conflict correctly. >> >> Same as above. >> >> Since the get_row_start_time_col_name() and get_row_end_time_col_name() >> are similar, IMO we can pass a flag to get StartTime/EndTime column name, >> thought? >> >> -- >> Regrads, >> Japin Li. >> ChengDu WenWu Information Technology Co.,Ltd. >> > The patch (system-versioning-temporal-table_2021_v13.patch) does not apply successfully. > > http://cfbot.cputube.org/patch_32_2316.log > > Hunk #1 FAILED at 80. > 1 out of 1 hunk FAILED -- saving rejects to file src/test/regress/parallel_schedule.rej > patching file src/test/regress/serial_schedule > Hunk #1 succeeded at 126 (offset -1 lines). > > > Therefore it is a minor change so I rebased the patch, please take a look at that. The patch does not apply on Head anymore, could you rebase and post a patch. I'm changing the status to "Waiting for Author". Regards, Vignesh
On Wed, 14 Jul 2021 at 12:48, vignesh C <vignesh21@gmail.com> wrote: > The patch does not apply on Head anymore, could you rebase and post a > patch. I'm changing the status to "Waiting for Author". OK, so I've rebased the patch against current master to take it to v15. I've then worked on the patch some myself to make v16 (attached), adding these things: * Add code, docs and test to remove the potential anomaly where endtime < starttime, using the sqlstate 2201H as pointed out by Vik * Add code and tests to handle multiple changes in a transaction correctly, according to SQL Std * Add code and tests to make Foreign Keys behave correctly, according to SQL Std * Fixed nascent bug in relcache setup code * Various small fixes from Japin's review - thanks! I've used starttime and endtime as default column names * Additional tests and docs to show that the functionality works with or without PKs on the table I am now satisfied that the patch does not have any implementation anomalies in behavioral design, but it is still a long way short in code architecture. There are various aspects still needing work. This is not yet ready for Commit, but it is appropriate now to ask for initial design guidance on architecture and code placement by a Committer, so I am setting this to Ready For Committer, in the hope that we get the review in SeptCF and a later version can be submitted for later commit in JanCF. With the right input, this patch is about a person-month away from being ready, assuming we don't hit any blocking issues. Major Known Issues * SQLStd says that it should not be possible to update historical rows, but those tests show we fail to prevent that and there is code marked NOT_USED in those areas * The code is structured poorly around parse-analyze/rewriter/optimizer/executor and that needs positive design recommendations, rather than critical review * Joins currently fail because of the botched way WHERE clauses are added, resulting in duplicate names * Views probably don't work, but there are no tests * CREATE TABLE (LIKE foo) doesn't correctly copy across all features - test for that added, with test failure accepted for now * ALTER TABLE is still incomplete and also broken; I suggest we remove that for the first version of the patch to reduce patch size for an initial commit. Minor Known Issues * Logical replication needs some minor work, no tests yet * pg_dump support looks like it exists and might work easily, but there are no tests yet * Correlation names don't work in FROM clause - shift/reduce errors from double use of AS * Add test and code to prevent triggers referencing period cols in the WHEN clause * No tests yet to prove you can't set various parameters/settings on the period time start/end cols * Code needs some cleanup in a few places * Not really sure what value is added by lock-update-delete-system-versioned.spec * IMHO we should make the PK definition use "endtime DESC", so that the current version is always the first row found in the PK for any key, since historical indexes will grow bigger over time There are no expected issues with integration with MERGE, since SQLStd explains how to handle that. Other reviews are welcome. -- Simon Riggs http://www.EnterpriseDB.com/
Вложения
Hi,
quick note: The documentation for this patch mentions:
 The <literal>starttime</literal> column
+    will be automatically added to the Primary Key of the table.
A quick tests shows that the endtime column is added instead:
postgres=# create table t1 ( a int primary key generated always as identity, b text ) with system versioning;
CREATE TABLE
postgres=# \d t1
                                      Table "public.t1"
  Column   |           Type           | Collation | Nullable |            Default            
-----------+--------------------------+-----------+----------+-------------------------------
 a         | integer                  |           | not null | generated always as identity
 b         | text                     |           |          | 
 starttime | timestamp with time zone |           | not null | generated always as row start
 endtime   | timestamp with time zone |           | not null | generated always as row end
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a, endtime)
Regards
Daniel
			
		On Wed, 14 Jul 2021 at 12:48, vignesh C <vignesh21@gmail.com> wrote:
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".
OK, so I've rebased the patch against current master to take it to v15.
I've then worked on the patch some myself to make v16 (attached),
adding these things:
* Add code, docs and test to remove the potential anomaly where
endtime < starttime, using the sqlstate 2201H as pointed out by Vik
* Add code and tests to handle multiple changes in a transaction
correctly, according to SQL Std
* Add code and tests to make Foreign Keys behave correctly, according to SQL Std
* Fixed nascent bug in relcache setup code
* Various small fixes from Japin's review - thanks! I've used
starttime and endtime as default column names
* Additional tests and docs to show that the functionality works with
or without PKs on the table
I am now satisfied that the patch does not have any implementation
anomalies in behavioral design, but it is still a long way short in
code architecture.
There are various aspects still needing work. This is not yet ready
for Commit, but it is appropriate now to ask for initial design
guidance on architecture and code placement by a Committer, so I am
setting this to Ready For Committer, in the hope that we get the
review in SeptCF and a later version can be submitted for later commit
in JanCF. With the right input, this patch is about a person-month
away from being ready, assuming we don't hit any blocking issues.
Major Known Issues
* SQLStd says that it should not be possible to update historical
rows, but those tests show we fail to prevent that and there is code
marked NOT_USED in those areas
* The code is structured poorly around
parse-analyze/rewriter/optimizer/executor and that needs positive
design recommendations, rather than critical review
* Joins currently fail because of the botched way WHERE clauses are
added, resulting in duplicate names
* Views probably don't work, but there are no tests
* CREATE TABLE (LIKE foo) doesn't correctly copy across all features -
test for that added, with test failure accepted for now
* ALTER TABLE is still incomplete and also broken; I suggest we remove
that for the first version of the patch to reduce patch size for an
initial commit.
Minor Known Issues
* Logical replication needs some minor work, no tests yet
* pg_dump support looks like it exists and might work easily, but
there are no tests yet
* Correlation names don't work in FROM clause - shift/reduce errors
from double use of AS
* Add test and code to prevent triggers referencing period cols in the
WHEN clause
* No tests yet to prove you can't set various parameters/settings on
the period time start/end cols
* Code needs some cleanup in a few places
* Not really sure what value is added by
lock-update-delete-system-versioned.spec
* IMHO we should make the PK definition use "endtime DESC", so that
the current version is always the first row found in the PK for any
key, since historical indexes will grow bigger over time
There are no expected issues with integration with MERGE, since SQLStd
explains how to handle that.
Other reviews are welcome.
--
Simon Riggs http://www.EnterpriseDB.com/
--
On Tue, Aug 10, 2021 at 01:20:14PM +0100, Simon Riggs wrote: > On Wed, 14 Jul 2021 at 12:48, vignesh C <vignesh21@gmail.com> wrote: > > > The patch does not apply on Head anymore, could you rebase and post a > > patch. I'm changing the status to "Waiting for Author". > > OK, so I've rebased the patch against current master to take it to v15. > > I've then worked on the patch some myself to make v16 (attached), > adding these things: > Hi Simon, This one doesn't apply nor compile anymore. Can we expect a rebase soon? -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
On Fri, 10 Sept 2021 at 19:30, Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > > On Tue, Aug 10, 2021 at 01:20:14PM +0100, Simon Riggs wrote: > > On Wed, 14 Jul 2021 at 12:48, vignesh C <vignesh21@gmail.com> wrote: > > > > > The patch does not apply on Head anymore, could you rebase and post a > > > patch. I'm changing the status to "Waiting for Author". > > > > OK, so I've rebased the patch against current master to take it to v15. > > > > I've then worked on the patch some myself to make v16 (attached), > > adding these things: > > > > Hi Simon, > > This one doesn't apply nor compile anymore. > Can we expect a rebase soon? Hi Jaime, Sorry for not replying. Yes, I will rebase again to assist the design input I have requested. Please expect that on Sep 15. Cheers -- Simon Riggs http://www.EnterpriseDB.com/
On Fri, 10 Sept 2021 at 19:30, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> On Tue, Aug 10, 2021 at 01:20:14PM +0100, Simon Riggs wrote:
> > On Wed, 14 Jul 2021 at 12:48, vignesh C <vignesh21@gmail.com> wrote:
> >
> > > The patch does not apply on Head anymore, could you rebase and post a
> > > patch. I'm changing the status to "Waiting for Author".
> >
> > OK, so I've rebased the patch against current master to take it to v15.
> >
> > I've then worked on the patch some myself to make v16 (attached),
> > adding these things:
> >
>
> Hi Simon,
>
> This one doesn't apply nor compile anymore.
> Can we expect a rebase soon?
Hi Jaime,
Sorry for not replying.
Yes, I will rebase again to assist the design input I have requested.
Please expect that on Sep 15.
Cheers
--
Simon Riggs http://www.EnterpriseDB.com/
1. Much of what I have read about temporal tables seemed to imply or almost assume that system temporal tables would be implemented as two actual separate tables. Indeed, SQLServer appears to do it that way [1] with syntax like
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));
2. The endtime column constraint which enforces GENERATED ALWAYS AS ROW END seems like it would have appeal outside of system versioning, as a lot of tables have a last_updated column, and it would be nice if it could handle itself and not rely on fallible application programmers or require trigger overhead.
CREATE TABLE Emp(ENo INTEGER,EStart DATE,EEnd DATE,EDept INTEGER,PERIOD FOR EPeriod (EStart, EEnd),Sys_start TIMESTAMP(12) GENERATED ALWAYS AS ROW START,Sys_end TIMESTAMP(12) GENERATED ALWAYS AS ROW END,EName VARCHAR(30),PERIOD FOR SYSTEM_TIME(Sys_start, Sys_end),PRIMARY KEY (ENo, EPeriod WITHOUT OVERLAPS),FOREIGN KEY (Edept, PERIOD EPeriod) REFERENCES Dept (DNo, PERIOD DPeriod)) WITH SYSTEM VERSIONING
Q 6.1. Would system versioning belong in such a view?
1. Much of what I have read about temporal tables seemed to imply or almost assume that system temporal tables would be implemented as two actual separate tables. Indeed, SQLServer appears to do it that way [1] with syntax likeWITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));Q 1.1. Was that implementation considered and if so, what made this implementation more appealing?
13. If a history table is directly referenceable, then SELECT permission can be granted or revoked as normal, but all insert/update/delete/truncate options would raise an error.
On Sun, 19 Sept 2021 at 01:16, Corey Huinker <corey.huinker@gmail.com> wrote: >> >> 1. Much of what I have read about temporal tables seemed to imply or almost assume that system temporal tables would beimplemented as two actual separate tables. Indeed, SQLServer appears to do it that way [1] with syntax like >> >> WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory)); >> >> >> Q 1.1. Was that implementation considered and if so, what made this implementation more appealing? >> > > I've been digging some more on this point, and I've reached the conclusion that a separate history table is the betterimplementation. It would make the act of removing system versioning into little more than a DROP TABLE, plus adjustingthe base table to reflect that it is no longer system versioned. Thanks for giving this a lot of thought. When you asked the question the first time you hadn't discussed how that might work, but now we have something to discuss. > 10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, wouldsimply use the base table directly with no quals to add. > 11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, then the query would do a union ofthe base table and the history table with quals applied to both. > 14. DROP SYSTEM VERSIONING from a table would be quite straightforward - the history table would be dropped along withthe triggers that reference it, setting relissystemversioned = 'f' on the base table. > > I think this would have some key advantages: > > 1. MVCC bloat is no worse than it was before. The number of row versions stored in the database is the same for both, just it would be split across two tables in this form. > 2. No changes whatsoever to referential integrity. The changes were fairly minor, but I see your thinking about indexes as a simplification. > 3. DROP SYSTEM VERSIONING becomes an O(1) operation. It isn't top of mind to make this work well. The whole purpose of the history is to keep it, not to be able to drop it quickly. > Thoughts? There are 3 implementation routes that I see, so let me explain so that others can join the discussion. 1. Putting all data in one table. This makes DROP SYSTEM VERSIONING effectively impossible. It requires access to the table to be rewritten to add in historical quals for non-historical access and it requires some push-ups around indexes. (The current patch adds the historic quals by kludging the parser, which is wrong place, since it doesn't work for joins etc.. However, given that issue, the rest seems to follow on naturally). 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING fairly trivial, but it complicates many DDL commands (please make a list?) and requires the optimizer to know about this and cater to it, possibly complicating plans. Neither issue is insurmountable, but it becomes more intrusive. The current patch could go in either of the first 2 directions with further work. 3. Let the Table Access Method handle it. I call this out separately since it avoids making changes to the rest of Postgres, which might be a good thing, with the right TAM implementation. My preferred approach would be to do this "for free" in the table access method, but we're a long way from this in terms of actual implementation. When Corey suggested earlier that we just put the syntax in there, this was the direction I was thinking. After waiting a day since I wrote the above, I think we should go with (2) as Corey suggests, at least for now, and we can always add (3) later. -- Simon Riggs http://www.EnterpriseDB.com/
A side table has the nice additional benefit that we can very easily version the *table structure* so when we ALTER TABLE and the table structure changes we just make a new side table with now-currents structure. Also we may want different set of indexes on historic table(s) for whatever reason And we may even want to partition history tables for speed, storage cost or just to drop very ancient history ----- Hannu Krosing Google Cloud - We have a long list of planned contributions and we are hiring. Contact me if interested. On Sun, Sep 19, 2021 at 8:32 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Sun, 19 Sept 2021 at 01:16, Corey Huinker <corey.huinker@gmail.com> wrote: > >> > >> 1. Much of what I have read about temporal tables seemed to imply or almost assume that system temporal tables wouldbe implemented as two actual separate tables. Indeed, SQLServer appears to do it that way [1] with syntax like > >> > >> WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory)); > >> > >> > >> Q 1.1. Was that implementation considered and if so, what made this implementation more appealing? > >> > > > > I've been digging some more on this point, and I've reached the conclusion that a separate history table is the betterimplementation. It would make the act of removing system versioning into little more than a DROP TABLE, plus adjustingthe base table to reflect that it is no longer system versioned. > > Thanks for giving this a lot of thought. When you asked the question > the first time you hadn't discussed how that might work, but now we > have something to discuss. > > > 10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, wouldsimply use the base table directly with no quals to add. > > 11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, then the query would do a unionof the base table and the history table with quals applied to both. > > > > 14. DROP SYSTEM VERSIONING from a table would be quite straightforward - the history table would be dropped along withthe triggers that reference it, setting relissystemversioned = 'f' on the base table. > > > > I think this would have some key advantages: > > > > 1. MVCC bloat is no worse than it was before. > > The number of row versions stored in the database is the same for > both, just it would be split across two tables in this form. > > > 2. No changes whatsoever to referential integrity. > > The changes were fairly minor, but I see your thinking about indexes > as a simplification. > > > 3. DROP SYSTEM VERSIONING becomes an O(1) operation. > > It isn't top of mind to make this work well. The whole purpose of the > history is to keep it, not to be able to drop it quickly. > > > > Thoughts? > > There are 3 implementation routes that I see, so let me explain so > that others can join the discussion. > > 1. Putting all data in one table. This makes DROP SYSTEM VERSIONING > effectively impossible. It requires access to the table to be > rewritten to add in historical quals for non-historical access and it > requires some push-ups around indexes. (The current patch adds the > historic quals by kludging the parser, which is wrong place, since it > doesn't work for joins etc.. However, given that issue, the rest seems > to follow on naturally). > > 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING > fairly trivial, but it complicates many DDL commands (please make a > list?) and requires the optimizer to know about this and cater to it, > possibly complicating plans. Neither issue is insurmountable, but it > becomes more intrusive. > > The current patch could go in either of the first 2 directions with > further work. > > 3. Let the Table Access Method handle it. I call this out separately > since it avoids making changes to the rest of Postgres, which might be > a good thing, with the right TAM implementation. > > My preferred approach would be to do this "for free" in the table > access method, but we're a long way from this in terms of actual > implementation. When Corey suggested earlier that we just put the > syntax in there, this was the direction I was thinking. > > After waiting a day since I wrote the above, I think we should go with > (2) as Corey suggests, at least for now, and we can always add (3) > later. > > -- > Simon Riggs http://www.EnterpriseDB.com/ > >
Thanks for giving this a lot of thought. When you asked the question
the first time you hadn't discussed how that might work, but now we
have something to discuss.
There are 3 implementation routes that I see, so let me explain so
that others can join the discussion.
1. Putting all data in one table. This makes DROP SYSTEM VERSIONING
effectively impossible. It requires access to the table to be
rewritten to add in historical quals for non-historical access and it
requires some push-ups around indexes. (The current patch adds the
historic quals by kludging the parser, which is wrong place, since it
doesn't work for joins etc.. However, given that issue, the rest seems
to follow on naturally).
2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
fairly trivial, but it complicates many DDL commands (please make a
list?) and requires the optimizer to know about this and cater to it,
possibly complicating plans. Neither issue is insurmountable, but it
becomes more intrusive.
The current patch could go in either of the first 2 directions with
further work.
3. Let the Table Access Method handle it. I call this out separately
since it avoids making changes to the rest of Postgres, which might be
a good thing, with the right TAM implementation.
A side table has the nice additional benefit that we can very easily
version the *table structure* so when we ALTER TABLE and the table
structure changes we just make a new side table with now-currents
structure.
If we then later added another column that happened to also be named foo but now was type JSONB, would we change the datatype returned based on the time period being queried?
Is the change in structure a system versioning which itself must be captured?
Also we may want different set of indexes on historic table(s) for
whatever reason
And we may even want to partition history tables for speed, storage
cost or just to drop very ancient history
On Mon, Sep 20, 2021 at 7:09 AM Corey Huinker <corey.huinker@gmail.com> wrote: > > On Sun, Sep 19, 2021 at 3:12 PM Hannu Krosing <hannuk@google.com> wrote: >> >> A side table has the nice additional benefit that we can very easily >> version the *table structure* so when we ALTER TABLE and the table >> structure changes we just make a new side table with now-currents >> structure. > > > It's true that would allow for perfect capture of changes to the table structure, but how would you query the thing? > > If a system versioned table was created with a column foo that is type float, and then we dropped that column, how wouldwe ever query what the value of foo was in the past? We can query that thing only in tables AS OF the time when the column was still there. We probably could get away with pretending the dropped columns to be NULL in newer versions (and the versions before the column was added) Even more tricky case would be changing the column data type. > > Would the columns returned from SELECT * change based on the timeframe requested? If we want to emulate real table history, then it should. But the * thing was not really specified well even for original PostgreSQL inheritance. Maybe we could do SELECT (* AS OF 'yesterday afternoon'::timestamp) FROM ... :) > If we then later added another column that happened to also be named foo but now was type JSONB, would we change the datatypereturned based on the time period being queried? Many databases do allow returning multiple result sets, and actually the PostgreSQL wire *protocol* also theoretically supports this, just that it is not supported by any current client library. With current libraries it would be possible to return a dynamic version of row_to_json(t.*) which changes based on actual historical table structure > Is the change in structure a system versioning which itself must be captured? We do capture it (kind of) for logical decoding. That is, we decode according to the structure in effect at the time of row creation, though we currently miss the actual DDL itself. So there is a lot to figure out and define, but at least storing the history in a separate table gives a good foundation to build upon. ----- Hannu Krosing Google Cloud - We have a long list of planned contributions and we are hiring. Contact me if interested.
> On 19 Sep 2021, at 20:32, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > My preferred approach would be to do this "for free" in the table > access method, but we're a long way from this in terms of actual > implementation. When Corey suggested earlier that we just put the > syntax in there, this was the direction I was thinking. > > After waiting a day since I wrote the above, I think we should go with > (2) as Corey suggests, at least for now, and we can always add (3) > later. This patch no longer applies, are there plans on implementing the approaches discussed above, or should we close this entry and open a new one when a freshly baked pathed is ready? -- Daniel Gustafsson https://vmware.com/
On 11/15/21 10:47 AM, Daniel Gustafsson wrote: >> On 19 Sep 2021, at 20:32, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > >> My preferred approach would be to do this "for free" in the table >> access method, but we're a long way from this in terms of actual >> implementation. When Corey suggested earlier that we just put the >> syntax in there, this was the direction I was thinking. >> >> After waiting a day since I wrote the above, I think we should go with >> (2) as Corey suggests, at least for now, and we can always add (3) >> later. > > This patch no longer applies, are there plans on implementing the approaches > discussed above, or should we close this entry and open a new one when a > freshly baked pathed is ready? I spent a lot of time a while ago trying to fix this patch (not just rebase it), and I think it should just be rejected, unfortunately. The design decisions are just too flawed, and it conflates system_time periods with system versioning which is very wrong. -- Vik Fearing
On Mon, 15 Nov 2021 at 09:47, Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 19 Sep 2021, at 20:32, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > My preferred approach would be to do this "for free" in the table > > access method, but we're a long way from this in terms of actual > > implementation. When Corey suggested earlier that we just put the > > syntax in there, this was the direction I was thinking. > > > > After waiting a day since I wrote the above, I think we should go with > > (2) as Corey suggests, at least for now, and we can always add (3) > > later. > > This patch no longer applies, are there plans on implementing the approaches > discussed above, or should we close this entry and open a new one when a > freshly baked pathed is ready? As I mentioned upthread, there are at least 2 different ways forward (1) and (2), both of which need further work. I don't think that additional work is impossible, but it is weeks of work, not days and it needs to be done in collaboration with other thoughts on other threads Corey refers to. I have no plans on taking this patch further, but will give some help to anyone that wishes to do that. I suggest we Return with Feedback. -- Simon Riggs http://www.EnterpriseDB.com/
> On 15 Nov 2021, at 11:50, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > I have no plans on taking this patch further, but will give some help > to anyone that wishes to do that. > > I suggest we Return with Feedback. Fair enough, done that way. -- Daniel Gustafsson https://vmware.com/
Chiming in as a user, not so much a developer - I've been using system versioned tables in MariaDB for about half a year now, would just like to add some feedback about what they did right and wrong and how PG could learn from their mistakes & successes. > 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING > fairly trivial, but it complicates many DDL commands (please make a > list?) and requires the optimizer to know about this and cater to it, > possibly complicating plans. Neither issue is insurmountable, but it > becomes more intrusive. I'd vouch for this being the way to go; you completely sidestep issues like partitioning, unique constraints, optimization, etc. Especially true when 90% of the time, SELECTs will only be looking at currently-active data. MDB seems to have gone with the single-table approach (unless you partition) and I've run into a bug where I can't add a unique constraint because historical data fails. #### System versioning & Application versioning I saw that there is an intent to harmonize system versioning with application versioning. Haven't read the AV thread so not positive if that meant intending to split tables by application versioning and system versioning both: to me it seems like maybe it would be good to use a separate table for SV, but keep AV in the same table. Reasons include: - ISO states only one AV config per table, but there's no reason this always has to be the case; maybe you're storing products that are active for a period of time, EOL for a period of time, and obsolete for a period of time. If ISO sometime decides >1 AV config is OK, there would be a mess trying to split that into tables. - DB users who are allowed to change AV items likely won't be allowed to rewrite history by changing SV items. My proposed schema would keep these separate. - Table schemas change, and all (SV active) AV items would logically need to fit the active schema or be updated to do so. Different story for SV, nothing there should ever need to be changed. - Partitioning for AV tables isn't as clear as with SV and is likely better to be user-defined Sorry for acronyms, SV=system versioning, AV=application versioning In general, I think AV should be treated literally as extra rows in the main DB, plus the extra PK element and shortcut functions. SV though, needs to have a lot more nuance. #### ALTER TABLE On to ideas about how ALTER TABLE could work. I don't think the question was ever answered "Do schema changes need to be tracked?" I'm generally in favor of saying that it should be possible to recreate the table exactly as it was, schema and all, at a specific period of time (perhaps for a view) using a fancy combination of SELECT ... AS and such - but it doesn't need to be straightforward. In any case, no data should ever be deleted by ALTER TABLE. As someone pointed out earlier, speed and storage space of ALTER TABLE are likely low considerations for system versioned tables. - ADD COLUMN easy, add the column to both the current and historical table, all null in historical - DROP COLUMN delete the column from the current table. Historical is difficult, because what happens if a new column with the same name is added? Maybe `DROP COLUMN col1` would rename col1 to _col1_1642929683 (epoch time) in the historical table or something like that. - RENAME COLUMN is a bit tricky too - from a usability standpoint, the historical table should be renamed as well. A quick thought is maybe `RENAME col1 TO new_name` would perform the rename in the historical table, but also create _col1_1642929683 as an alias to new_name to track that there was a change. I don't think there would be any name violations in the history table because there would never be a column name in history that isn't in current (because of the rename described with DROP). - Changing column data type: ouch. This needs to be mainly planned for cases where data types are incompatible, possibly optimized for times when they are compatible. Seems like another _col1_1642929683 rename would be in order, and a new col1 created with the new datatype, and a historical SELECT would automatically merge the two. Possible optimization: if the old type fits into the new type, just change the data type in history and make _col1_1642929683 an alias to it. - Change defaults, nullability, constraints, etc: I think these can safely be done for the current table only. Realistically, historical tables could probably skip all checks, always (except their tuple PK), since trying to enforce them would just be opening the door to bugs. Trying to think of any times this isn't true. - FKs: I'm generally in the same boat as above, thinking that these don't need to affect historical tables. Section 2.5 in the paper I link below discusses period joins, but I don't think any special behavior is needed for now. Perhaps references could be kept in history but not enforced - Changing PK / adding/removing more columns to PK: Annoying and not easily dealt with. Maybe just disallow - Triggers: no affect on historical - DROP TABLE bye bye, history & all Things like row level security add extra complication but can probably be disregarded. Maybe just have a `select history` permission or similar. An interesting idea could be to automatically add system versioning to information_schema whenever it is added to a table. This would provide a way to easily query historical DDL. It would also help solve how to keep historical FKs. This would make it possible to perfectly recreate system versioned parts of your database at any period of time, schema and data both. #### Partitioning Allowing for partitioning and automatic rotation seems like a good idea, should be possible with current syntax but maybe worth adding some shortcuts like maria has. #### Permissions - MDB has the new 'delete history' schema privilege that defines who can delete historical data before a certain time or drop system versioning, seems like a good idea to implement. They also require `@@system_versioning_alter_history=keep;` to be set before doing anything ALTER TABLE; doesn't do much outside of serving as a reminder that changing system versioned tables can be dangerous.¯\_(ツ)_/¯ - This part sucks and goes against everything ISO is going for, but IMO there needs to be a way to insert/update/delete historical data. Maybe there needs to be a new superduperuser role to do it and you need to type the table name backwards to verify you want to insert, but situations like data migration, fixing incorrectly stored data, or removing accidental sensitive information demand it. This isn't a priority though, and basic system versioning can be shipped without it. #### Misc - Seems like a good idea to include MDB's option to exclude columns from versioning (`WITHOUT SYSTEM VERSIONING` as a column argument). This is relatively nuanced and I'm not sure if it's officially part of ISO, but probably helpful for frequently updating small data in rows with BLOBs. Easy enough to implement, just forget the column in the historical table. - I thought I saw somewhere that somebody was discussing adding both row_start and row_end to the PK. Why would this be? Row_end should be all that's needed to keep unique, but maybe I misread. #### Links - I haven't seen it linked here yet but this paper does a phenomenal deep dive into SV and AV https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf - It's not perfect, but MDB's system versioning is pretty well thought out. You get a good idea of their thought process going through this page, worth a read https://mariadb.com/kb/en/system-versioned-tables/#excluding-columns-from-versioning #### Finally, the end There's a heck of a lot of thought that could go into this thing, probably worth making sure there's a formal agreement on what to be done before coding starts (PGEP for postgres enhancement proposal, like PEP? Not sure if something like that exists but it probably should.). Large parts of the existing patch could likely be reused for whatever is decided. Best, Trevor On Sun, Jan 23, 2022 at 2:47 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 15 Nov 2021, at 11:50, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > I have no plans on taking this patch further, but will give some help > > to anyone that wishes to do that. > > > > I suggest we Return with Feedback. > > Fair enough, done that way. > > -- > Daniel Gustafsson https://vmware.com/ > > > > >
> 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
> fairly trivial, but it complicates many DDL commands (please make a
> list?) and requires the optimizer to know about this and cater to it,
> possibly complicating plans. Neither issue is insurmountable, but it
> becomes more intrusive.
I'd vouch for this being the way to go; you completely sidestep issues
like partitioning, unique constraints, optimization, etc. Especially
true when 90% of the time, SELECTs will only be looking at
currently-active data. MDB seems to have gone with the single-table
approach (unless you partition) and I've run into a bug where I can't
add a unique constraint because historical data fails.
#### System versioning & Application versioning
I saw that there is an intent to harmonize system versioning with
application versioning. Haven't read the AV thread so not positive if
that meant intending to split tables by application versioning and
system versioning both: to me it seems like maybe it would be good to
use a separate table for SV, but keep AV in the same table. Reasons
include:
- ISO states only one AV config per table, but there's no reason this
always has to be the case; maybe you're storing products that are
active for a period of time, EOL for a period of time, and obsolete
for a period of time. If ISO sometime decides >1 AV config is OK,
there would be a mess trying to split that into tables.
- DB users who are allowed to change AV items likely won't be allowed
to rewrite history by changing SV items. My proposed schema would keep
these separate.
- Table schemas change, and all (SV active) AV items would logically
need to fit the active schema or be updated to do so. Different story
for SV, nothing there should ever need to be changed.
- Partitioning for AV tables isn't as clear as with SV and is likely
better to be user-defined
Sorry for acronyms, SV=system versioning, AV=application versioning
In general, I think AV should be treated literally as extra rows in
the main DB, plus the extra PK element and shortcut functions. SV
though, needs to have a lot more nuance.
#### ALTER TABLE
On to ideas about how ALTER TABLE could work. I don't think the
question was ever answered "Do schema changes need to be tracked?" I'm
generally in favor of saying that it should be possible to recreate
the table exactly as it was, schema and all, at a specific period of
time (perhaps for a view) using a fancy combination of SELECT ... AS
and such - but it doesn't need to be straightforward. In any case, no
data should ever be deleted by ALTER TABLE. As someone pointed out
earlier, speed and storage space of ALTER TABLE are likely low
considerations for system versioned tables.
- ADD COLUMN easy, add the column to both the current and historical
table, all null in historical
- DROP COLUMN delete the column from the current table. Historical is
difficult, because what happens if a new column with the same name is
added? Maybe `DROP COLUMN col1` would rename col1 to _col1_1642929683
(epoch time) in the historical table or something like that.
- RENAME COLUMN is a bit tricky too - from a usability standpoint, the
historical table should be renamed as well. A quick thought is maybe
`RENAME col1 TO new_name` would perform the rename in the historical
table, but also create _col1_1642929683 as an alias to new_name to
track that there was a change. I don't think there would be any name
violations in the history table because there would never be a column
name in history that isn't in current (because of the rename described
with DROP).
- Changing column data type: ouch. This needs to be mainly planned for
cases where data types are incompatible, possibly optimized for times
when they are compatible. Seems like another _col1_1642929683 rename
would be in order, and a new col1 created with the new datatype, and a
historical SELECT would automatically merge the two. Possible
optimization: if the old type fits into the new type, just change the
data type in history and make _col1_1642929683 an alias to it.
- Change defaults, nullability, constraints, etc: I think these can
safely be done for the current table only. Realistically, historical
tables could probably skip all checks, always (except their tuple PK),
since trying to enforce them would just be opening the door to bugs.
Trying to think of any times this isn't true.
- FKs: I'm generally in the same boat as above, thinking that these
don't need to affect historical tables. Section 2.5 in the paper I
link below discusses period joins, but I don't think any special
behavior is needed for now. Perhaps references could be kept in
history but not enforced
- Changing PK / adding/removing more columns to PK: Annoying and not
easily dealt with. Maybe just disallow
- Triggers: no affect on historical
- DROP TABLE bye bye, history & all
Things like row level security add extra complication but can probably
be disregarded. Maybe just have a `select history` permission or
similar.
An interesting idea could be to automatically add system versioning to
information_schema whenever it is added to a table. This would provide
a way to easily query historical DDL. It would also help solve how to
keep historical FKs. This would make it possible to perfectly recreate
system versioned parts of your database at any period of time, schema
and data both.
#### Misc
- Seems like a good idea to include MDB's option to exclude columns
from versioning (`WITHOUT SYSTEM VERSIONING` as a column argument).
This is relatively nuanced and I'm not sure if it's officially part of
ISO, but probably helpful for frequently updating small data in rows
with BLOBs. Easy enough to implement, just forget the column in the
historical table.
- I thought I saw somewhere that somebody was discussing adding both
row_start and row_end to the PK. Why would this be? Row_end should be
all that's needed to keep unique, but maybe I misread.
#### Links
- I haven't seen it linked here yet but this paper does a phenomenal
deep dive into SV and AV
https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
- It's not perfect, but MDB's system versioning is pretty well thought
out. You get a good idea of their thought process going through this
page, worth a read
https://mariadb.com/kb/en/system-versioned-tables/#excluding-columns-from-versioning
#### Finally, the end
There's a heck of a lot of thought that could go into this thing,
probably worth making sure there's a formal agreement on what to be
done before coding starts (PGEP for postgres enhancement proposal,
like PEP? Not sure if something like that exists but it probably
should.). Large parts of the existing patch could likely be reused for
whatever is decided.
On 1/24/22 00:16, Corey Huinker wrote: >> - Table schemas change, and all (SV active) AV items would logically >> need to fit the active schema or be updated to do so. Different story >> for SV, nothing there should ever need to be changed. >> > Yeah, there's a mess (which you state below) about what happens if you > create a table and then rename a column, or drop a column and add a > same-named column back of another type at a later date, etc. In theory, > this means that the valid set of columns and their types changes according > to the time range specified. I may not be remembering correctly, but Vik > stated that the SQL spec seemed to imply that you had to track all those > things. The spec does not allow schema changes at all on a a system versioned table, except to change the system versioning itself. -- Vik Fearing
On 1/24/22 00:16, Corey Huinker wrote:
>> - Table schemas change, and all (SV active) AV items would logically
>> need to fit the active schema or be updated to do so. Different story
>> for SV, nothing there should ever need to be changed.
>>
> Yeah, there's a mess (which you state below) about what happens if you
> create a table and then rename a column, or drop a column and add a
> same-named column back of another type at a later date, etc. In theory,
> this means that the valid set of columns and their types changes according
> to the time range specified. I may not be remembering correctly, but Vik
> stated that the SQL spec seemed to imply that you had to track all those
> things.
The spec does not allow schema changes at all on a a system versioned
table, except to change the system versioning itself.
--
Vik Fearing
The spec does not allow schema changes at all on a a system versioned
table, except to change the system versioning itself.