Обсуждение: AS OF queries

Поиск
Список
Период
Сортировка

AS OF queries

От
Konstantin Knizhnik
Дата:
I wonder if Postgres community is interested in supporting time travel 
queries in PostgreSQL (something like AS OF queries in Oracle: 
https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
As far as I know something similar is now developed for MariaDB.

It seems to me that it will be not so difficult to implement them in 
Postgres - we already have versions of tuples.
Looks like we only need to do three things:
1. Disable autovacuum (autovacuum = off)
2. Enable commit timestamp (track_commit_timestamp = on)
3. Add asofTimestamp to snapshot and patch XidInMVCCSnapshot to compare 
commit timestamps when it is specified in snapshot.


Attached please find my prototype implementation of it.
Most of the efforts are needed to support asof timestamp in grammar and 
add it to query plan.
I failed to support AS OF clause (as in Oracle) because of shift-reduce 
conflicts with aliases,
so I have to introduce new ASOF keyword. May be yacc experts can propose 
how to solve this conflict without introducing new keyword...

Please notice that now ASOF timestamp is used only for data snapshot, 
not for catalog snapshot.
I am not sure that it is possible (and useful) to travel through 
database schema history...

Below is an example of how it works:

postgres=# create table foo(pk serial primary key, ts timestamp default 
now(), val text);
CREATE TABLE
postgres=# insert into foo (val) values ('insert');
INSERT 0 1
postgres=# insert into foo (val) values ('insert');
INSERT 0 1
postgres=# insert into foo (val) values ('insert');
INSERT 0 1
postgres=# select * from foo;
  pk |             ts             |  val
----+----------------------------+--------
   1 | 2017-12-20 14:59:17.715453 | insert
   2 | 2017-12-20 14:59:22.933753 | insert
   3 | 2017-12-20 14:59:27.87712  | insert
(3 rows)

postgres=# select * from foo asof timestamp '2017-12-20 14:59:25';
  pk |             ts             |  val
----+----------------------------+--------
   1 | 2017-12-20 14:59:17.715453 | insert
   2 | 2017-12-20 14:59:22.933753 | insert
(2 rows)

postgres=# select * from foo asof timestamp '2017-12-20 14:59:20';
  pk |             ts             |  val
----+----------------------------+--------
   1 | 2017-12-20 14:59:17.715453 | insert
(1 row)

postgres=# update foo set val='upd',ts=now() where pk=1;
UPDATE 1
postgres=# select * from foo asof timestamp '2017-12-20 14:59:20';
  pk |             ts             |  val
----+----------------------------+--------
   1 | 2017-12-20 14:59:17.715453 | insert
(1 row)

postgres=# select * from foo;
  pk |             ts             |  val
----+----------------------------+--------
   2 | 2017-12-20 14:59:22.933753 | insert
   3 | 2017-12-20 14:59:27.87712  | insert
   1 | 2017-12-20 15:09:17.046047 | upd
(3 rows)

postgres=# update foo set val='upd2',ts=now() where pk=1;
UPDATE 1
postgres=# select * from foo asof timestamp '2017-12-20 15:10';
  pk |             ts             |  val
----+----------------------------+--------
   2 | 2017-12-20 14:59:22.933753 | insert
   3 | 2017-12-20 14:59:27.87712  | insert
   1 | 2017-12-20 15:09:17.046047 | upd
(3 rows)


Comments and feedback are welcome:)

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Вложения

Re: AS OF queries

От
Laurenz Albe
Дата:
Konstantin Knizhnik wrote:
> I wonder if Postgres community is interested in supporting time travel 
> queries in PostgreSQL (something like AS OF queries in Oracle: 
> https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
> As far as I know something similar is now developed for MariaDB.

I think that would be a good thing to have that could make
the DBA's work easier - all the requests to restore a table
to the state from an hour ago.

> I failed to support AS OF clause (as in Oracle) because of shift-reduce 
> conflicts with aliases,
> so I have to introduce new ASOF keyword. May be yacc experts can propose 
> how to solve this conflict without introducing new keyword...

I think it would be highly desirable to have AS OF, because that's
the way the SQL standard has it.

Yours,
Laurenz Albe


Re: AS OF queries

От
Konstantin Knizhnik
Дата:


On 20.12.2017 16:12, Laurenz Albe wrote:
Konstantin Knizhnik wrote:
I wonder if Postgres community is interested in supporting time travel 
queries in PostgreSQL (something like AS OF queries in Oracle: 
https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
As far as I know something similar is now developed for MariaDB.
I think that would be a good thing to have that could make
the DBA's work easier - all the requests to restore a table
to the state from an hour ago.

Please notice that it is necessary to configure postgres in proper way in order to be able to perform time travels.
If you do not disable autovacuum, then old versions will be just cleaned-up.
If transaction commit timestamps are not tracked, then it is not possible to locate required timeline.

So DBA should make a decision in advance whether this feature is needed or not.
It is not a proper instrument for restoring/auditing existed database which was not configured to keep all versions.

May be it is better to add special configuration parameter for this feature which should implicitly toggle

autovacuum and track_commit_timestamp parameters).

The obvious drawbacks of keeping all versions are
1. Increased size of database.
2. Decreased query execution speed because them need to traverse a lot of not visible versions.

So may be in practice it will be useful to limit lifetime of versions.
I failed to support AS OF clause (as in Oracle) because of shift-reduce 
conflicts with aliases,
so I have to introduce new ASOF keyword. May be yacc experts can propose 
how to solve this conflict without introducing new keyword...
I think it would be highly desirable to have AS OF, because that's
the way the SQL standard has it.
Completely agree  with you: I just give up after few hours of attempts to make bison to resolve this conflicts.


Yours,
Laurenz Albe

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: AS OF queries

От
Joe Wildish
Дата:
On 20 Dec 2017, at 13:48, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

On 20.12.2017 16:12, Laurenz Albe wrote:
Konstantin Knizhnik wrote:
I wonder if Postgres community is interested in supporting time travel 
queries in PostgreSQL (something like AS OF queries in Oracle: 
https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
As far as I know something similar is now developed for MariaDB.
I think that would be a good thing to have that could make
the DBA's work easier - all the requests to restore a table
to the state from an hour ago.

Please notice that it is necessary to configure postgres in proper way in order to be able to perform time travels.
If you do not disable autovacuum, then old versions will be just cleaned-up.
If transaction commit timestamps are not tracked, then it is not possible to locate required timeline.

So DBA should make a decision in advance whether this feature is needed or not.
It is not a proper instrument for restoring/auditing existed database which was not configured to keep all versions.

May be it is better to add special configuration parameter for this feature which should implicitly toggle

autovacuum and track_commit_timestamp parameters).


I seem to recall that Oracle handles this by requiring tables that want the capability to live within a tablespace that supports flashback. That tablespace is obviously configured to retain redo/undo logs. It would be nice if the vacuuming process could be configured in a similar manner. I have no idea if it would make sense on a tablespace basis or not, though — I’m not entirely sure how analogous they are between Postgres & Oracle as I’ve never used tablespaces in Postgres.

-Joe

Re: AS OF queries

От
Laurenz Albe
Дата:
Konstantin Knizhnik wrote:
> Please notice that it is necessary to configure postgres in proper way in order to be able to perform time travels.
> If you do not disable autovacuum, then old versions will be just cleaned-up.
> If transaction commit timestamps are not tracked, then it is not possible to locate required timeline. 
> 
> So DBA should make a decision in advance whether this feature is needed or not.
> It is not a proper instrument for restoring/auditing existed database which was not configured to keep all versions.

Of course; you'd have to anticipate the need to travel in time,
and you have to pay the price for it.
Anybody who has read science fiction stories know that time travel
does not come free.

> May be it is better to add special configuration parameter for this feature which should implicitly toggle 
> autovacuum and track_commit_timestamp parameters).

The feature would be most useful with some kind of "moving xid
horizon" that guarantees that only dead tuples whose xmax lies
more than a certain time interval in the past can be vacuumed.

Yours,
Laurenz Albe


Re: AS OF queries

От
Tom Lane
Дата:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> Konstantin Knizhnik wrote:
>> I failed to support AS OF clause (as in Oracle) because of shift-reduce 
>> conflicts with aliases,
>> so I have to introduce new ASOF keyword. May be yacc experts can propose 
>> how to solve this conflict without introducing new keyword...

> I think it would be highly desirable to have AS OF, because that's
> the way the SQL standard has it.

Please say that's just an Oracle-ism and not SQL standard, because it's
formally ambiguous.  This is required to work by spec:

regression=# select x as of from (values(1)) t(x);
 of 
----
  1
(1 row)

so it's not possible for us ever to support an expression that includes
top-level "AS OF" (or, pretty much, "AS anything") without some rather
enormous pushups.

If we absolutely had to do it, the path to a solution would involve some
lexer-level lookahead, cf base_yylex() --- but that's messy and tends to
introduce its own set of corner case misbehaviors.  I'd much rather use a
syntax that wasn't chosen with blind disregard for SQL's existing
syntactic constraints.

            regards, tom lane


Re: AS OF queries

От
David Fetter
Дата:
On Wed, Dec 20, 2017 at 03:03:50PM +0100, Laurenz Albe wrote:
> Konstantin Knizhnik wrote:
> > Please notice that it is necessary to configure postgres in proper
> > way in order to be able to perform time travels.  If you do not
> > disable autovacuum, then old versions will be just cleaned-up.  If
> > transaction commit timestamps are not tracked, then it is not
> > possible to locate required timeline. 
> > 
> > So DBA should make a decision in advance whether this feature is
> > needed or not.  It is not a proper instrument for
> > restoring/auditing existed database which was not configured to
> > keep all versions.
> 
> Of course; you'd have to anticipate the need to travel in time, and
> you have to pay the price for it.  Anybody who has read science
> fiction stories know that time travel does not come free.

A few extra terabytes' worth of storage space is a pretty small price
to pay, at least on the scale of time travel penalties.

> > May be it is better to add special configuration parameter for
> > this feature which should implicitly toggle autovacuum and
> > track_commit_timestamp parameters).
> 
> The feature would be most useful with some kind of "moving xid
> horizon" that guarantees that only dead tuples whose xmax lies more
> than a certain time interval in the past can be vacuumed.

+1 for this horizon.  It would be very nice, but maybe not strictly
necessary, for this to be adjustable downward without a restart.

It's not clear that adjusting it upward should work at all, but if it
did, the state of dead tuples would have to be known, and they'd have
to be vacuumed a way that was able to establish a guarantee of
gaplessness at least back to the new horizon.  Maybe there could be
some kind of "high water mark" for it.  Would that impose overhead or
design constraints on vacuum that we don't want?

Also nice but not strictly necessary, making it tunable per relation,
or at least per table.  I'm up in the air as to whether queries with
an AS OF older than the horizon[1] should error out or merely throw
warnings.

Best,
David.

[1] If we allow setting this at granularities coarser than DB
instance, this means going as far back as the relationship with the
newest "last" tuple among the relations involved in the query.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: AS OF queries

От
Peter Eisentraut
Дата:
On 12/20/17 10:29, Tom Lane wrote:
> Please say that's just an Oracle-ism and not SQL standard, because it's
> formally ambiguous.  This is required to work by spec:
> 
> regression=# select x as of from (values(1)) t(x);
>  of 
> ----
>   1
> (1 row)
> 
> so it's not possible for us ever to support an expression that includes
> top-level "AS OF" (or, pretty much, "AS anything") without some rather
> enormous pushups.

The SQL standard syntax appears to be something like

"tablename" [ AS OF SYSTEM TIME 'something' ] [ [ AS ] "alias" ]

That's not going to be fun to parse.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: AS OF queries

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 12/20/17 10:29, Tom Lane wrote:
>> Please say that's just an Oracle-ism and not SQL standard, because it's
>> formally ambiguous.

> The SQL standard syntax appears to be something like

> "tablename" [ AS OF SYSTEM TIME 'something' ] [ [ AS ] "alias" ]

> That's not going to be fun to parse.

Bleah.  In principle we could look two tokens ahead so as to recognize
"AS OF SYSTEM", but base_yylex is already a horrid mess with one-token
lookahead; I don't much want to try to extend it to that.

Possibly the most workable compromise is to use lookahead to convert
"AS OF" to "AS_LA OF", and then we could either just break using OF
as an alias, or add an extra production that allows "AS_LA OF" to
be treated as "AS alias" if it's not followed by the appropriate
stuff.

It's a shame that the SQL committee appears to be so ignorant of
standard parsing technology.

            regards, tom lane


Re: AS OF queries

От
Magnus Hagander
Дата:
On Wed, Dec 20, 2017 at 5:17 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 12/20/17 10:29, Tom Lane wrote:
> Please say that's just an Oracle-ism and not SQL standard, because it's
> formally ambiguous.  This is required to work by spec:
>
> regression=# select x as of from (values(1)) t(x);
>  of
> ----
>   1
> (1 row)
>
> so it's not possible for us ever to support an expression that includes
> top-level "AS OF" (or, pretty much, "AS anything") without some rather
> enormous pushups.

The SQL standard syntax appears to be something like

"tablename" [ AS OF SYSTEM TIME 'something' ] [ [ AS ] "alias" ]

That's not going to be fun to parse.

There was a presentation about this given at FOSDEM PGDay a couple of years back. Slides at https://wiki.postgresql.org/images/6/64/Fosdem20150130PostgresqlTemporal.pdf


--

Re: AS OF queries

От
Pantelis Theodosiou
Дата:


On Wed, Dec 20, 2017 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 12/20/17 10:29, Tom Lane wrote:
>> Please say that's just an Oracle-ism and not SQL standard, because it's
>> formally ambiguous.

> The SQL standard syntax appears to be something like

> "tablename" [ AS OF SYSTEM TIME 'something' ] [ [ AS ] "alias" ]

> That's not going to be fun to parse.

Examples from DB2 documentation (which may be closer to the standard):

SELECT coverage_amt
FROM policy FOR SYSTEM_TIME AS OF '2010-12-01'
WHERE id = 1111;


SELECT count(*)
FROM policy FOR SYSTEM_TIME FROM '2011-11-30'
                              TO '9999-12-30'
WHERE vin = 'A1111';


So besides AS .. AS , it could also be  FROM .. FROM


Bleah.  In principle we could look two tokens ahead so as to recognize
"AS OF SYSTEM", but base_yylex is already a horrid mess with one-token
lookahead; I don't much want to try to extend it to that.

Possibly the most workable compromise is to use lookahead to convert
"AS OF" to "AS_LA OF", and then we could either just break using OF
as an alias, or add an extra production that allows "AS_LA OF" to
be treated as "AS alias" if it's not followed by the appropriate
stuff.

It's a shame that the SQL committee appears to be so ignorant of
standard parsing technology.

                        regards, tom lane


Re: AS OF queries

От
Alvaro Hernandez
Дата:


On 20/12/17 14:48, Konstantin Knizhnik wrote:


On 20.12.2017 16:12, Laurenz Albe wrote:
Konstantin Knizhnik wrote:
I wonder if Postgres community is interested in supporting time travel 
queries in PostgreSQL (something like AS OF queries in Oracle: 
https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
As far as I know something similar is now developed for MariaDB.
I think that would be a good thing to have that could make
the DBA's work easier - all the requests to restore a table
to the state from an hour ago.

Please notice that it is necessary to configure postgres in proper way in order to be able to perform time travels.

    This makes sense. BTW, I believe this feature would be an amazing addition to PostgreSQL.


If you do not disable autovacuum, then old versions will be just cleaned-up.
If transaction commit timestamps are not tracked, then it is not possible to locate required timeline.

So DBA should make a decision in advance whether this feature is needed or not.
It is not a proper instrument for restoring/auditing existed database which was not configured to keep all versions.

May be it is better to add special configuration parameter for this feature which should implicitly toggle

autovacuum and track_commit_timestamp parameters).

    Downthread a "moving xid horizon" is proposed. I believe this is not too user friendly. I'd rather use a timestamp horizon (e.g. "up to 2 days ago"). Given that the commit timestamp is tracked, I don't think this is an issue. This is the same as the undo_retention in Oracle, which is expressed in seconds.



The obvious drawbacks of keeping all versions are
1. Increased size of database.
2. Decreased query execution speed because them need to traverse a lot of not visible versions.

    In other words, what is nowadays called "bloat". I have seen in the field a lot of it. Not everybody tunes vacuum to keep up to date. So I don't expect this feature to be too expensive for many. While at the same time an awesome addition, not to fire a new separate server and exercise PITR, and then find the ways to move the old data around.


    Regards,

    Álvaro


-- 

Alvaro Hernandez


-----------
OnGres

Re: AS OF queries

От
Craig Ringer
Дата:
On 21 December 2017 at 00:17, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 12/20/17 10:29, Tom Lane wrote:
> Please say that's just an Oracle-ism and not SQL standard, because it's
> formally ambiguous.  This is required to work by spec:
>
> regression=# select x as of from (values(1)) t(x);
>  of
> ----
>   1
> (1 row)
>
> so it's not possible for us ever to support an expression that includes
> top-level "AS OF" (or, pretty much, "AS anything") without some rather
> enormous pushups.

The SQL standard syntax appears to be something like

"tablename" [ AS OF SYSTEM TIME 'something' ] [ [ AS ] "alias" ]

That's not going to be fun to parse.

Well, the SQL committe seem to specialise in parser torture.

Window functions, anybody?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: AS OF queries

От
Konstantin Knizhnik
Дата:

On 20.12.2017 19:26, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> On 12/20/17 10:29, Tom Lane wrote:
>>> Please say that's just an Oracle-ism and not SQL standard, because it's
>>> formally ambiguous.
>> The SQL standard syntax appears to be something like
>> "tablename" [ AS OF SYSTEM TIME 'something' ] [ [ AS ] "alias" ]
>> That's not going to be fun to parse.
> Bleah.  In principle we could look two tokens ahead so as to recognize
> "AS OF SYSTEM", but base_yylex is already a horrid mess with one-token
> lookahead; I don't much want to try to extend it to that.
>
> Possibly the most workable compromise is to use lookahead to convert
> "AS OF" to "AS_LA OF", and then we could either just break using OF
> as an alias, or add an extra production that allows "AS_LA OF" to
> be treated as "AS alias" if it's not followed by the appropriate
> stuff.
>
> It's a shame that the SQL committee appears to be so ignorant of
> standard parsing technology.
>
>             regards, tom lane

Thank you for suggestion with AS_LA: it really works.
Actually instead of AS_LA I just return ASOF token if next token after 
AS is OF.
So now it is possible to write query in this way:

     select * from foo as of timestamp '2017-12-21 14:12:15.1867';

There is still one significant difference of my prototype implementation 
with SQL standard: it associates timestamp with select statement, not 
with particular table.
It seems to be more difficult to support and I am not sure that joining 
tables from different timelines has much sense.
But certainly it also can be fixed.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Вложения

Re: AS OF queries

От
David Fetter
Дата:
On Thu, Dec 21, 2017 at 05:00:35PM +0300, Konstantin Knizhnik wrote:
> On 20.12.2017 19:26, Tom Lane wrote:
> >Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> >>On 12/20/17 10:29, Tom Lane wrote:
> >>>Please say that's just an Oracle-ism and not SQL standard, because it's
> >>>formally ambiguous.
> >>The SQL standard syntax appears to be something like
> >>"tablename" [ AS OF SYSTEM TIME 'something' ] [ [ AS ] "alias" ]
> >>That's not going to be fun to parse.
> >Bleah.  In principle we could look two tokens ahead so as to recognize
> >"AS OF SYSTEM", but base_yylex is already a horrid mess with one-token
> >lookahead; I don't much want to try to extend it to that.
> >
> >Possibly the most workable compromise is to use lookahead to convert
> >"AS OF" to "AS_LA OF", and then we could either just break using OF
> >as an alias, or add an extra production that allows "AS_LA OF" to
> >be treated as "AS alias" if it's not followed by the appropriate
> >stuff.
> >
> >It's a shame that the SQL committee appears to be so ignorant of
> >standard parsing technology.
> >
> >            regards, tom lane
> 
> Thank you for suggestion with AS_LA: it really works.
> Actually instead of AS_LA I just return ASOF token if next token after AS is
> OF.
> So now it is possible to write query in this way:
> 
>     select * from foo as of timestamp '2017-12-21 14:12:15.1867';

Thanks for your hard work so far on this!  It looks really exciting.

> There is still one significant difference of my prototype implementation
> with SQL standard: it associates timestamp with select statement, not with
> particular table.
> It seems to be more difficult to support and I am not sure that joining
> tables from different timelines has much sense.

I can think of a use case right offhand that I suspect would be very
common: comparing the state of a table at multiple times.

> But certainly it also can be fixed.

That would be really fantastic.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: AS OF queries

От
Greg Stark
Дата:
On 20 December 2017 at 12:45, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

> It seems to me that it will be not so difficult to implement them in
> Postgres - we already have versions of tuples.
> Looks like we only need to do three things:
> 1. Disable autovacuum (autovacuum = off)

"The Wheel of Time turns, and Ages come and pass, leaving memories
that become legend. Legend fades to myth, and even myth is long
forgotten when the Age that gave it birth comes again"

I think you'll find it a lot harder to get this to work than just
disabling autovacuum. Notably HOT updates can get cleaned up (and even
non-HOT updates can now leave tombstone dead line pointers iirc) even
if vacuum hasn't run.

We do have the infrastructure to deal with that. c.f.
vacuum_defer_cleanup_age. So in _theory_ you could create a snapshot
with xmin older than recent_global_xmin as long as it's not more than
vacuum_defer_cleanup_age older. But the devil will be in the details.
It does mean that you'll be making recent_global_xmin move backwards
which it has always been promised to *not* do

Then there's another issue that logical replication has had to deal
with -- catalog changes. You can't start looking at tuples that have a
different structure than the current catalog unless you can figure out
how to use the logical replication infrastructure to use the old
catalogs. That's a huge problem to bite off and probably can just be
left for another day if you can find a way to reliably detect the
problem and raise an error if the schema is inconsistent.

Postgres used to have time travel. I think it's come up more than once
in the pasts as something that can probably never come back due to
other decisions made. If more decisions have made it possible again
that will be fascinating.

-- 
greg


Re: AS OF queries

От
Michael Paquier
Дата:
On Fri, Dec 22, 2017 at 11:08:02PM +0000, Greg Stark wrote:
> On 20 December 2017 at 12:45, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>
> > It seems to me that it will be not so difficult to implement them in
> > Postgres - we already have versions of tuples.
> > Looks like we only need to do three things:
> > 1. Disable autovacuum (autovacuum = off)
>
> "The Wheel of Time turns, and Ages come and pass, leaving memories
> that become legend. Legend fades to myth, and even myth is long
> forgotten when the Age that gave it birth comes again"

I would be amazed if you have been able to finish the 14 volumes of the
series. There is a lot of content to take.

> Postgres used to have time travel. I think it's come up more than once
> in the pasts as something that can probably never come back due to
> other decisions made. If more decisions have made it possible again
> that will be fascinating.

This subject is showing up a couple of times lately, things would
be interested to see. What I am sure about is that people are not
willing to emulate that with triggers and two extra columns per table.
--
Michael

Вложения

Re: AS OF queries

От
konstantin knizhnik
Дата:
On Dec 23, 2017, at 2:08 AM, Greg Stark wrote:

> On 20 December 2017 at 12:45, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>
>> It seems to me that it will be not so difficult to implement them in
>> Postgres - we already have versions of tuples.
>> Looks like we only need to do three things:
>> 1. Disable autovacuum (autovacuum = off)
>
> "The Wheel of Time turns, and Ages come and pass, leaving memories
> that become legend. Legend fades to myth, and even myth is long
> forgotten when the Age that gave it birth comes again"
>
> I think you'll find it a lot harder to get this to work than just
> disabling autovacuum. Notably HOT updates can get cleaned up (and even
> non-HOT updates can now leave tombstone dead line pointers iirc) even
> if vacuum hasn't run.
>

Yeh, I suspected that just disabling autovacuum was not enough.
I heard (but do no know too much) about microvacuum and hot updates.
This is why I was a little bit surprised when me test didn't show lost of updated versions.
May be it is because of vacuum_defer_cleanup_age.

> We do have the infrastructure to deal with that. c.f.
> vacuum_defer_cleanup_age. So in _theory_ you could create a snapshot
> with xmin older than recent_global_xmin as long as it's not more than
> vacuum_defer_cleanup_age older. But the devil will be in the details.
> It does mean that you'll be making recent_global_xmin move backwards
> which it has always been promised to *not* do

But what if I just forbid to change recent_global_xmin?
If it is stalled at FirstNormalTransactionId and never changed?
Will it protect all versions from been deleted?

>
> Then there's another issue that logical replication has had to deal
> with -- catalog changes. You can't start looking at tuples that have a
> different structure than the current catalog unless you can figure out
> how to use the logical replication infrastructure to use the old
> catalogs. That's a huge problem to bite off and probably can just be
> left for another day if you can find a way to reliably detect the
> problem and raise an error if the schema is inconsistent.


Yes, catalog changes this is another problem of time travel.
I do not know any suitable way to handle several different catalog snapshots in one query.
But I think that there are a lot of cases where time travels without possibility of database schema change still will
beuseful. 
The question is how we should handle such catalog changes if them are happen. Ideally we should not allow to move back
beyond this point. 
Unfortunately it is not so easy to implement.


>
> Postgres used to have time travel. I think it's come up more than once
> in the pasts as something that can probably never come back due to
> other decisions made. If more decisions have made it possible again
> that will be fascinating.
>
> --
> greg



Re: AS OF queries

От
Alvaro Hernandez
Дата:

On 21/12/17 15:00, Konstantin Knizhnik wrote:
>
>
> On 20.12.2017 19:26, Tom Lane wrote:
>> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>>> On 12/20/17 10:29, Tom Lane wrote:
>>>> Please say that's just an Oracle-ism and not SQL standard, because 
>>>> it's
>>>> formally ambiguous.
>>> The SQL standard syntax appears to be something like
>>> "tablename" [ AS OF SYSTEM TIME 'something' ] [ [ AS ] "alias" ]
>>> That's not going to be fun to parse.
>> Bleah.  In principle we could look two tokens ahead so as to recognize
>> "AS OF SYSTEM", but base_yylex is already a horrid mess with one-token
>> lookahead; I don't much want to try to extend it to that.
>>
>> Possibly the most workable compromise is to use lookahead to convert
>> "AS OF" to "AS_LA OF", and then we could either just break using OF
>> as an alias, or add an extra production that allows "AS_LA OF" to
>> be treated as "AS alias" if it's not followed by the appropriate
>> stuff.
>>
>> It's a shame that the SQL committee appears to be so ignorant of
>> standard parsing technology.
>>
>>             regards, tom lane
>
> Thank you for suggestion with AS_LA: it really works.
> Actually instead of AS_LA I just return ASOF token if next token after 
> AS is OF.
> So now it is possible to write query in this way:
>
>     select * from foo as of timestamp '2017-12-21 14:12:15.1867';
>
> There is still one significant difference of my prototype 
> implementation with SQL standard: it associates timestamp with select 
> statement, not with particular table.
> It seems to be more difficult to support and I am not sure that 
> joining tables from different timelines has much sense.
> But certainly it also can be fixed.

     If the standard is "AS OF SYSTEM TIME" and we're going to deviate 
and go for "AS OF TIMESTAMP", I'd recommend then, if possible, to:

- Make "TIMESTAMP" optional, i.e., "AS OF [TIMESTAMP] <timestamp>"

- Augment the syntax to support also a transaction id, similar to 
Oracle's "AS OF SCN <scn>": "AS OF TRANSACTION <txid>".


     Merry Christmas,

     Álvaro

-- 

Alvaro Hernandez


-----------
OnGres



Re: AS OF queries

От
Craig Ringer
Дата:
On 24 December 2017 at 04:53, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:


But what if I just forbid to change recent_global_xmin?
If it is stalled at FirstNormalTransactionId and never changed?
Will it protect all versions from been deleted?

That's totally impractical, you'd have unbounded bloat and a nonfunctional system in no time.

You'd need a mechanism - akin to what we have with replication slots - to set a threshold for age.

> Then there's another issue that logical replication has had to deal
> with -- catalog changes. You can't start looking at tuples that have a
> different structure than the current catalog unless you can figure out
> how to use the logical replication infrastructure to use the old
> catalogs. That's a huge problem to bite off and probably can just be
> left for another day if you can find a way to reliably detect the
> problem and raise an error if the schema is inconsistent.


Yes, catalog changes this is another problem of time travel.
I do not know any suitable way to handle several different catalog snapshots in one query.

I doubt it's practical unless you can extract it to subplans that can be materialized separately. Even then, UDTs, rowtype results, etc...
 
The question is how we should handle such catalog changes if them are happen. Ideally we should not allow to move back beyond  this point.
Unfortunately it is not so easy to implement.

I think you can learn a lot from studying logical decoding here.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: AS OF queries

От
Konstantin Knizhnik
Дата:


On 25.12.2017 06:26, Craig Ringer wrote:
On 24 December 2017 at 04:53, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:


But what if I just forbid to change recent_global_xmin?
If it is stalled at FirstNormalTransactionId and never changed?
Will it protect all versions from been deleted?

That's totally impractical, you'd have unbounded bloat and a nonfunctional system in no time.

You'd need a mechanism - akin to what we have with replication slots - to set a threshold for age.

Well, there are systems with "never delete" and "append only" semantic.
For example, I have participated in SciDB project: database for scientific applications.
One of the key requirements for scientific researches is reproducibility.
From the database point of view it means that we need to store all raw data and never delete it.
If you performed some measurements and made some conclusions based on this results, then everybody should be able to repeat it, even if later
you find some errors in input data and made corrections or just add more data.
So one of the SciDB requirements was to store all versions. Delete operation should just mark data as been deleted (although later we have to add true delete:)

But I agree with you: in most cases more flexible policy of managing versions is needed.
I am not sure that it should be similar with logical replication slot.
Here semantic is quite clear: we preserve segments of WAL until them are replicated to the subscribers.
With time travel situation is less obscure: we may specify some threshold for age - keep data for example for one year.
But what if somebody later wants to access  older data? At this moment them are already lost...

It seems to me that version pinning policy mostly depends on source of the data.
If  them have "append only" semantic (like as raw scientific data, trading data, measurements from IoT sensors...)
then it will be desirable to keep all version forever.
If we speak about OLTP tables (like accounts in pgbench), then may be time travel is not the proper mechanism for such data at all.

I think that in addition to logged/unlogged tables it will be useful to support historical/non-historical tables. Historical table should support time travel, while
non-historical (default) acts like normal table. It is already possible in Postgres to disable autovacuum for particular tables.
But unfortunately trick with snapshot (doesn't matter how we setup oldest xmin horizon) affect all tables.
There is similar (but not the same) problem with logical replication: assume that we need to replicate only one small table. But we have to pin in WAL all updates of other huge table which is not involved in logical replication at all.



> Then there's another issue that logical replication has had to deal
> with -- catalog changes. You can't start looking at tuples that have a
> different structure than the current catalog unless you can figure out
> how to use the logical replication infrastructure to use the old
> catalogs. That's a huge problem to bite off and probably can just be
> left for another day if you can find a way to reliably detect the
> problem and raise an error if the schema is inconsistent.


Yes, catalog changes this is another problem of time travel.
I do not know any suitable way to handle several different catalog snapshots in one query.

I doubt it's practical unless you can extract it to subplans that can be materialized separately. Even then, UDTs, rowtype results, etc...
 

Well, I am really not sure about user's demands to time travel. This is one of the reasons of initiating this discussion in hackers... May be it is not the best place for such discussion, because there are mostly Postgres developers and not users...
At least, from experience of few SciDB customers, I can tell that we didn't have problems with schema evolution: mostly schema is simple, static and well defined.
There was problems with incorrect import of data (this is why we have to add real delete), with splitting data in chunks (partitioning),...

The question is how we should handle such catalog changes if them are happen. Ideally we should not allow to move back beyond  this point.
Unfortunately it is not so easy to implement.

I think you can learn a lot from studying logical decoding here.


Working with multimaster and shardman I have to learn a lot about logical replication.
It is really powerful and flexible mechanism ... with a lot of limitations and problems: lack of catalog replication, inefficient bulk insert, various race conditions,...
But I think that time travel and logical replication are really serving different goals so require different approaches.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: AS OF queries

От
Masahiko Sawada
Дата:
On Thu, Dec 21, 2017 at 3:57 AM, Alvaro Hernandez <aht@ongres.com> wrote:
>
>
> On 20/12/17 14:48, Konstantin Knizhnik wrote:
>
>
>
> On 20.12.2017 16:12, Laurenz Albe wrote:
>
> Konstantin Knizhnik wrote:
>
> I wonder if Postgres community is interested in supporting time travel
> queries in PostgreSQL (something like AS OF queries in Oracle:
> https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
> As far as I know something similar is now developed for MariaDB.
>
> I think that would be a good thing to have that could make
> the DBA's work easier - all the requests to restore a table
> to the state from an hour ago.
>
>
> Please notice that it is necessary to configure postgres in proper way in
> order to be able to perform time travels.
>
>
>     This makes sense. BTW, I believe this feature would be an amazing
> addition to PostgreSQL.
>
>
> If you do not disable autovacuum, then old versions will be just cleaned-up.
> If transaction commit timestamps are not tracked, then it is not possible to
> locate required timeline.
>
> So DBA should make a decision in advance whether this feature is needed or
> not.
> It is not a proper instrument for restoring/auditing existed database which
> was not configured to keep all versions.
>
> May be it is better to add special configuration parameter for this feature
> which should implicitly toggle
> autovacuum and track_commit_timestamp parameters).
>
>
>     Downthread a "moving xid horizon" is proposed. I believe this is not too
> user friendly. I'd rather use a timestamp horizon (e.g. "up to 2 days ago").
> Given that the commit timestamp is tracked, I don't think this is an issue.
> This is the same as the undo_retention in Oracle, which is expressed in
> seconds.

I agree but since we cannot have same xid beyond xid wraparounds we
would have to remove old tuples even if we're still in the time
interval

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: AS OF queries

От
Hannu Krosing
Дата:
On 20.12.2017 14:45, Konstantin Knizhnik wrote:
> I wonder if Postgres community is interested in supporting time travel
> queries in PostgreSQL (something like AS OF queries in Oracle:
> https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
> As far as I know something similar is now developed for MariaDB.
>
> It seems to me that it will be not so difficult to implement them in
> Postgres - we already have versions of tuples.
> Looks like we only need to do three things:
> 1. Disable autovacuum (autovacuum = off)
In the design for original University Postgres ( which was a full
history database geared towards WORM drives )
it was the task of vacuum to move old tuples to "an archive" from where
the AS OF queries would then fetch
them as needed.

This might also be a good place to do Commit LSN to Commit Timestamp
translation

Hannu

> 2. Enable commit timestamp (track_commit_timestamp = on)
> 3. Add asofTimestamp to snapshot and patch XidInMVCCSnapshot to
> compare commit timestamps when it is specified in snapshot.
>
>
> Attached please find my prototype implementation of it.
> Most of the efforts are needed to support asof timestamp in grammar
> and add it to query plan.
> I failed to support AS OF clause (as in Oracle) because of
> shift-reduce conflicts with aliases,
> so I have to introduce new ASOF keyword. May be yacc experts can
> propose how to solve this conflict without introducing new keyword...
>
> Please notice that now ASOF timestamp is used only for data snapshot,
> not for catalog snapshot.
> I am not sure that it is possible (and useful) to travel through
> database schema history...
>
> Below is an example of how it works:
>
> postgres=# create table foo(pk serial primary key, ts timestamp
> default now(), val text);
> CREATE TABLE
> postgres=# insert into foo (val) values ('insert');
> INSERT 0 1
> postgres=# insert into foo (val) values ('insert');
> INSERT 0 1
> postgres=# insert into foo (val) values ('insert');
> INSERT 0 1
> postgres=# select * from foo;
>  pk |             ts             |  val
> ----+----------------------------+--------
>   1 | 2017-12-20 14:59:17.715453 | insert
>   2 | 2017-12-20 14:59:22.933753 | insert
>   3 | 2017-12-20 14:59:27.87712  | insert
> (3 rows)
>
> postgres=# select * from foo asof timestamp '2017-12-20 14:59:25';
>  pk |             ts             |  val
> ----+----------------------------+--------
>   1 | 2017-12-20 14:59:17.715453 | insert
>   2 | 2017-12-20 14:59:22.933753 | insert
> (2 rows)
>
> postgres=# select * from foo asof timestamp '2017-12-20 14:59:20';
>  pk |             ts             |  val
> ----+----------------------------+--------
>   1 | 2017-12-20 14:59:17.715453 | insert
> (1 row)
>
> postgres=# update foo set val='upd',ts=now() where pk=1;
> UPDATE 1
> postgres=# select * from foo asof timestamp '2017-12-20 14:59:20';
>  pk |             ts             |  val
> ----+----------------------------+--------
>   1 | 2017-12-20 14:59:17.715453 | insert
> (1 row)
>
> postgres=# select * from foo;
>  pk |             ts             |  val
> ----+----------------------------+--------
>   2 | 2017-12-20 14:59:22.933753 | insert
>   3 | 2017-12-20 14:59:27.87712  | insert
>   1 | 2017-12-20 15:09:17.046047 | upd
> (3 rows)
>
> postgres=# update foo set val='upd2',ts=now() where pk=1;
> UPDATE 1
> postgres=# select * from foo asof timestamp '2017-12-20 15:10';
>  pk |             ts             |  val
> ----+----------------------------+--------
>   2 | 2017-12-20 14:59:22.933753 | insert
>   3 | 2017-12-20 14:59:27.87712  | insert
>   1 | 2017-12-20 15:09:17.046047 | upd
> (3 rows)
>
>
> Comments and feedback are welcome:)
>

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
https://2ndquadrant.com/



Re: AS OF queries

От
Jeff Janes
Дата:
On Thu, Dec 21, 2017 at 6:00 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
 
There is still one significant difference of my prototype implementation with SQL standard: it associates timestamp with select statement, not with particular table.
It seems to be more difficult to support and I am not sure that joining tables from different timelines has much sense.
But certainly it also can be fixed.

I think the main use I would find for this feature is something like:

select * from foo except select * from foo as old_foo as of '<some time>';

So I would be grateful if you can make that work.  Also, I think conforming to the standards is pretty important where it is feasible to do that.

Cheers,

Jeff

Re: AS OF queries

От
legrand legrand
Дата:
would actual syntax

WITH old_foo AS
(select * from foo as of '<some time>')
select * from foo except select * from old_foo;

work in replacement for

select * from foo except select * from foo as old_foo as of '<some time>';

?

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: AS OF queries

От
David Fetter
Дата:
On Tue, Dec 26, 2017 at 03:43:36PM -0700, legrand legrand wrote:
> would actual syntax
> 
> WITH old_foo AS
> (select * from foo as of '<some time>')
> select * from foo except select * from old_foo;
> 
> work in replacement for
> 
> select * from foo except select * from foo as old_foo as of '<some time>';
> 
> ?

If there has to be a WITH, or (roughly) equivalently, a sub-select for
each relation, the queries get very hairy very quickly.  It would
nevertheless be better for the people who need the feature to have it
this way than not to have it at all.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: AS OF queries

От
Craig Ringer
Дата:
On 25 December 2017 at 15:59, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:


On 25.12.2017 06:26, Craig Ringer wrote:
On 24 December 2017 at 04:53, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:


But what if I just forbid to change recent_global_xmin?
If it is stalled at FirstNormalTransactionId and never changed?
Will it protect all versions from been deleted?

That's totally impractical, you'd have unbounded bloat and a nonfunctional system in no time.

You'd need a mechanism - akin to what we have with replication slots - to set a threshold for age.

Well, there are systems with "never delete" and "append only" semantic.
For example, I have participated in SciDB project: database for scientific applications.
One of the key requirements for scientific researches is reproducibility.
From the database point of view it means that we need to store all raw data and never delete it.

PostgreSQL can't cope with that for more than 2^31 xacts, you have to "forget" details of which xacts created/updated tuples and the contents of deleted tuples, or you exceed our xid limit. You'd need 64-bit XIDs, or a redo-buffer based heap model (like the zheap stuff) with redo buffers marked with an xid epoch, or something like that.
 
I am not sure that it should be similar with logical replication slot.
Here semantic is quite clear: we preserve segments of WAL until them are replicated to the subscribers.

Er, what?

This isn't to do with restart_lsn. That's why I mentioned *logical* replication slots.

I'm talking about how they interact with GetOldestXmin using their xmin and catalog_xmin.

You probably won't want to re-use slots, but you'll want something akin to that, a transaction age threshold. Otherwise your system has a finite end date where it can no longer function due to xid count, or if you solve that, it'll slowly choke on table bloat etc. I guess if you're willing to accept truly horrible performance...
 
With time travel situation is less obscure: we may specify some threshold for age - keep data for example for one year.

Sure. You'd likely do that by mapping commit timestamps => xids and using an xid threshold though.
 
But unfortunately trick with snapshot (doesn't matter how we setup oldest xmin horizon) affect all tables.

You'd need to be able to pass more info into HeapTupleSatisfiesMVCC etc. I expect you'd probably add a new snapshot type (like logical decoding did with historic snapshots), that has a new Satisfies function. But you'd have to be able to ensure all snapshot Satisfies callers had the required extra info - like maybe a Relation - which could be awkward for some call sites.

The user would have to be responsible for ensuring sanity of FK relationships etc when specifying different snapshots for different relations.

Per-relation time travel doesn't seem totally impractical so long as you can guarantee that there is some possible snapshot for which the catalogs defining all the relations and types are simultaneously valid, i.e. there's no disjoint set of catalog changes. Avoiding messy performance implications with normal queries might not even be too bad if you use a separate snapshot model, so long as you can avoid callsites having to do extra work in the normal case.

Dealing with dropped columns and rewrites would be a pain though. You'd have to preserve the dropped column data when you re-projected the rewrite tuples.
 
There is similar (but not the same) problem with logical replication: assume that we need to replicate only one small table. But we have to pin in WAL all updates of other huge table which is not involved in logical replication at all.

I don't really see how that's similar. It's concerned with WAL, wheras what you're looking at is heaps and bloat from old versions. Completely different, unless you propose to somehow reconstruct data from old WAL to do historic queries, which would be o_O ...
 
Well, I am really not sure about user's demands to time travel. This is one of the reasons of initiating this discussion in hackers... May be it is not the best place for such discussion, because there are mostly Postgres developers and not users...
At least, from experience of few SciDB customers, I can tell that we didn't have problems with schema evolution: mostly schema is simple, static and well defined.
There was problems with incorrect import of data (this is why we have to add real delete), with splitting data in chunks (partitioning),...

Every system I've ever worked with that has a "static" schema has landed up not being so static after all.

I'm sure there are exceptions, but if you can't cope with catalog changes you've excluded the immense majority of users. Even the ones who promise they don't ever need to change anything ... land up changing things.
 
The question is how we should handle such catalog changes if them are happen. Ideally we should not allow to move back beyond  this point.
Unfortunately it is not so easy to implement.

I think you can learn a lot from studying logical decoding here.


Working with multimaster and shardman I have to learn a lot about logical replication.
It is really powerful and flexible mechanism ... with a lot of limitations and problems: lack of catalog replication, inefficient bulk insert, various race conditions,...
But I think that time travel and logical replication are really serving different goals so require different approaches.

Of course. I'm pointing out that we solve the catalog-change problem using historic snapshots, and THAT is what you'd be wanting to look at. Also what it does with the rewrite map.

However, you'd have a nightmare of a time getting the syscache to deliver you different data depending on which table's catalogs you're looking for. And what if there's some UDT that appears in >1 table with different AS OF times, but with different definitions at different times? Yuck.

More importantly you can't construct a historic snapshot at some arbitrary point in time. It depends on the maintenance of state that's done with logical decoding and xlogreader. So I don't know how you'd construct a historic snapshot for "June 24 at 2:01 am".

Ignoring concerns with catalog changes sounds convenient but in practice it's a total waste of time IMO. If nothing else there's temp tables to deal with. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: AS OF queries

От
Konstantin Knizhnik
Дата:


On 27.12.2017 10:29, Craig Ringer wrote:
On 25 December 2017 at 15:59, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:


On 25.12.2017 06:26, Craig Ringer wrote:
On 24 December 2017 at 04:53, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:


But what if I just forbid to change recent_global_xmin?
If it is stalled at FirstNormalTransactionId and never changed?
Will it protect all versions from been deleted?

That's totally impractical, you'd have unbounded bloat and a nonfunctional system in no time.

You'd need a mechanism - akin to what we have with replication slots - to set a threshold for age.

Well, there are systems with "never delete" and "append only" semantic.
For example, I have participated in SciDB project: database for scientific applications.
One of the key requirements for scientific researches is reproducibility.
From the database point of view it means that we need to store all raw data and never delete it.

PostgreSQL can't cope with that for more than 2^31 xacts, you have to "forget" details of which xacts created/updated tuples and the contents of deleted tuples, or you exceed our xid limit. You'd need 64-bit XIDs, or a redo-buffer based heap model (like the zheap stuff) with redo buffers marked with an xid epoch, or something like that.

Yes, but PgPro-EE already has 64-bit xids and we have spent a lot of time trying to push it to community.

 
I am not sure that it should be similar with logical replication slot.
Here semantic is quite clear: we preserve segments of WAL until them are replicated to the subscribers.

Er, what?

This isn't to do with restart_lsn. That's why I mentioned *logical* replication slots.

I'm talking about how they interact with GetOldestXmin using their xmin and catalog_xmin.

You probably won't want to re-use slots, but you'll want something akin to that, a transaction age threshold. Otherwise your system has a finite end date where it can no longer function due to xid count, or if you solve that, it'll slowly choke on table bloat etc. I guess if you're willing to accept truly horrible performance...

Definitely supporting time travel through frequently updated data may cause database bloat and awful performance.
I still think that this feature will be mostly interesting for append-only/rarely updated data.

In any case I have set vacuum_defer_cleanup_age = 1000000 and run pgbench during several limits.
There was no significant performance degradation.

Unfortunately  replication slots, neither  vacuum_defer_cleanup_age  allows to keep versions just for particular table(s).
And it seems to be the major problem I do not know how to solve now.

 
With time travel situation is less obscure: we may specify some threshold for age - keep data for example for one year.

Sure. You'd likely do that by mapping commit timestamps => xids and using an xid threshold though.
 
But unfortunately trick with snapshot (doesn't matter how we setup oldest xmin horizon) affect all tables.

You'd need to be able to pass more info into HeapTupleSatisfiesMVCC etc. I expect you'd probably add a new snapshot type (like logical decoding did with historic snapshots), that has a new Satisfies function. But you'd have to be able to ensure all snapshot Satisfies callers had the required extra info - like maybe a Relation - which could be awkward for some call sites.


Yes, it seems to be the only possible choice.

The user would have to be responsible for ensuring sanity of FK relationships etc when specifying different snapshots for different relations.

Per-relation time travel doesn't seem totally impractical so long as you can guarantee that there is some possible snapshot for which the catalogs defining all the relations and types are simultaneously valid, i.e. there's no disjoint set of catalog changes. Avoiding messy performance implications with normal queries might not even be too bad if you use a separate snapshot model, so long as you can avoid callsites having to do extra work in the normal case.

Dealing with dropped columns and rewrites would be a pain though. You'd have to preserve the dropped column data when you re-projected the rewrite tuples.
 
There is similar (but not the same) problem with logical replication: assume that we need to replicate only one small table. But we have to pin in WAL all updates of other huge table which is not involved in logical replication at all.

I don't really see how that's similar. It's concerned with WAL, wheras what you're looking at is heaps and bloat from old versions. Completely different, unless you propose to somehow reconstruct data from old WAL to do historic queries, which would be o_O ...
 
Well, I am really not sure about user's demands to time travel. This is one of the reasons of initiating this discussion in hackers... May be it is not the best place for such discussion, because there are mostly Postgres developers and not users...
At least, from experience of few SciDB customers, I can tell that we didn't have problems with schema evolution: mostly schema is simple, static and well defined.
There was problems with incorrect import of data (this is why we have to add real delete), with splitting data in chunks (partitioning),...

Every system I've ever worked with that has a "static" schema has landed up not being so static after all.

I'm sure there are exceptions, but if you can't cope with catalog changes you've excluded the immense majority of users. Even the ones who promise they don't ever need to change anything ... land up changing things.

JSON? :)

 
The question is how we should handle such catalog changes if them are happen. Ideally we should not allow to move back beyond  this point.
Unfortunately it is not so easy to implement.

I think you can learn a lot from studying logical decoding here.


Working with multimaster and shardman I have to learn a lot about logical replication.
It is really powerful and flexible mechanism ... with a lot of limitations and problems: lack of catalog replication, inefficient bulk insert, various race conditions,...
But I think that time travel and logical replication are really serving different goals so require different approaches.

Of course. I'm pointing out that we solve the catalog-change problem using historic snapshots, and THAT is what you'd be wanting to look at. Also what it does with the rewrite map.

However, you'd have a nightmare of a time getting the syscache to deliver you different data depending on which table's catalogs you're looking for. And what if there's some UDT that appears in >1 table with different AS OF times, but with different definitions at different times? Yuck.

More importantly you can't construct a historic snapshot at some arbitrary point in time. It depends on the maintenance of state that's done with logical decoding and xlogreader. So I don't know how you'd construct a historic snapshot for "June 24 at 2:01 am".

Ignoring concerns with catalog changes sounds convenient but in practice it's a total waste of time IMO. If nothing else there's temp tables to deal with. 


Assume we have query

select * from A as old_a as of timestamp '2016-12-01', A as new_a as of timestamp '2017-12-01' where old_a.old_id = new_a.new_id;

where schema of A was changed during this year. We have to carefully specify proper historical snapshots in all places of parse and optimizer deadling with this tables...
I afraid that it will be too complicated.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: AS OF queries

От
PostgreSQL - Hans-Jürgen Schönig
Дата:

On 12/20/2017 01:45 PM, Konstantin Knizhnik wrote:
> I wonder if Postgres community is interested in supporting time travel
> queries in PostgreSQL (something like AS OF queries in Oracle:
> https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
> As far as I know something similar is now developed for MariaDB.
>
> It seems to me that it will be not so difficult to implement them in
> Postgres - we already have versions of tuples.
> Looks like we only need to do three things:
> 1. Disable autovacuum (autovacuum = off)
> 2. Enable commit timestamp (track_commit_timestamp = on)
> 3. Add asofTimestamp to snapshot and patch XidInMVCCSnapshot to
> compare commit timestamps when it is specified in snapshot.
>

that sounds really awesome ... i would love to see that.
my question is: while MVCC is fine when a tuple is still there ...
what are you going to do with TRUNCATE and so on?
it is not uncommon that a table is truncated frequently. in this case
MVCC won't help.
what are your thoughts on this ?

    many thanks,

        hans

-- 
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: AS OF queries

От
Konstantin Knizhnik
Дата:

On 27.12.2017 17:14, PostgreSQL - Hans-Jürgen Schönig wrote:
>
> On 12/20/2017 01:45 PM, Konstantin Knizhnik wrote:
>> I wonder if Postgres community is interested in supporting time travel
>> queries in PostgreSQL (something like AS OF queries in Oracle:
>> https://docs.oracle.com/cd/B14117_01/appdev.101/b10795/adfns_fl.htm).
>> As far as I know something similar is now developed for MariaDB.
>>
>> It seems to me that it will be not so difficult to implement them in
>> Postgres - we already have versions of tuples.
>> Looks like we only need to do three things:
>> 1. Disable autovacuum (autovacuum = off)
>> 2. Enable commit timestamp (track_commit_timestamp = on)
>> 3. Add asofTimestamp to snapshot and patch XidInMVCCSnapshot to
>> compare commit timestamps when it is specified in snapshot.
>>
> that sounds really awesome ... i would love to see that.
> my question is: while MVCC is fine when a tuple is still there ...
> what are you going to do with TRUNCATE and so on?
> it is not uncommon that a table is truncated frequently. in this case
> MVCC won't help.
> what are your thoughts on this ?

You should not use drop/truncate if you want to access old versions:)
Yes, truncate is much more faster than delete but it is because it 
operates on file level.
I think that it is quite natural limitation.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: AS OF queries

От
Konstantin Knizhnik
Дата:


On 27.12.2017 00:52, Jeff Janes wrote:
On Thu, Dec 21, 2017 at 6:00 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
 
There is still one significant difference of my prototype implementation with SQL standard: it associates timestamp with select statement, not with particular table.
It seems to be more difficult to support and I am not sure that joining tables from different timelines has much sense.
But certainly it also can be fixed.

I think the main use I would find for this feature is something like:

select * from foo except select * from foo as old_foo as of '<some time>';

So I would be grateful if you can make that work.  Also, I think conforming to the standards is pretty important where it is feasible to do that.

Cheers,

Jeff

I attach ne version of the patch which supports "standard" syntax, where AS OF clause is associated with table reference.
So it is possible to write query like:

    select * from SomeTable as t as of timestamp '2017-12-27 14:54:40' where id=100;

Also I introduced "time_travel" GUC which implicitly assigns some others GUCs:

        track_commit_timestamp = true;
        vacuum_defer_cleanup_age = 1000000000;
        vacuum_freeze_min_age = 1000000000;
        autovacuum_freeze_max_age = 2000000000;
        autovacuum_multixact_freeze_max_age = 2000000000;
        autovacuum_start_daemon = false;

So it disables autovacuum and microvacuum and enable commit timestamps tracking.
It provides access in the past up to milliard of transactions.

There is still no way to keep all versions only for particular tables or truncate too old versions.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

Re: AS OF queries

От
Peter van Hardenberg
Дата:
On Wed, Dec 27, 2017 at 7:37 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:


On 27.12.2017 00:52, Jeff Janes wrote:
On Thu, Dec 21, 2017 at 6:00 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
 
There is still one significant difference of my prototype implementation with SQL standard: it associates timestamp with select statement, not with particular table.
It seems to be more difficult to support and I am not sure that joining tables from different timelines has much sense.
But certainly it also can be fixed.

I think the main use I would find for this feature is something like:

select * from foo except select * from foo as old_foo as of '<some time>';


Just a quick report from the world of ORMs and web applications. 

Today the idiomatic approach for an ORM like Ruby on Rails is to support temporal(ish) queries using three additional TIMESTAMP_TZ columns: "created_at", "updated_at" and "deleted_at". This idiom is bundled up into a plugin called "acts_as_paranoid" (See: https://github.com/rubysherpas/paranoia). We used this extensively at Heroku in our production code for auditability reasons.

In general, this gets implemented on a per-table basis and usually has no expiry short of manual cleanup. (It would be interesting to contemplate how an end-user would clean up a table without losing their entire history in the event of some kind of bug or bloat.)

I think a quality PostgreSQL-core implementation would be a fantastic enhancement, though it would obviously introduce a bunch of interesting decisions around how to handle things like referential integrity.

Personally, I frequently used these columns to query for things like "how many users were created in each of the last twelve months", and the ability to index on those dates was often important.

I'm confident that if this feature made it into PostgreSQL there would be interested people in downstream communities that would take advantage of it.

Hope all that helps,

--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut

Re: AS OF queries

От
Konstantin Knizhnik
Дата:
Attached please find new version of AS OF patch which allows to specify 
time travel period.
Older versions outside this period may be reclaimed by autovacuum.
This behavior is controlled by "time_travel_period" parameter.

Zero value of this parameter disables time travel and postgres behaves 
in standard way.
Actually you can still use AS AF construction but there is no warranty 
that requested versions are not reclaimed and result of query actually 
belongs to the specified time slice.

Value -1 means infinite history: versions are never reclaimed and 
autovacuum is disabled.

And positive value of this parameter specifies maximal time travel 
period in seconds.
As in case of disabled time travel, you can specify AS OF timestamp 
older than this period.
But there is no warranty that requested versions still exist.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Вложения

Re: AS OF queries

От
Peter Eisentraut
Дата:
On 12/28/17 11:36, Konstantin Knizhnik wrote:
> Attached please find new version of AS OF patch which allows to specify 
> time travel period.
> Older versions outside this period may be reclaimed by autovacuum.
> This behavior is controlled by "time_travel_period" parameter.

So where are we on using quasi SQL-standard syntax for a nonstandard
interpretation?  I think we could very well have a variety of standard
and nonstandard AS OF variants, including by commit timestamp, xid,
explicit range columns, etc.  But I'd like to see a discussion on that,
perhaps in a documentation update, which this patch is missing.

I have questions about corner cases.  What happens when multiple tables
are queried with different AS OF clauses?  Can there be apparent RI
violations?  What happens when the time_travel_period is changed during
a session?  How can we check how much old data is available, and how can
we check how much space it uses?  What happens if no old data for the
selected AS OF is available?  How does this interact with catalog
changes, such as changes to row-level security settings?  (Do we apply
the current or the past settings?)

This patch should probably include a bunch of tests to cover these and
other scenarios.

(Maybe "period" isn't the best name, because it implies a start and an
end.  How about something with "age"?)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: AS OF queries

От
Konstantin Knizhnik
Дата:


On 28.12.2017 20:28, Peter Eisentraut wrote:
On 12/28/17 11:36, Konstantin Knizhnik wrote:
Attached please find new version of AS OF patch which allows to specify 
time travel period.
Older versions outside this period may be reclaimed by autovacuum.
This behavior is controlled by "time_travel_period" parameter.
So where are we on using quasi SQL-standard syntax for a nonstandard
interpretation?  I think we could very well have a variety of standard
and nonstandard AS OF variants, including by commit timestamp, xid,
explicit range columns, etc.  But I'd like to see a discussion on that,
perhaps in a documentation update, which this patch is missing.
SQL:2011 defines rules for creation and querying of temporal tables.
I have not read this standard myself, I just take information about it from wikipedia:
https://en.wikipedia.org/wiki/SQL:2011
According to this standard time-sliced queries are specified using

AS OF SYSTEM TIME
 and VERSIONS BETWEEN SYSTEM TIME ... AND ...clauses.

Looks like it is supported now only by Oracle. IBM DB, MS-SQL, are providing similar functionality in slightly different way.
I am not sure whether strict support of SQL:2011 standard is critical and which other functionality we need.

I have questions about corner cases.  What happens when multiple tables
are queried with different AS OF clauses?
It is possible.

  Can there be apparent RI
violations? 
Right now AS OF is used only in selects, not in update statements. So I do not understand how integrity constraints can be violated.

 What happens when the time_travel_period is changed during
a session?  
Right now it depends on autovacuum: how fast it will be able to reclaim old version.
Actually I I do not see much sense in changing time travel period  during session.
In asof-4.patch time_travel_period is postmaster level GUC which can not be changed in session.
But I have changed policy for it for SIGHUP to make experiments with it more easier.


How can we check how much old data is available, and how can
we check how much space it uses? 
Physical space used by the database/relation can be determined using standard functions, for example pg_total_relation_size.
I do not know any simple way to get total number of all stored versions.
 What happens if no old data for the
selected AS OF is available? 
It will just return the version closest to the specified timestamp.

 How does this interact with catalog
changes, such as changes to row-level security settings?  (Do we apply
the current or the past settings?)
Catalog changes are not currently supported.
And I do not have good understanding how to support it if query involves two different timeslice with different versions of the table.
Too much places in parser/optimizer have to be change to support such "historical collisions".


This patch should probably include a bunch of tests to cover these and
other scenarios.
Right now I have added just one test: asof.sql.
It requires "track_commit_timestamp" option to be switched on and it is postmaster level GUC.
So I have added for it postgresql.asof.conf and asof_schedule.
This test should be launched using the following command:

make check EXTRA_REGRESS_OPTS="--schedule=asof_schedule --temp-config=postgresql.asof.config"

If there is some better way to include this test in standard regression tests, please let me know.
(Maybe "period" isn't the best name, because it implies a start and an
end.  How about something with "age"?)
Well I am not an English native speaker. So I can not conclude what is more natural.
"period" is widely used in topics related with temporal tables (just count occurrences of this word at https://en.wikipedia.org/wiki/SQL:2011)
Age is not used here at all.
From my point of view age is something applicable to person, building, monument,...
It is not possible to say about "ge of time travel". In science fiction "time machines" frequently have limitations: you can not got more than N years in the past.
How we can name this N? Is it "period", "age" or something else?

I attached yet another version of the patch which includes test for AS OF query.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

Re: AS OF queries

От
Peter Eisentraut
Дата:
On 12/29/17 06:28, Konstantin Knizhnik wrote:
>>   Can there be apparent RI
>> violations? 
> Right now AS OF is used only in selects, not in update statements. So I
> do not understand how integrity constraints can be violated.

I mean, if you join tables connected by a foreign key, you can expect a
certain shape of result, for example at least one match per PK row.  But
if you select from each table "as of" a different timestamp, then that
won't hold.  That could also throw off any optimizations we might come
up with in that area, such as cross-table statistics.  Not saying it
can't or shouldn't be done, but there might be some questions.

>>  What happens if no old data for the
>> selected AS OF is available? 
> It will just return the version closest to the specified timestamp.

That seems strange.  Shouldn't that be an error?

>>  How does this interact with catalog
>> changes, such as changes to row-level security settings?  (Do we apply
>> the current or the past settings?)
> Catalog changes are not currently supported.
> And I do not have good understanding how to support it if query involves
> two different timeslice with different versions of the table.
> Too much places in parser/optimizer have to be change to support such
> "historical collisions".

Right, it's probably very hard to do.  But I think it somehow should be
recognized that catalog changes took place between the selected
timestamp(s) and now and an error or notice should be produced.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: AS OF queries

От
legrand legrand
Дата:
Maybe that a simple check of the asof_timestamp value like:

asof_timestamp >= now() - time_travel_period
AND 
asof_timestamp >= latest_table_ddl

would permit to raise a warning or an error message saying that query result
can not be garanteed with this asof_timestamp value.


latest_table_ddl being found with

SELECT greatest( max(pg_xact_commit_timestamp( rel.xmin )),
max(pg_xact_commit_timestamp( att.xmin ))) as latest_table_ddl 
FROM      pg_catalog.pg_attribute att     
INNER JOIN pg_catalog.pg_class rel     
ON att.attrelid = rel.oid WHERE    rel.relname = '<asof_tablename>' and
rel.relowner= ...

(tested with add/alter/drop column and drop/create/truncate table)

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: AS OF queries

От
Konstantin Knizhnik
Дата:

On 02.01.2018 21:12, Peter Eisentraut wrote:
> On 12/29/17 06:28, Konstantin Knizhnik wrote:
>>>    Can there be apparent RI
>>> violations?
>> Right now AS OF is used only in selects, not in update statements. So I
>> do not understand how integrity constraints can be violated.
> I mean, if you join tables connected by a foreign key, you can expect a
> certain shape of result, for example at least one match per PK row.  But
> if you select from each table "as of" a different timestamp, then that
> won't hold.  That could also throw off any optimizations we might come
> up with in that area, such as cross-table statistics.  Not saying it
> can't or shouldn't be done, but there might be some questions.

Now I understand your statement. Yes, combining different timelines in 
the same query can violate integrity constraint.
In theory there can be some query plans which will be executed 
incorrectly  because of this constraint violation.
I do not know concrete examples of such plans right now, but I can not 
prove that such problem can  not happen.

>
>>>   What happens if no old data for the
>>> selected AS OF is available?
>> It will just return the version closest to the specified timestamp.
> That seems strange.  Shouldn't that be an error?

I will add an option raising error in this case.
I do not want to always throw error, because Postgres is very 
conservative in reclaiming old space. And the fact that version is not 
used by any snapshot doesn't mean that it will be immediately deleted. 
So there is still chance to peek-up old data although it is out of the 
specified time travel period.


>
>>>   How does this interact with catalog
>>> changes, such as changes to row-level security settings?  (Do we apply
>>> the current or the past settings?)
>> Catalog changes are not currently supported.
>> And I do not have good understanding how to support it if query involves
>> two different timeslice with different versions of the table.
>> Too much places in parser/optimizer have to be change to support such
>> "historical collisions".
> Right, it's probably very hard to do.  But I think it somehow should be
> recognized that catalog changes took place between the selected
> timestamp(s) and now and an error or notice should be produced.
>
There is one challenge: right now AS OF timestamps are not required to 
be constants: them can be calculated dynamically during query execution. 
So at the time of query compilation it is not possible to check whether 
specified timestamps observe catalog changes or not.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: AS OF queries

От
Konstantin Knizhnik
Дата:

On 03.01.2018 23:49, legrand legrand wrote:
> Maybe that a simple check of the asof_timestamp value like:
>
> asof_timestamp >= now() - time_travel_period
> AND
> asof_timestamp >= latest_table_ddl
>
> would permit to raise a warning or an error message saying that query result
> can not be garanteed with this asof_timestamp value.
>
>
> latest_table_ddl being found with
>
> SELECT greatest( max(pg_xact_commit_timestamp( rel.xmin )),
> max(pg_xact_commit_timestamp( att.xmin ))) as latest_table_ddl
> FROM      pg_catalog.pg_attribute att     
> INNER JOIN pg_catalog.pg_class rel    
> ON att.attrelid = rel.oid WHERE    rel.relname = '<asof_tablename>' and
> rel.relowner= ...
>
> (tested with add/alter/drop column and drop/create/truncate table)

Well, it can be done.
But performing this query on each access to the table seems to be bad 
idea: in case of nested loop join it can cause significant degrade of 
performance.
The obvious solution is to calculate this latest_table_ddl timestamp 
once and store it it somewhere (in ScanState?)
But I am not sure that this check is actually needed.
If table is changed in some incompatible way, then we will get error in 
any case.
If table change is not critical for this query (for example some column 
was added or removed which is not used in this query),
then should we really throw error in this case?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: AS OF queries

От
legrand legrand
Дата:
> But performing this query on each access to the table seems to be bad
> idea: in case of nested loop join it can cause significant degrade of
> performance.

this could be a pre-plan / pre-exec check, no more.

> But I am not sure that this check is actually needed.
> If table is changed in some incompatible way, then we will get error in
> any case.

It seems that with path v3, a query with asof_timestamp 
set before a truncate or alter table doesn't throw any error, 
just gives an empty result (even if there was data).

> If table change is not critical for this query (for example some column
> was added or removed which is not used in this query),
> then should we really throw error in this case? 

no error is needed if result is correct.

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: AS OF queries

От
Konstantin Knizhnik
Дата:

On 10.01.2018 16:02, legrand legrand wrote:
>> But performing this query on each access to the table seems to be bad
>> idea: in case of nested loop join it can cause significant degrade of
>> performance.
> this could be a pre-plan / pre-exec check, no more.

AS-OF timestamp can be taken from outer table, so it is necessary to 
repeat this check at each nested loop join iteration.

>
>> But I am not sure that this check is actually needed.
>> If table is changed in some incompatible way, then we will get error in
>> any case.
> It seems that with path v3, a query with asof_timestamp
> set before a truncate or alter table doesn't throw any error,
> just gives an empty result (even if there was data).

Sorry, truncate is not compatible with AS OF. It is performed at file 
level and deletes old old version.
So if you want to use time travel, you should not use truncate.

>
>> If table change is not critical for this query (for example some column
>> was added or removed which is not used in this query),
>> then should we really throw error in this case?
> no error is needed if result is correct.

Does it mean that no explicit check is needed that table metadata was 
not checked after specified timeslice?


Attached please find new version of the AS OF patch which throws error 
if specified AS OF timestamp is older that time travel horizon and 
"check_asof_timestamp" parameter is set to true (by default it is 
switched off).


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Вложения

Re: AS OF queries

От
legrand legrand
Дата:
> Sorry, truncate is not compatible with AS OF. It is performed at file
> level and deletes old old version.
> So if you want to use time travel, you should not use truncate. 

As time travel doesn't support truncate, I would prefer it to be checked.
If no check is performed, ASOF queries (with timestamp before truncate ) 
would return no data even when there was: this could be considered as a
wrong result.

if a truncate is detected, an error should be raised, saying data is no more
available before truncate timestamp.


> Does it mean that no explicit check is needed that table metadata was
> not checked after specified timeslice? 

Not sure, it would depend on metadata modification type ...
adding/dropping a columns seems working, 
what about altering a column type or dropping / recreating a table ?

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


Re: AS OF queries

От
Bruce Momjian
Дата:
On Sat, Dec 23, 2017 at 11:53:19PM +0300, konstantin knizhnik wrote:
> 
> On Dec 23, 2017, at 2:08 AM, Greg Stark wrote:
> 
> > On 20 December 2017 at 12:45, Konstantin Knizhnik
> > <k.knizhnik@postgrespro.ru> wrote:
> > 
> >> It seems to me that it will be not so difficult to implement them in
> >> Postgres - we already have versions of tuples.
> >> Looks like we only need to do three things:
> >> 1. Disable autovacuum (autovacuum = off)
> > 
> > "The Wheel of Time turns, and Ages come and pass, leaving memories
> > that become legend. Legend fades to myth, and even myth is long
> > forgotten when the Age that gave it birth comes again"
> > 
> > I think you'll find it a lot harder to get this to work than just
> > disabling autovacuum. Notably HOT updates can get cleaned up (and even
> > non-HOT updates can now leave tombstone dead line pointers iirc) even
> > if vacuum hasn't run.
> > 
> 
> Yeh, I suspected that just disabling autovacuum was not enough.
> I heard (but do no know too much) about microvacuum and hot updates.
> This is why I was a little bit surprised when me test didn't show lost of updated versions.
> May be it is because of vacuum_defer_cleanup_age.

Well vacuum and single-page pruning do 3 things:

1.  remove expired updated rows
2.  remove deleted row
3.  remove rows from aborted transactions

While time travel doesn't want #1 and #2, it probably wants #3.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: AS OF queries

От
Konstantin Knizhnik
Дата:

On 26.01.2018 03:55, Bruce Momjian wrote:
> On Sat, Dec 23, 2017 at 11:53:19PM +0300, konstantin knizhnik wrote:
>> On Dec 23, 2017, at 2:08 AM, Greg Stark wrote:
>>
>>> On 20 December 2017 at 12:45, Konstantin Knizhnik
>>> <k.knizhnik@postgrespro.ru> wrote:
>>>
>>>> It seems to me that it will be not so difficult to implement them in
>>>> Postgres - we already have versions of tuples.
>>>> Looks like we only need to do three things:
>>>> 1. Disable autovacuum (autovacuum = off)
>>> "The Wheel of Time turns, and Ages come and pass, leaving memories
>>> that become legend. Legend fades to myth, and even myth is long
>>> forgotten when the Age that gave it birth comes again"
>>>
>>> I think you'll find it a lot harder to get this to work than just
>>> disabling autovacuum. Notably HOT updates can get cleaned up (and even
>>> non-HOT updates can now leave tombstone dead line pointers iirc) even
>>> if vacuum hasn't run.
>>>
>> Yeh, I suspected that just disabling autovacuum was not enough.
>> I heard (but do no know too much) about microvacuum and hot updates.
>> This is why I was a little bit surprised when me test didn't show lost of updated versions.
>> May be it is because of vacuum_defer_cleanup_age.
> Well vacuum and single-page pruning do 3 things:
>
> 1.  remove expired updated rows
> 2.  remove deleted row
> 3.  remove rows from aborted transactions
>
> While time travel doesn't want #1 and #2, it probably wants #3.
>
Rows of aborted transactions are in any case excluded by visibility checks.
Definitely skipping them costs some time, so large percent of aborted 
transactions  may affect query speed.
But query speed is reduced in any case if in order to support time 
travel we prohibit or postpone vacuum.

What is the expected relation of committed and aborted transactions? I 
expected that it should be much bigger than one (especially if we take 
in account
only read-write transaction which has really updated database). In this 
case number of versions created by aborted transaction should be much 
smaller than number of versions created by updated/delete of successful 
transactions. So them should not have significant impact on performance.



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: AS OF queries

От
Bruce Momjian
Дата:
On Fri, Jan 26, 2018 at 10:56:06AM +0300, Konstantin Knizhnik wrote:
> >>Yeh, I suspected that just disabling autovacuum was not enough.
> >>I heard (but do no know too much) about microvacuum and hot updates.
> >>This is why I was a little bit surprised when me test didn't show lost of updated versions.
> >>May be it is because of vacuum_defer_cleanup_age.
> >Well vacuum and single-page pruning do 3 things:
> >
> >1.  remove expired updated rows
> >2.  remove deleted row
> >3.  remove rows from aborted transactions
> >
> >While time travel doesn't want #1 and #2, it probably wants #3.
> >
> Rows of aborted transactions are in any case excluded by visibility checks.
> Definitely skipping them costs some time, so large percent of aborted
> transactions  may affect query speed.
> But query speed is reduced in any case if in order to support time travel we
> prohibit or postpone vacuum.
> 
> What is the expected relation of committed and aborted transactions? I
> expected that it should be much bigger than one (especially if we take in
> account
> only read-write transaction which has really updated database). In this case
> number of versions created by aborted transaction should be much smaller
> than number of versions created by updated/delete of successful
> transactions. So them should not have significant impact on performance.

Uh, I think the big question is whether we are ready to agreed that a
time-travel database will _never_ have aborted rows removed.  The
aborted rows are clearly useless for time travel, so the question is
whether we ever want to remove them.  I would think at some point we do.

Also, I am not sure we have any statistics on how many aborted rows are
in each table.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +