Re: AS OF queries

From: Konstantin Knizhnik
Subject: Re: AS OF queries
Msg-id: e89279f7-6eee-d8d4-bb5e-d45a4e76519e@postgrespro.ru
In reply to: Re: AS OF queries  (Craig Ringer <craig@2ndquadrant.com>)
Responses: Re: AS OF queries  (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-hackers


On 25.12.2017 06:26, Craig Ringer wrote:
On 24 December 2017 at 04:53, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:


But what if I just forbid changing recent_global_xmin?
What if it is pinned at FirstNormalTransactionId and never changes?
Will that protect all versions from being deleted?

That's totally impractical; you'd have unbounded bloat and a nonfunctional system in no time.

You'd need a mechanism - akin to what we have with replication slots - to set a threshold for age.

Well, there are systems with "never delete", append-only semantics.
For example, I participated in the SciDB project: a database for scientific applications.
One of the key requirements for scientific research is reproducibility.
From the database point of view this means that we need to store all raw data and never delete it.
If you performed some measurements and drew conclusions from the results, then everybody should be able to repeat them, even if you later
find errors in the input data and make corrections or just add more data.
So one of the SciDB requirements was to store all versions. A delete operation should just mark data as deleted (although later we had to add true delete :)
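
For comparison, at the SQL level this "mark, don't remove" semantics can be sketched like this (table and column names are invented for illustration):

    -- a minimal soft-delete sketch; "true delete" stays a separate, rare operation
    ALTER TABLE raw_measurements ADD COLUMN deleted_at timestamptz;

    UPDATE raw_measurements SET deleted_at = now() WHERE id = 42;   -- "delete"

    SELECT * FROM raw_measurements WHERE deleted_at IS NULL;        -- normal reads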

But I agree with you: in most cases a more flexible policy for managing versions is needed.
I am not sure that it should be similar to logical replication slots.
There the semantics are quite clear: we preserve WAL segments until they are replicated to the subscribers.
With time travel the situation is less clear: we may specify some age threshold - keep data, for example, for one year.
But what if somebody later wants to access older data? At that moment it is already lost...
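
Just to illustrate what such a threshold could look like at the user level (nothing below exists; both the reloption and the AS OF clause are hypothetical sketches):

    -- hypothetical per-table retention setting:
    ALTER TABLE trades SET (time_travel_period = '1 year');

    -- hypothetical time travel query, the kind this thread is about:
    SELECT * FROM trades AS OF TIMESTAMP '2017-06-01 00:00:00';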

It seems to me that the version pinning policy mostly depends on the source of the data.
If the data has append-only semantics (such as raw scientific data, trading data, measurements from IoT sensors...),
then it is desirable to keep all versions forever.
If we speak about OLTP tables (like accounts in pgbench), then maybe time travel is not the proper mechanism for such data at all.

I think that, in addition to logged/unlogged tables, it would be useful to support historical/non-historical tables. A historical table would support time travel, while a
non-historical (default) table acts like a normal table. It is already possible in Postgres to disable autovacuum for particular tables.
But unfortunately the snapshot trick (no matter how we set up the oldest xmin horizon) affects all tables.
There is a similar (but not the same) problem with logical replication: assume that we need to replicate only one small table, but we have to keep in WAL all updates of another huge table which is not involved in logical replication at all.
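
The per-table autovacuum part already works today; the historical flag below is only an invented illustration of the proposal:

    -- existing syntax: stop autovacuum from touching one table
    ALTER TABLE trades SET (autovacuum_enabled = false);

    -- invented reloption, only to sketch what a historical table could look like:
    -- CREATE TABLE trades (...) WITH (historical = true);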



> Then there's another issue that logical replication has had to deal
> with -- catalog changes. You can't start looking at tuples that have a
> different structure than the current catalog unless you can figure out
> how to use the logical replication infrastructure to use the old
> catalogs. That's a huge problem to bite off and probably can just be
> left for another day if you can find a way to reliably detect the
> problem and raise an error if the schema is inconsistent.


Yes, catalog changes are another problem for time travel.
I do not know of any suitable way to handle several different catalog snapshots in one query.

I doubt it's practical unless you can extract it to subplans that can be materialized separately. Even then, UDTs, rowtype results, etc...
 

Well, I am really not sure about users' demands for time travel. This is one of the reasons for initiating this discussion on -hackers... Maybe it is not the best place for such a discussion, because the audience here is mostly Postgres developers, not users...
At least, from the experience of a few SciDB customers, I can tell that we didn't have problems with schema evolution: mostly the schema is simple, static and well defined.
There were problems with incorrect import of data (which is why we had to add real delete) and with splitting data into chunks (partitioning),...

The question is how we should handle such catalog changes if they happen. Ideally we should not allow moving back beyond that point.
Unfortunately that is not so easy to implement.
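
One way to at least detect the point beyond which we must not travel would be to record DDL times with an event trigger and reject AS OF timestamps older than the last DDL on the table. This is just a sketch using the existing event trigger machinery, not part of any patch:

    -- log every DDL command together with its time:
    CREATE TABLE ddl_log (objid oid, tag text, ddl_time timestamptz DEFAULT now());

    CREATE FUNCTION log_ddl() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    DECLARE
        r record;
    BEGIN
        FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
            INSERT INTO ddl_log(objid, tag) VALUES (r.objid, r.command_tag);
        END LOOP;
    END $$;

    CREATE EVENT TRIGGER track_ddl ON ddl_command_end
        EXECUTE PROCEDURE log_ddl();

    -- an AS OF query could then raise an error if the requested timestamp
    -- is older than max(ddl_time) for the table's oid.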

I think you can learn a lot from studying logical decoding here.


Working on multimaster and shardman, I had to learn a lot about logical replication.
It is a really powerful and flexible mechanism ... with a lot of limitations and problems: lack of catalog replication, inefficient bulk insert, various race conditions,...
But I think that time travel and logical replication serve quite different goals, so they require different approaches.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
