Re: journaling / time travel

Поиск
Список
Период
Сортировка
От phb07
Тема Re: journaling / time travel
Дата
Msg-id db3869ef-2696-fc79-cec4-85478dbc814e@apra.asso.fr
обсуждение исходный текст
Ответ на journaling / time travel  (Willy-Bas Loos <willybas@gmail.com>)
Список pgsql-general

On 09/19/2016 08:48 AM, Willy-Bas Loos wrote:
> Since records can
> be changed afterwards, it has been argued that we should have
> "journaling", meaning that every change to the data is saved in a
> separate schema that holds a "journaling" copy of each table

I don't think this is especially unusual. I've worked on similar 
projects that maintain history for regulatory reasons. I can't speak to 
your legal situation, but I don't think keeping history is a strange choice.

Re the security argument: it seems to me it depends on whether you can 
restrict access to the journal while providing access to the normal 
tables. I guess journaling would help for some threats but not others, 
right? If regular users can't touch the journal, then I agree keeping 
one could make sense.

Are you saying the journaling portion is 624GB on top of 1.1TB? Or that 
of the 1.1TB, 624GB of it is from journaling? Either way it doesn't seem 
like a massive cost to me.

I don't think PITR is an appropriate solution to keeping a 10-year 
history of changes.

It sounds like you're not looking for a green-field solution, but just 
trying to get perspective on what others are doing. Some resources for 
this that might help you:

https://www.youtube.com/watch?v=TRgni5q0YM8
https://github.com/arkhipov/temporal_tables (aka 
http://pgxn.org/dist/temporal_tables/)
http://www.cs.arizona.edu/~rts/tdbbook.pdf (also available in print)
https://www.amazon.com/Bitemporal-Data-Practice-Tom-Johnston/dp/0124080677

The two books contain multiple approaches to storing history, each with 
tradeoffs for ease-of-use, disk space, etc. Reading them might be 
overkill for you. If you want to choose one, I'd read Snodgrass. The 
first few chapters are tedious, but it gets better.

Temporal databases are in a funny in-between zone where there is lots of 
research, but standard tools are fairly underdeveloped. Postgres 
recently added range types and exclusion constraints, which are 
important primitives for building a temporal system, but it is still a 
long way from SQL:2011, and SQL:2011 is itself a long way from 
everything you might want. One thing that seems lacking to me, even in 
the research, is how to handle DDL changes. You should be glad that you 
only care about audit history and not subject history too, because going 
bi-temporal is where you really cross over into lack of available tools, 
outside of a few commercial offerings. (Teradata has temporal support, 
using a Snodgrass-like approach that pre-dates the standard.)

Paul

You may also have a look on the E-Maj extension. It is available for instance here on pgxn.
Its purpose is to record all updates on tables either to examine them or to "rollback" them later. But it should be also usable for auditing.
The documentation in the package includes a detailed reference guide, but also a presentation that allows to quickly have a view on the extention.
Even if you think E-Maj doesn't fit your need, feel free to give me a feedback. It may give me thoughts for enhancements ;-)

Regards. Philippe.
 

В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: Transactions and functions
Следующее
От: Vitaly Burovoy
Дата:
Сообщение: Re: jsonb_set for nested new item?