Discussion: historical log of data records


historical log of data records

From
Sanjay Minni
Date:
I need to keep a copy of old data as the rows are changed.

For a general RDBMS I could think of keeping all the data in the same table with a flag to indicate older copies of updated / deleted rows, or keeping a parallel table and copying these rows into it under program / trigger control. Each has its pros and cons.

In Postgres, would I have to follow the same methods, or are there any features / packages available?

regards
Sanjay Minni

Re: historical log of data records

From
Laurenz Albe
Date:
On Tue, 2021-11-16 at 13:56 +0530, Sanjay Minni wrote:
> I need to keep a copy of old data as the rows are changed. 
> 
> For a general RDBMS I could think of keeping all the data in the same table with a flag
> to indicate older copies of updated /  deleted rows or keep a parallel table and copy
> these rows into the parallel data under program / trigger control. Each has its pros and cons. 
> 
> In Postgres would i have to follow the same methods or are there any features / packages available ?

Yes, I would use one of these methods.

The only feature I can think of that may help is partitioning: if you have one partition
for the current data and one for the deleted data, then updating the flag would
automatically move the row between partitions, so you don't need a trigger.
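For instance, a minimal sketch of list-partitioning on a boolean flag (table and column names are illustrative):

```sql
-- List-partition on a boolean "deleted" flag.
CREATE TABLE accounts (
    id      bigint  NOT NULL,
    balance numeric NOT NULL,
    deleted boolean NOT NULL DEFAULT false
) PARTITION BY LIST (deleted);

CREATE TABLE accounts_current  PARTITION OF accounts FOR VALUES IN (false);
CREATE TABLE accounts_historic PARTITION OF accounts FOR VALUES IN (true);

-- Updating the flag moves the row from accounts_current to
-- accounts_historic automatically (PostgreSQL 11 and later):
UPDATE accounts SET deleted = true WHERE id = 42;
```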

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: historical log of data records

From
Sanjay Minni
Date:
Are you referring to Table Partitioning?
https://www.postgresql.org/docs/14/ddl-partitioning.html

with warm regards
Sanjay Minni
+91-9900-902902



On Tue, 16 Nov 2021 at 14:50, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2021-11-16 at 13:56 +0530, Sanjay Minni wrote:
> I need to keep a copy of old data as the rows are changed. 
>
> For a general RDBMS I could think of keeping all the data in the same table with a flag
> to indicate older copies of updated /  deleted rows or keep a parallel table and copy
> these rows into the parallel data under program / trigger control. Each has its pros and cons. 
>
> In Postgres would i have to follow the same methods or are there any features / packages available ?

Yes, I would use one of these methods.

The only feature I can think of that may help is partitioning: if you have one partition
for the current data and one for the deleted data, then updating the flag would
automatically move the row between partitions, so you don't need a trigger.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: historical log of data records

From
Alban Hertroys
Date:
> On 16 Nov 2021, at 10:20, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2021-11-16 at 13:56 +0530, Sanjay Minni wrote:
>> I need to keep a copy of old data as the rows are changed.
>>
>> For a general RDBMS I could think of keeping all the data in the same table with a flag
>> to indicate older copies of updated /  deleted rows or keep a parallel table and copy
>> these rows into the parallel data under program / trigger control. Each has its pros and cons.
>>
>> In Postgres would i have to follow the same methods or are there any features / packages available ?
>
> Yes, I would use one of these methods.
>
> The only feature I can think of that may help is partitioning: if you have one partition
> for the current data and one for the deleted data, then updating the flag would
> automatically move the row between partitions, so you don't need a trigger.

Are you building (something like) a data-vault? If so, keep in mind that you will have a row for every update, not just
a single deleted row. Enriching the data can be really useful in such cases.

For a data-vault at a previous employer, we determined how to treat new rows by comparing a (md5) hash of the new and
old rows, adding the hash and a validity interval to the stored rows. Historic data went to a separate table for each
respective current table.

The current tables “inherited” the PK’s from the tables on the source systems (this was a data-warehouse DB). Obviously
that same PK can not be applied to the historic tables where there _will_ be duplicates, although they should be at
non-overlapping validity intervals.

Alternatively, since this is time-series data, it would probably be a good idea to store that in a way optimised for
that. TimescaleDB comes to mind, or arrays as per Pavel’s suggestion at
https://stackoverflow.com/questions/68440130/time-series-data-on-postgresql.
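A sketch of that hash comparison (all table and column names here are illustrative, not from any real schema):

```sql
-- Compare an md5 over the incoming row's payload columns with the hash
-- stored on the currently valid historic version of the same key.
SELECT md5(ROW(n.col_a, n.col_b, n.col_c)::text) AS new_hash,
       h.row_hash                                AS old_hash
FROM   incoming_rows n
LEFT JOIN history h
       ON  h.id = n.id
       AND h.valid_to IS NULL;   -- only the currently valid version

-- If new_hash differs from old_hash (or old_hash is NULL), close the old
-- validity interval (set valid_to = now()) and insert the new row with
-- valid_from = now() and row_hash = new_hash.
```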

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: historical log of data records

From
Sanjay Minni
Date:
Alban,

It's a simple financial transaction processing application. The application permits editing / updating / deleting of entered data, even multiple times, but an audit trail tracing the data through all versions back to its original must be preserved.
(As outlined, programmatically I could approach it by keeping a parallel set of tables and copying each replaced row into the parallel table set, or by keeping all record versions in a single table with a flag to indicate the final / current version.)
I am asking whether there are better ways to do it.


with warm regards
Sanjay Minni
+91-9900-902902



On Tue, 16 Nov 2021 at 15:57, Alban Hertroys <haramrae@gmail.com> wrote:

> On 16 Nov 2021, at 10:20, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2021-11-16 at 13:56 +0530, Sanjay Minni wrote:
>> I need to keep a copy of old data as the rows are changed.
>>
>> For a general RDBMS I could think of keeping all the data in the same table with a flag
>> to indicate older copies of updated /  deleted rows or keep a parallel table and copy
>> these rows into the parallel data under program / trigger control. Each has its pros and cons.
>>
>> In Postgres would i have to follow the same methods or are there any features / packages available ?
>
> Yes, I would use one of these methods.
>
> The only feature I can think of that may help is partitioning: if you have one partition
> for the current data and one for the deleted data, then updating the flag would
> automatically move the row between partitions, so you don't need a trigger.

Are you building (something like) a data-vault? If so, keep in mind that you will have a row for every update, not just a single deleted row. Enriching the data can be really useful in such cases.

For a data-vault at a previous employer, we determined how to treat new rows by comparing a (md5) hash of the new and old rows, adding the hash and a validity interval to the stored rows. Historic data went to a separate table for each respective current table.

The current tables “inherited” the PK’s from the tables on the source systems (this was a data-warehouse DB). Obviously that same PK can not be applied to the historic tables where there _will_ be duplicates, although they should be at non-overlapping validity intervals.

Alternatively, since this is time-series data, it would probably be a good idea to store that in a way optimised for that. TimescaleDB comes to mind, or arrays as per Pavel’s suggestion at https://stackoverflow.com/questions/68440130/time-series-data-on-postgresql.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: historical log of data records

From
"Peter J. Holzer"
Date:
On 2021-11-16 13:56:51 +0530, Sanjay Minni wrote:
> I need to keep a copy of old data as the rows are changed. 
>
> For a general RDBMS I could think of keeping all the data in the same table
> with a flag to indicate older copies of updated /  deleted rows or keep a
> parallel table and copy these rows into the parallel data under program /
> trigger control. Each has its pros and cons. 
>
> In Postgres would i have to follow the same methods or are there any features /
> packages available ?

There is an extension (I think it's this one:
https://pgxn.org/dist/temporal_tables/) which we've used for one
project so far. I'm not sure whether I would use it for further
projects: it does what it's supposed to do, and it's nice not having to
reinvent the wheel. But it did cause some problems during a PostgreSQL
upgrade (nothing insurmountable, just a bit of a hassle) and IIRC also
with Django migration scripts. So, given that it's easy to roll your own,
I probably would (but I do suffer a bit from NIH syndrome in general, so
that's not surprising).
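If I remember the extension's README correctly, usage looks roughly like this (table names are examples, and you should check the current docs rather than trust my memory):

```sql
-- Add a system-period column to the live table.
ALTER TABLE subscriptions
    ADD COLUMN sys_period tstzrange NOT NULL
        DEFAULT tstzrange(current_timestamp, NULL);

-- History table with the same structure.
CREATE TABLE subscriptions_history (LIKE subscriptions);

-- The extension's versioning() trigger maintains sys_period on the live
-- table and archives superseded rows into subscriptions_history.
CREATE TRIGGER versioning_trigger
    BEFORE INSERT OR UPDATE OR DELETE ON subscriptions
    FOR EACH ROW
    EXECUTE PROCEDURE versioning('sys_period', 'subscriptions_history', true);
```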

I would strongly advise using triggers. In my experience, implementing
this in the application logic is a recipe for inconsistencies: someone
will always forget to duplicate a row, or set the wrong timestamp, or
whatever (BTDT).

I don't have a strong preference between just using valid_from/valid_to
fields in the same table(s) and using separate history tables. Both
work, one or the other may be slightly more convenient depending on the
use case. Since you mentioned that the purpose is auditing, I'd probably
lean towards separate tables in your case.
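A minimal roll-your-own sketch of the separate-history-table variant (all names illustrative; PostgreSQL 11+ for EXECUTE FUNCTION):

```sql
-- History table: same columns as the live table, plus audit metadata.
CREATE TABLE invoices_history (LIKE invoices);
ALTER TABLE invoices_history
    ADD COLUMN changed_at timestamptz NOT NULL DEFAULT now(),
    ADD COLUMN changed_op text        NOT NULL;

-- Archive the pre-change row on every UPDATE or DELETE.
CREATE FUNCTION invoices_audit() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO invoices_history
        SELECT (OLD).*, now(), TG_OP;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$;

CREATE TRIGGER invoices_audit_trg
    AFTER UPDATE OR DELETE ON invoices
    FOR EACH ROW EXECUTE FUNCTION invoices_audit();
```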

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachments

Re: historical log of data records

From
Laurenz Albe
Date:
On Tue, 2021-11-16 at 15:24 +0530, Sanjay Minni wrote:
> 
> On Tue, 16 Nov 2021 at 14:50, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Tue, 2021-11-16 at 13:56 +0530, Sanjay Minni wrote:
> > > I need to keep a copy of old data as the rows are changed. 
> > > 
> > > For a general RDBMS I could think of keeping all the data in the same table with a flag
> > > to indicate older copies of updated /  deleted rows or keep a parallel table and copy
> > > these rows into the parallel data under program / trigger control. Each has its pros and cons. 
> > > 
> > > In Postgres would i have to follow the same methods or are there any features / packages available ?
> > 
> > Yes, I would use one of these methods.
> > 
> > The only feature I can think of that may help is partitioning: if you have one partition
> > for the current data and one for the deleted data, then updating the flag would
> > automatically move the row between partitions, so you don't need a trigger.
>
> Are you referring to Table Partitioning ?
> https://www.postgresql.org/docs/14/ddl-partitioning.html

Yes, exactly.

Yours,
Laurenz Albe