Обсуждение: [GENERAL] data transformation and replication

Поиск
Список
Период
Сортировка

[GENERAL] data transformation and replication

От
"Armand Pirvu (home)"
Дата:
Hi

Here it is a scenario which I am faced with  and I am hoping to find a pointer/tip/help

db1 is the OLTP system
db2 is the Reporting system

The data from db1 needs to get to db2, but the database on those two have tables with different layout/structure and
hencedata will need to suffer some transformation in between in real time 

I was looking at something like

db1 -> db2 replicates the same set of tables and with the same structures using pglogical for example
db2.tbl1 -> db2.tbl2 data gets massages/transformed based on what replicates from db1.tbl1 using triggers and functions


Other than that I reckon db1 -> db2 would be trigger based using something like slonik maybe (?) and data
massage/transformationgets moved from db2 to db1 machine and then db1.tbl2 -> db2.tbl2 using pglogical  


Is this doable ? If so any pointers as to where to look about it ?


Many thanks
Armand




Re: [GENERAL] data transformation and replication

От
Adrian Klaver
Дата:
On 05/08/2017 12:46 PM, Armand Pirvu (home) wrote:
> Hi
>
> Here it is a scenario which I am faced with  and I am hoping to find a pointer/tip/help
>
> db1 is the OLTP system
> db2 is the Reporting system
>
> The data from db1 needs to get to db2, but the database on those two have tables with different layout/structure and
hencedata will need to suffer some transformation in between in real time 
>
> I was looking at something like
>
> db1 -> db2 replicates the same set of tables and with the same structures using pglogical for example
> db2.tbl1 -> db2.tbl2 data gets massages/transformed based on what replicates from db1.tbl1 using triggers and
functions
>
>
> Other than that I reckon db1 -> db2 would be trigger based using something like slonik maybe (?) and data
massage/transformationgets moved from db2 to db1 machine and then db1.tbl2 -> db2.tbl2 using pglogical 

I was following you until the last part, "... moved from db2 to db1
machine and then db1.tbl2 -> db2.tbl2 ..."

Is this correct?

If so why db1 --> db2 --> db1 --> db2?

A complete answer is going to depend on at least an outline of what you
mean by massage/transform?

>
>
> Is this doable ? If so any pointers as to where to look about it ?
>
>
> Many thanks
> Armand
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] data transformation and replication

От
"Armand Pirvu (home)"
Дата:


My bad


db1 I have two tables t1 and t2 (or more)
db2 has one table t3 for example which can get data aggregated from one or more multiple tables from the above set . I can updates/inserts/deletes in db1.t1 and/or db1.t2 which combined may mean related data in db.t3 would need to be inserted/deleted/updated. Think of it like ETL processing if you will. This is what I mean by data massaging/transformation


db1 and db2 are two different servers.


So I was initially thinking that I can have on db2 the same set of tables from db1, replication being done using pglogical. Once data gets to db2 t1 and t2, I can have on db2 a set of functions/triggers which can transform the data and as such do the relevant inserts/updates/delete from db2.t3

Apparently though that is not possible unless I am missing something

I reached that conclusion by using a trigger and a function like the auditing one to track insers/updates/deletes in an audit table

Having these said I was thinking

(a) - 
On db1 I will have the t3 table as is on dsb2. All data transformation goes into db1.t3 which on it's turn will replicate to db2.t3 using pglogical

(b) -
On db2 I will have the t1 t2 as they are on db1. Those are replicated using Slony/Bucardo. Once data lands on db2.t1 and db2.t2 another set of triggers/functions responsible for data transformation will do the inserts/deletes/updates in db2.t3

I wold much prefer pglogical approach as stated in the what I see as a failed case 


If the only options is Slony/Bucardo , so be it. but that begs the following questions
- which one has the smallest overhead ?
- which one is the easiest to manage ?
- which one is the most reliable ?
- I recall data transformation can be used in Bucardo but did not see any examples on that. Any pointers ?

Thanks
Armand



On May 8, 2017, at 4:49 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 05/08/2017 12:46 PM, Armand Pirvu (home) wrote:
Hi

Here it is a scenario which I am faced with  and I am hoping to find a pointer/tip/help

db1 is the OLTP system
db2 is the Reporting system

The data from db1 needs to get to db2, but the database on those two have tables with different layout/structure and hence data will need to suffer some transformation in between in real time

I was looking at something like

db1 -> db2 replicates the same set of tables and with the same structures using pglogical for example
db2.tbl1 -> db2.tbl2 data gets massages/transformed based on what replicates from db1.tbl1 using triggers and functions


Other than that I reckon db1 -> db2 would be trigger based using something like slonik maybe (?) and data massage/transformation gets moved from db2 to db1 machine and then db1.tbl2 -> db2.tbl2 using pglogical

I was following you until the last part, "... moved from db2 to db1 machine and then db1.tbl2 -> db2.tbl2 ..."

Is this correct?

If so why db1 --> db2 --> db1 --> db2?

A complete answer is going to depend on at least an outline of what you mean by massage/transform?



Is this doable ? If so any pointers as to where to look about it ?


Many thanks
Armand






-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: [GENERAL] data transformation and replication

От
Adrian Klaver
Дата:
On 05/08/2017 08:31 PM, Armand Pirvu (home) wrote:
>
>
> My bad
>
>
> db1 I have two tables t1 and t2 (or more)
> db2 has one table t3 for example which can get data aggregated from one
> or more multiple tables from the above set . I can
> updates/inserts/deletes in db1.t1 and/or db1.t2 which combined may mean
> related data in db.t3 would need to be inserted/deleted/updated. Think
> of it like ETL processing if you will. This is what I mean by data
> massaging/transformation
>
>
> db1 and db2 are two different servers.

What are the Postgres versions?

>
>
> So I was initially thinking that I can have on db2 the same set of
> tables from db1, replication being done using pglogical. Once data gets
> to db2 t1 and t2, I can have on db2 a set of functions/triggers which
> can transform the data and as such do the relevant
> inserts/updates/delete from db2.t3
>
> Apparently though that is not possible unless I am missing something

Probably this:

https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/

"4.11 Triggers

Apply process and the initial COPY process both run with
session_replication_role set to replica which means that ENABLE REPLICA
and ENABLE ALWAYS triggers will be fired."

https://www.postgresql.org/docs/9.6/static/sql-altertable.html

"DISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER

   ...  The trigger firing mechanism is also affected by the
configuration variable session_replication_role. Simply enabled triggers
will fire when the replication role is "origin" (the default) or
"local". Triggers configured as ENABLE REPLICA will only fire if the
session is in "replica" mode, and triggers configured as ENABLE ALWAYS
will fire regardless of the current replication mode.
...
"

So did you ENABLE REPLICA or ALWAYS on the db2 table triggers?


>
> I reached that conclusion by using a trigger and a function like the
> auditing one to track insers/updates/deletes in an audit table
>
> Having these said I was thinking
>
> (a) -
> On db1 I will have the t3 table as is on dsb2. All data transformation
> goes into db1.t3 which on it's turn will replicate to db2.t3 using pglogical
>
> (b) -
> On db2 I will have the t1 t2 as they are on db1. Those are replicated
> using Slony/Bucardo. Once data lands on db2.t1 and db2.t2 another set of
> triggers/functions responsible for data transformation will do the
> inserts/deletes/updates in db2.t3
>
> I wold much prefer pglogical approach as stated in the what I see as a
> failed case
>
>
> If the only options is Slony/Bucardo , so be it. but that begs the
> following questions
> - which one has the smallest overhead ?
> - which one is the easiest to manage ?
> - which one is the most reliable ?
> - I recall data transformation can be used in Bucardo but did not see
> any examples on that. Any pointers ?
>
> Thanks
> Armand
>
>
>
> On May 8, 2017, at 4:49 PM, Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>> On 05/08/2017 12:46 PM, Armand Pirvu (home) wrote:
>>> Hi
>>>
>>> Here it is a scenario which I am faced with  and I am hoping to find
>>> a pointer/tip/help
>>>
>>> db1 is the OLTP system
>>> db2 is the Reporting system
>>>
>>> The data from db1 needs to get to db2, but the database on those two
>>> have tables with different layout/structure and hence data will need
>>> to suffer some transformation in between in real time
>>>
>>> I was looking at something like
>>>
>>> db1 -> db2 replicates the same set of tables and with the same
>>> structures using pglogical for example
>>> db2.tbl1 -> db2.tbl2 data gets massages/transformed based on what
>>> replicates from db1.tbl1 using triggers and functions
>>>
>>>
>>> Other than that I reckon db1 -> db2 would be trigger based using
>>> something like slonik maybe (?) and data massage/transformation gets
>>> moved from db2 to db1 machine and then db1.tbl2 -> db2.tbl2 using
>>> pglogical
>>
>> I was following you until the last part, "... moved from db2 to db1
>> machine and then db1.tbl2 -> db2.tbl2 ..."
>>
>> Is this correct?
>>
>> If so why db1 --> db2 --> db1 --> db2?
>>
>> A complete answer is going to depend on at least an outline of what
>> you mean by massage/transform?
>>
>>>
>>>
>>> Is this doable ? If so any pointers as to where to look about it ?
>>>
>>>
>>> Many thanks
>>> Armand
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] data transformation and replication

От
"Armand Pirvu (home)"
Дата:
9.5 both

But the enable always trigger I missed that


Once that set it runs


Thank you for your help


Armand

On May 9, 2017, at 8:26 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 05/08/2017 08:31 PM, Armand Pirvu (home) wrote:
My bad
db1 I have two tables t1 and t2 (or more)
db2 has one table t3 for example which can get data aggregated from one or more multiple tables from the above set . I can updates/inserts/deletes in db1.t1 and/or db1.t2 which combined may mean related data in db.t3 would need to be inserted/deleted/updated. Think of it like ETL processing if you will. This is what I mean by data massaging/transformation
db1 and db2 are two different servers.

What are the Postgres versions?

So I was initially thinking that I can have on db2 the same set of tables from db1, replication being done using pglogical. Once data gets to db2 t1 and t2, I can have on db2 a set of functions/triggers which can transform the data and as such do the relevant inserts/updates/delete from db2.t3
Apparently though that is not possible unless I am missing something

Probably this:

https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/

"4.11 Triggers

Apply process and the initial COPY process both run with session_replication_role set to replica which means that ENABLE REPLICA and ENABLE ALWAYS triggers will be fired."

https://www.postgresql.org/docs/9.6/static/sql-altertable.html

"DISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER

 ...  The trigger firing mechanism is also affected by the configuration variable session_replication_role. Simply enabled triggers will fire when the replication role is "origin" (the default) or "local". Triggers configured as ENABLE REPLICA will only fire if the session is in "replica" mode, and triggers configured as ENABLE ALWAYS will fire regardless of the current replication mode.
...
"

So did you ENABLE REPLICA or ALWAYS on the db2 table triggers?


I reached that conclusion by using a trigger and a function like the auditing one to track insers/updates/deletes in an audit table
Having these said I was thinking
(a) -
On db1 I will have the t3 table as is on dsb2. All data transformation goes into db1.t3 which on it's turn will replicate to db2.t3 using pglogical
(b) -
On db2 I will have the t1 t2 as they are on db1. Those are replicated using Slony/Bucardo. Once data lands on db2.t1 and db2.t2 another set of triggers/functions responsible for data transformation will do the inserts/deletes/updates in db2.t3
I wold much prefer pglogical approach as stated in the what I see as a failed case
If the only options is Slony/Bucardo , so be it. but that begs the following questions
- which one has the smallest overhead ?
- which one is the easiest to manage ?
- which one is the most reliable ?
- I recall data transformation can be used in Bucardo but did not see any examples on that. Any pointers ?
Thanks
Armand
On May 8, 2017, at 4:49 PM, Adrian Klaver <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
On 05/08/2017 12:46 PM, Armand Pirvu (home) wrote:
Hi

Here it is a scenario which I am faced with  and I am hoping to find a pointer/tip/help

db1 is the OLTP system
db2 is the Reporting system

The data from db1 needs to get to db2, but the database on those two have tables with different layout/structure and hence data will need to suffer some transformation in between in real time

I was looking at something like

db1 -> db2 replicates the same set of tables and with the same structures using pglogical for example
db2.tbl1 -> db2.tbl2 data gets massages/transformed based on what replicates from db1.tbl1 using triggers and functions


Other than that I reckon db1 -> db2 would be trigger based using something like slonik maybe (?) and data massage/transformation gets moved from db2 to db1 machine and then db1.tbl2 -> db2.tbl2 using pglogical

I was following you until the last part, "... moved from db2 to db1 machine and then db1.tbl2 -> db2.tbl2 ..."

Is this correct?

If so why db1 --> db2 --> db1 --> db2?

A complete answer is going to depend on at least an outline of what you mean by massage/transform?



Is this doable ? If so any pointers as to where to look about it ?


Many thanks
Armand






--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>


-- 
Adrian Klaver
adrian.klaver@aklaver.com