Обсуждение: Re: Hick ups in Postgresql Logical Replication

Поиск
Список
Период
Сортировка

Re: Hick ups in Postgresql Logical Replication

От
pavan95
Дата:
Hi,

I have configured Logical Replication. One Publisher and One Subscriber.
When I query the current LSN on publisher and subscriber, I see that there
is a lag between them.

For example if a published table consists of 3278 records, then on the
subscriber I see only 3275 records. So how to sync that lagged 3 records?

Is there a way to know the number of transaction to be replayed or from
which time the transactions are to be replayed from publisher to subscriber,
with the help of LSN 



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html


Re: Hick ups in Postgresql Logical Replication

От
Mark Kirkwood
Дата:
Hi,

A lag would suggest that the subscriber is still catching up. However, 
there might be something else going on. We'd probably need to see how 
you actually did your set up, and also how you are deducing that there 
is a lag (e.g pg_stat_subscription)?

Don't forget to examine your logs for errors too.

regards

Mark


On 14/10/18 05:22, pavan95 wrote:
> Hi,
>
> I have configured Logical Replication. One Publisher and One Subscriber.
> When I query the current LSN on publisher and subscriber, I see that there
> is a lag between them.
>
> For example if a published table consists of 3278 records, then on the
> subscriber I see only 3275 records. So how to sync that lagged 3 records?
>
> Is there a way to know the number of transaction to be replayed or from
> which time the transactions are to be replayed from publisher to subscriber,
> with the help of LSN
>
>
>
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html
>



Re: Hick ups in Postgresql Logical Replication

От
pavan95
Дата:
Hi Mark,

Thanks for your response.

> We'd probably need to see how you actually did your set up, and also how
> you are deducing that there is a lag (e.g pg_stat_subscription)? 
  
*Setup*:

*Publisher *:

Out of n number of tables n-1 tables are opted for replication of DML
changes.

*Subscriber *:

Subscription is created on the subscriber for the corresponding publication.

*At times, I find(developer complains) about difference in records between
some of the published tables.*

For example, after successful configuration of replication, say it was
configured successfully on Oct 1'st 2018 and verified that it is
replicating. Then some n days in future, a developer complains a table named
abc is not getting replicated(meaning the data inserted/updated/deleted for
that table abc on publisher is not getting synced with subscriber).

*Findings*:

Here I found that for 1 of the published table the row count is not same on
both publisher and subscriber:

Publisher :  select count(*) from table_A;       No.of Rows= 3598

Subscriber: select count(*) from table_A:       No.of Rows= 3590

*Then my analysis will go like* :

1) Check the error log for any kind of errors-----------*No errors found*

2) Validate that the table structure(no.of.columns) on both primary and
subscriber.

As it is production, I need to to restore the data with minimum time. So I
will go for methods:

1). Truncate the table with cascade option--------which truncates the
records and resyncs the table and sometimes it will not work out.

2). If first approach will not work out, then I will go with dropping the
subscription and again configure the replication with copy_data=false
option.-----It works out.


As this can't be the baselined approach to be followed whenever this
complaint arises, a doubt raised in my head. 

1) Can we restore that only 3 records(or more from other tables) on the
online subscriber???

In this case LSN's on both the primary and secondary are different. So how
to identify the difference LSN's for that 3 records in that table? 

Assuming for 3 records there are 3 different LSN's, is there a way to
restore that only 3 LSN's data??

Can we list what all the LSN's are present in a single archive log
segment(000000010000000200000003). And how can I restore that archive
segment alone on the subscriber.

To be in specific, 

1) Can I get List of LSN's in an archive WAL segment file?
2) How to find that 3 records are in which archive file based on the LSN?
3) How can I restore that 3 records alone, in this case ?

Please ask if there is something which I needed to explain on my part.

Looking forward to hear from you. Thanks in Advance.





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html


Re: Hick ups in Postgresql Logical Replication

От
Mark Kirkwood
Дата:
Hi


On 15/10/18 23:11, pavan95 wrote:
> Hi Mark,
>
> Thanks for your response.
>
>> We'd probably need to see how you actually did your set up, and also how
>> you are deducing that there is a lag (e.g pg_stat_subscription)?
>    
> *Setup*:
>
> *Publisher *:
>
> Out of n number of tables n-1 tables are opted for replication of DML
> changes.
>
> *Subscriber *:
>
> Subscription is created on the subscriber for the corresponding publication.
>
> *At times, I find(developer complains) about difference in records between
> some of the published tables.*
>
> For example, after successful configuration of replication, say it was
> configured successfully on Oct 1'st 2018 and verified that it is
> replicating. Then some n days in future, a developer complains a table named
> abc is not getting replicated(meaning the data inserted/updated/deleted for
> that table abc on publisher is not getting synced with subscriber).
>
> *Findings*:
>
> Here I found that for 1 of the published table the row count is not same on
> both publisher and subscriber:
>
> Publisher :  select count(*) from table_A;       No.of Rows= 3598
>
> Subscriber: select count(*) from table_A:       No.of Rows= 3590
>

I suggest trying to make a self contained test case illustrating this 
problem (maybe just a single table replicated). By that I mean an exact 
sequence of commands (so we can try it too and see if you have found a bug)!

With respect to the question about wal files and LSN, see the function 
pg_walfile_name(), e.g:

postgres=# SELECT pg_current_wal_lsn();
  pg_current_wal_lsn
--------------------
  0/51EB3BA8
(1 row)

postgres=# SELECT pg_walfile_name(pg_current_wal_lsn());
      pg_walfile_name
--------------------------
  000000010000000000000051
(1 row)


The other 2 questions will be more easily tackled when we know the exact 
set of commands you are using to do set things up.


regards
Mark


Re: Hick ups in Postgresql Logical Replication

От
pavan95
Дата:
>With respect to the question about wal files and LSN, see the function
pg_walfile_name(), e.g: 

>postgres=# SELECT pg_current_wal_lsn(); 
>  pg_current_wal_lsn 
>-------------------- 
>  0/51EB3BA8 
>(1 row) 

>postgres=# SELECT pg_walfile_name(pg_current_wal_lsn()); 
>      pg_walfile_name 
>-------------------------- 
>  000000010000000000000051 
>(1 row) 

Hi Mark,

My question was like how to get all the LSN's that are available in a WAL
segment. For example, take a segment named "000000010000000000000051". 

So in the above archive log segment, how to get all the LSN's of all txn's
which belong to that segment ?

*The idea for asking this is, I will get the remote_lsn on the subscriber
which was successfully applied from the view "pg_replication_origin_status".

Based on the LSN, I will check in which archive segment that particular LSN
is present(as you said above). 

And will find all the later LSN's(later to remote_lsn). 

So I can say that these many transactions are yet to be applied to the
target.*

Regards,
Pavan,
9841380956



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html


Re: Hick ups in Postgresql Logical Replication

От
Mark Kirkwood
Дата:
Hi again,


On 16/10/18 18:26, pavan95 wrote:
>
> Hi Mark,
>
> My question was like how to get all the LSN's that are available in a WAL
> segment. For example, take a segment named "000000010000000000000051".
>
> So in the above archive log segment, how to get all the LSN's of all txn's
> which belong to that segment ?
>
> *The idea for asking this is, I will get the remote_lsn on the subscriber
> which was successfully applied from the view "pg_replication_origin_status".
>
> Based on the LSN, I will check in which archive segment that particular LSN
> is present(as you said above).
>
> And will find all the later LSN's(later to remote_lsn).
>
> So I can say that these many transactions are yet to be applied to the
> target.*
>
>
Well I think you are going to have to do a bit of work here to determine 
when the file name changes (e.g my system):

bench=# SELECT pg_walfile_name('0/51EB3BAE');
      pg_walfile_name
--------------------------
  000000010000000000000051
(1 row)

bench=# SELECT pg_walfile_name('0/50000001');
      pg_walfile_name
--------------------------
  000000010000000000000050
(1 row)

However, I'm not sure you are chasing the right problem. It seems to me 
that you need to grapple with what is (possibly) wrong with the 
streaming wal protocol used by logical rep, not the wal file business.

Again, I recommend you provide a precise test case that shows the 
problem, we can surely help you then!

regards
Mark