Обсуждение: Re: Hick ups in Postgresql Logical Replication
Hi, I have configured Logical Replication. One Publisher and One Subscriber. When I query the current LSN on publisher and subscriber, I see that there is a lag between them. For example if a published table consists of 3278 records, then on the subscriber I see only 3275 records. So how to sync that lagged 3 records? Is there a way to know the number of transaction to be replayed or from which time the transactions are to be replayed from publisher to subscriber, with the help of LSN -- Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html
Hi, A lag would suggest that the subscriber is still catching up. However, there might be something else going on. We'd probably need to see how you actually did your set up, and also how you are deducing that there is a lag (e.g pg_stat_subscription)? Don't forget to examine your logs for errors too. regards Mark On 14/10/18 05:22, pavan95 wrote: > Hi, > > I have configured Logical Replication. One Publisher and One Subscriber. > When I query the current LSN on publisher and subscriber, I see that there > is a lag between them. > > For example if a published table consists of 3278 records, then on the > subscriber I see only 3275 records. So how to sync that lagged 3 records? > > Is there a way to know the number of transaction to be replayed or from > which time the transactions are to be replayed from publisher to subscriber, > with the help of LSN > > > > -- > Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html >
Hi Mark, Thanks for your response. > We'd probably need to see how you actually did your set up, and also how > you are deducing that there is a lag (e.g pg_stat_subscription)? *Setup*: *Publisher *: Out of n number of tables n-1 tables are opted for replication of DML changes. *Subscriber *: Subscription is created on the subscriber for the corresponding publication. *At times, I find(developer complains) about difference in records between some of the published tables.* For example, after successful configuration of replication, say it was configured successfully on Oct 1'st 2018 and verified that it is replicating. Then some n days in future, a developer complains a table named abc is not getting replicated(meaning the data inserted/updated/deleted for that table abc on publisher is not getting synced with subscriber). *Findings*: Here I found that for 1 of the published table the row count is not same on both publisher and subscriber: Publisher : select count(*) from table_A; No.of Rows= 3598 Subscriber: select count(*) from table_A: No.of Rows= 3590 *Then my analysis will go like* : 1) Check the error log for any kind of errors-----------*No errors found* 2) Validate that the table structure(no.of.columns) on both primary and subscriber. As it is production, I need to to restore the data with minimum time. So I will go for methods: 1). Truncate the table with cascade option--------which truncates the records and resyncs the table and sometimes it will not work out. 2). If first approach will not work out, then I will go with dropping the subscription and again configure the replication with copy_data=false option.-----It works out. As this can't be the baselined approach to be followed whenever this complaint arises, a doubt raised in my head. 1) Can we restore that only 3 records(or more from other tables) on the online subscriber??? In this case LSN's on both the primary and secondary are different. So how to identify the difference LSN's for that 3 records in that table? Assuming for 3 records there are 3 different LSN's, is there a way to restore that only 3 LSN's data?? Can we list what all the LSN's are present in a single archive log segment(000000010000000200000003). And how can I restore that archive segment alone on the subscriber. To be in specific, 1) Can I get List of LSN's in an archive WAL segment file? 2) How to find that 3 records are in which archive file based on the LSN? 3) How can I restore that 3 records alone, in this case ? Please ask if there is something which I needed to explain on my part. Looking forward to hear from you. Thanks in Advance. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html
Hi On 15/10/18 23:11, pavan95 wrote: > Hi Mark, > > Thanks for your response. > >> We'd probably need to see how you actually did your set up, and also how >> you are deducing that there is a lag (e.g pg_stat_subscription)? > > *Setup*: > > *Publisher *: > > Out of n number of tables n-1 tables are opted for replication of DML > changes. > > *Subscriber *: > > Subscription is created on the subscriber for the corresponding publication. > > *At times, I find(developer complains) about difference in records between > some of the published tables.* > > For example, after successful configuration of replication, say it was > configured successfully on Oct 1'st 2018 and verified that it is > replicating. Then some n days in future, a developer complains a table named > abc is not getting replicated(meaning the data inserted/updated/deleted for > that table abc on publisher is not getting synced with subscriber). > > *Findings*: > > Here I found that for 1 of the published table the row count is not same on > both publisher and subscriber: > > Publisher : select count(*) from table_A; No.of Rows= 3598 > > Subscriber: select count(*) from table_A: No.of Rows= 3590 > I suggest trying to make a self contained test case illustrating this problem (maybe just a single table replicated). By that I mean an exact sequence of commands (so we can try it too and see if you have found a bug)! With respect to the question about wal files and LSN, see the function pg_walfile_name(), e.g: postgres=# SELECT pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/51EB3BA8 (1 row) postgres=# SELECT pg_walfile_name(pg_current_wal_lsn()); pg_walfile_name -------------------------- 000000010000000000000051 (1 row) The other 2 questions will be more easily tackled when we know the exact set of commands you are using to do set things up. regards Mark
>With respect to the question about wal files and LSN, see the function pg_walfile_name(), e.g: >postgres=# SELECT pg_current_wal_lsn(); > pg_current_wal_lsn >-------------------- > 0/51EB3BA8 >(1 row) >postgres=# SELECT pg_walfile_name(pg_current_wal_lsn()); > pg_walfile_name >-------------------------- > 000000010000000000000051 >(1 row) Hi Mark, My question was like how to get all the LSN's that are available in a WAL segment. For example, take a segment named "000000010000000000000051". So in the above archive log segment, how to get all the LSN's of all txn's which belong to that segment ? *The idea for asking this is, I will get the remote_lsn on the subscriber which was successfully applied from the view "pg_replication_origin_status". Based on the LSN, I will check in which archive segment that particular LSN is present(as you said above). And will find all the later LSN's(later to remote_lsn). So I can say that these many transactions are yet to be applied to the target.* Regards, Pavan, 9841380956 -- Sent from: http://www.postgresql-archive.org/PostgreSQL-admin-f2076596.html
Hi again, On 16/10/18 18:26, pavan95 wrote: > > Hi Mark, > > My question was like how to get all the LSN's that are available in a WAL > segment. For example, take a segment named "000000010000000000000051". > > So in the above archive log segment, how to get all the LSN's of all txn's > which belong to that segment ? > > *The idea for asking this is, I will get the remote_lsn on the subscriber > which was successfully applied from the view "pg_replication_origin_status". > > Based on the LSN, I will check in which archive segment that particular LSN > is present(as you said above). > > And will find all the later LSN's(later to remote_lsn). > > So I can say that these many transactions are yet to be applied to the > target.* > > Well I think you are going to have to do a bit of work here to determine when the file name changes (e.g my system): bench=# SELECT pg_walfile_name('0/51EB3BAE'); pg_walfile_name -------------------------- 000000010000000000000051 (1 row) bench=# SELECT pg_walfile_name('0/50000001'); pg_walfile_name -------------------------- 000000010000000000000050 (1 row) However, I'm not sure you are chasing the right problem. It seems to me that you need to grapple with what is (possibly) wrong with the streaming wal protocol used by logical rep, not the wal file business. Again, I recommend you provide a precise test case that shows the problem, we can surely help you then! regards Mark