Обсуждение: Logical Replication Sync Validation

Поиск
Список
Период
Сортировка

Logical Replication Sync Validation

От
Robert Sjöblom
Дата:
Greetings,

We have the following setup:
A pg10 cluster with logical replication to a pg15 cluster, set up with
the following query:
CREATE PUBLICATION "dbname_pub" FOR ALL TABLES;

We do an initial sync of DDL with pg_dump. The purpose of the
replication is to perform an online upgrade with minimal downtime.

At the moment where we would switch to pg15 being the primary/lead we
will stop writes to pg10; at that point we will validate that we are
fully in sync, tear down pg10 and send writes to pg15. Our question is
how we can validate our sync status. Given that there have been no DDL
changes on pg10 (new tables, for instance), would the following check be
sufficient?

Compare byte diff between WAL LSNs:
SELECT abs(pg_wal_lsn_diff(write_lsn, flush_lsn)) AS lag FROM
pg_stat_replication;

If the byte diff is 0, I would assume that we're fully in sync. Is this
understanding correct?

Another idea we've had would be to use CTID to fetch the last row
(update/insert) in each table on both sides and compare row content, is
this feasible? Is it safe to rely on CTIDs across logical replication?

best regards,
Robert Sjöblom

--
Innehållet i detta e-postmeddelande är konfidentiellt och avsett endast för
adressaten.Varje spridning, kopiering eller utnyttjande av innehållet är
förbjuden utan tillåtelse av avsändaren. Om detta meddelande av misstag
gått till fel adressat vänligen radera det ursprungliga meddelandet och
underrätta avsändaren via e-post



Re: Logical Replication Sync Validation

От
Christophe Pettus
Дата:

> On Apr 18, 2023, at 01:20, Robert Sjöblom <robert.sjoblom@fortnox.se> wrote:
> Another idea we've had would be to use CTID to fetch the last row (update/insert) in each table on both sides and
comparerow content, is this feasible? Is it safe to rely on CTIDs across logical replication? 

No.  CTIDs aren't sent over as part of logical replication, and there's no way to compare them between a publisher and
asubscriber. 


Re: Logical Replication Sync Validation

От
Christophe Pettus
Дата:

> On Apr 18, 2023, at 03:45, Robert Sjöblom <robert.sjoblom@fortnox.se> wrote:
> I'm aware of that. But you can, however, do something like:
>
> SELECT * FROM FOO WHERE CTID = (SELECT MAX(CTID) FROM FOO);
>
> on both sides. The idea being that if I change FOO, the CTID of the changed row will not be the same on both sides,
butthe CTID will point to the changed row on both sides. Or am I not understanding it correctly? 

CTIDs are not monotonically increasing; they're locator in the file that holds the table data.  There's no reason to
believethat a record on the subscriber will happen to land in any particular position. 


Re: Logical Replication Sync Validation

От
Laurenz Albe
Дата:
On Tue, 2023-04-18 at 10:20 +0200, Robert Sjöblom wrote:
> At the moment where we would switch to pg15 being the primary/lead we
> will stop writes to pg10; at that point we will validate that we are
> fully in sync, tear down pg10 and send writes to pg15. Our question is
> how we can validate our sync status. Given that there have been no DDL
> changes on pg10 (new tables, for instance), would the following check be
> sufficient?
>
> Compare byte diff between WAL LSNs:
> SELECT abs(pg_wal_lsn_diff(write_lsn, flush_lsn)) AS lag FROM
> pg_stat_replication;
>
> If the byte diff is 0, I would assume that we're fully in sync. Is this
> understanding correct?

I would compare the last replayed log sequence number with the current
position on the primary:

SELECT abs(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS lag
FROM pg_stat_replication;

If that is 0, you are good.

Yours,
Laurenz Albe