Обсуждение: Logical Replication Sync Validation
Greetings, We have the following setup: A pg10 cluster with logical replication to a pg15 cluster, set up with the following query: CREATE PUBLICATION "dbname_pub" FOR ALL TABLES; We do an initial sync of DDL with pg_dump. The purpose of the replication is to perform an online upgrade with minimal downtime. At the moment where we would switch to pg15 being the primary/lead we will stop writes to pg10; at that point we will validate that we are fully in sync, tear down pg10 and send writes to pg15. Our question is how we can validate our sync status. Given that there have been no DDL changes on pg10 (new tables, for instance), would the following check be sufficient? Compare byte diff between WAL LSNs: SELECT abs(pg_wal_lsn_diff(write_lsn, flush_lsn)) AS lag FROM pg_stat_replication; If the byte diff is 0, I would assume that we're fully in sync. Is this understanding correct? Another idea we've had would be to use CTID to fetch the last row (update/insert) in each table on both sides and compare row content, is this feasible? Is it safe to rely on CTIDs across logical replication? best regards, Robert Sjöblom -- Innehållet i detta e-postmeddelande är konfidentiellt och avsett endast för adressaten.Varje spridning, kopiering eller utnyttjande av innehållet är förbjuden utan tillåtelse av avsändaren. Om detta meddelande av misstag gått till fel adressat vänligen radera det ursprungliga meddelandet och underrätta avsändaren via e-post
> On Apr 18, 2023, at 01:20, Robert Sjöblom <robert.sjoblom@fortnox.se> wrote: > Another idea we've had would be to use CTID to fetch the last row (update/insert) in each table on both sides and comparerow content, is this feasible? Is it safe to rely on CTIDs across logical replication? No. CTIDs aren't sent over as part of logical replication, and there's no way to compare them between a publisher and asubscriber.
> On Apr 18, 2023, at 03:45, Robert Sjöblom <robert.sjoblom@fortnox.se> wrote: > I'm aware of that. But you can, however, do something like: > > SELECT * FROM FOO WHERE CTID = (SELECT MAX(CTID) FROM FOO); > > on both sides. The idea being that if I change FOO, the CTID of the changed row will not be the same on both sides, butthe CTID will point to the changed row on both sides. Or am I not understanding it correctly? CTIDs are not monotonically increasing; they're locator in the file that holds the table data. There's no reason to believethat a record on the subscriber will happen to land in any particular position.
On Tue, 2023-04-18 at 10:20 +0200, Robert Sjöblom wrote: > At the moment where we would switch to pg15 being the primary/lead we > will stop writes to pg10; at that point we will validate that we are > fully in sync, tear down pg10 and send writes to pg15. Our question is > how we can validate our sync status. Given that there have been no DDL > changes on pg10 (new tables, for instance), would the following check be > sufficient? > > Compare byte diff between WAL LSNs: > SELECT abs(pg_wal_lsn_diff(write_lsn, flush_lsn)) AS lag FROM > pg_stat_replication; > > If the byte diff is 0, I would assume that we're fully in sync. Is this > understanding correct? I would compare the last replayed log sequence number with the current position on the primary: SELECT abs(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS lag FROM pg_stat_replication; If that is 0, you are good. Yours, Laurenz Albe