Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Bertrand Drouvot
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id ZjtD9QwH9EB5c37e@ip-10-97-1-34.eu-west-3.compute.internal
обсуждение исходный текст
Ответ на RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Список pgsql-hackers
Hi,

On Mon, Apr 29, 2024 at 11:58:09AM +0000, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 29, 2024 5:11 PM shveta malik <shveta.malik@gmail.com> wrote:
> > 
> > On Mon, Apr 29, 2024 at 11:38 AM shveta malik <shveta.malik@gmail.com>
> > wrote:
> > >
> > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Since the standby_slot_names patch has been committed, I am
> > > > > > attaching the last doc patch for review.
> > > > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > > 1 ===
> > > > >
> > > > > +   continue subscribing to publications now on the new primary
> > > > > + server
> > > > > without
> > > > > +   any data loss.
> > > > >
> > > > > I think "without any data loss" should be re-worded in this
> > > > > context. Data loss in the sense "data committed on the primary and
> > > > > not visible on the subscriber in case of failover" can still occurs (in case
> > synchronous replication is not used).
> > > > >
> > > > > 2 ===
> > > > >
> > > > > +   If the result (<literal>failover_ready</literal>) of both above steps is
> > > > > +   true, existing subscriptions will be able to continue without data
> > loss.
> > > > > +  </para>
> > > > >
> > > > > I don't think that's true if synchronous replication is not used.
> > > > > Say,
> > > > >
> > > > > - synchronous replication is not used
> > > > > - primary is not able to reach the standby anymore and
> > > > > standby_slot_names is set
> > > > > - new data is inserted into the primary
> > > > > - then not replicated to subscriber (due to standby_slot_names)
> > > > >
> > > > > Then I think the both above steps will return true but data would
> > > > > be lost in case of failover.
> > > >
> > > > Thanks for the comments, attach the new version patch which reworded
> > > > the above places.

Thanks!

> Here is the V3 doc patch.

Thanks! A few comments:

1 ===

+   losing any data that has been flushed to the new primary server.

Worth to add a few words about possible data loss, something like?

Please note that in case synchronous replication is not used and standby_slot_names
is set correctly, it might be possible to lose data that would have been committed
on the old primary server (in case the standby was not reachable during that time 
for example).

2 ===

+test_sub=# SELECT
+               array_agg(slotname) AS slots
+           FROM
+           ((
+               SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS
slotname
+               FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
+               WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
+           ) UNION (

I guess this format comes from ReplicationSlotNameForTablesync(). What about
creating a SQL callable function on top of it and make use of it in the query
above? (that would ensure to keep the doc up to date even if the format changes
in ReplicationSlotNameForTablesync()).

3 ===

+test_sub=# SELECT
+               MAX(remote_lsn) AS remote_lsn_on_subscriber
+           FROM
+           ((
+               SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_',
r.srrelid),false)
 
+                           WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) AS remote_lsn
+               FROM pg_subscription_rel r, pg_subscription s
+               WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover
+           ) UNION (
+               SELECT pg_replication_origin_progress(CONCAT('pg_', s.oid), false) AS remote_lsn
+               FROM pg_subscription s
+               WHERE s.subfailover
+           ));

What about adding a join to pg_replication_origin to get rid of the "hardcoded"
format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"?

Idea behind 2 ===  and 3 === is to have the queries as generic as possible and
not rely on a hardcoded format (that would be more difficult to maintain should
those formats change in the future).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Richard Guo
Дата:
Сообщение: Re: A problem about partitionwise join
Следующее
От: Matthias van de Meent
Дата:
Сообщение: Expand applicability of aggregate's sortop optimization