Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

Поиск
Список
Период
Сортировка
От Shlok Kyal
Тема Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Дата
Msg-id CANhcyEWQDRciF6iuWZLC6eKrcQ3sL2QZpD-jekW9aCVxpzHDzw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Список pgsql-hackers
On Sat, 23 Aug 2025 at 03:51, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Aug 21, 2025 at 8:11 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 21, 2025 at 10:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Aug 20, 2025 at 3:11 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > Please find a few comments:
> > >
> > > Thank you for reviewing the patch!
> > >
> > > >
> > > > 1)
> > > > ReplicationSlotsDropDBSlots:
> > > > + bool dropped = false;
> > > >
> > > > We can name 'dropped ' as 'dropped_logical' similar to ReplicationSlotCleanup.
> > >
> > > I think we don't necessarily need to add 'logical' because this
> > > function attempts to drop only logical slots unlike
> > > ReplicationSlotCleanup().
> >
> > Okay, I see. I missed that point earlier.
> >
> > >
> > > >
> > > > 2)
> > > > ReplicationSlotsDropDBSlots()
> > > > +
> > > > + if (dropped && nlogicalslots == 0)
> > > > + DisableLogicalDecodingIfNecessary();
> > > >
> > > > I could not understand the need of 'nlogicalslots' condition here?
> > > > Once we increment 'nlogicalslots', there is no way we can skip the
> > > > loop without dropping the slot with the only exception of ERROR-ing
> > > > out if active_pid is non NULL. So if the loop has completed and we
> > > > have reached this sage, won't it essentially mean 'nlogicalslots' is 0
> > > > in both cases: a) we actually dropped any slot;  b) we did not find
> > > > any slot to drop.  Or am I missing something?
> > >
> > > I think I should have incremented nlogicalslots even for logical slots
> > > on other databases. What I want to do here is to call
> > > DisableLogicalDecodingIfNecessary() only when we have dropped at least
> > > one logical slots and there is no logical slots on the whole database
> > > cluster as a result. If we have logical slots only on the current
> > > database, we eventually reach the above 'if' statement with
> > > dropped=true and nlogicalslots=0. On the other hand, if we have
> > > logical slots also on other databases, we reach there with
> > > dropped=true and nlogicalslots>0, meaning we don't want to disable
> > > logical decoding. Does it make sense?
> > >
> >
> > Yes, it makes sense after incrementing 'nlogicalslots' even for other databases.
> >
> > > >
> > > > Same is the case with ReplicationSlotCleanup().
> > > >
> > > > 3)
> > > > Few typos:
> > > >
> > > > + /*
> > > > + * Update shmem flags. We don't need to care about the order of setting
> > > > + * global flag and writing the WAL record this case since writes are not
> > > > + * allowed yet.
> > > > + */
> > > >
> > > > this case --> in  this case
> > > >
> > > > + * This is the authoritative value used by the all process to determine
> > > >
> > > > 'used by all the processes'
> > >
> > > Fixed.
> > >
> > > > 049_effective_wal_level.pl:
> > > > 4)
> > > >
> > > > Few typos:
> > > > +# Initialize standby2 ndoe form the backup 'my_backup'.
> > > >
> > > > ndoe form --> node from
> > > >
> > > > +# Test the race condition between the startup and logical decoding
> > > > statuc change.
> > > >
> > > > statuc --> status
> > >
> > > Fixed.
> > >
> > > >
> > > > 5)
> > > > +# Promote the standby2 node that has one logical slot. So the logical decoding
> > > > +# keeps enabled even after the promotion.
> > > > +$standby2->promote;
> > > > +test_wal_level($standby2, "replica|logical",
> > > > + "effective_wal_level keeps 'logical' even after the promotion");
> > > > +$standby2->safe_psql('postgres',
> > > > + qq[select pg_create_logical_replication_slot('standby2_slot2', 'pgoutput')]
> > > > +);
> > > > +$standby2->stop;
> > > >
> > > > Do we need 'pg_create_logical_replication_slot' here?
> > >
> > > Yes, I put it to check if we can create a logical slot even after the
> > > promotion. I've added the comment to explain it.
> > >
> >
> > Okay, makes sense.
> >
> > > >
> > > > 6)
> > > >
> > > > +test_wal_level($primary, "replica|replica",
> > > > + "effective_wal_level got decreased to 'replica' on primary");
> > > > +test_wal_level($standby3, "logical|replica",
> > > > + "effective_wal_level got decreased to 'replica' on standby");
> > > > +test_wal_level($cascade, "replica|replica",
> > > > + "effective_wal_level got decreased to 'logical' on standby");
> > > > +
> > > >
> > > > Last one should also say:  decreased to 'replica' (instead of logical)
> > >
> > > Fixed.
> > >
> > > I've attached the updated patch.
>
> I found that we don't need to expose LogicalDecodingCtlData in
> logicalctl.h header file. I've updated some cosmetic changes including
> that point.
>
> I think the patch is getting pretty good shape and am aiming at
> getting this patch committed during the September commitfest. Is there
> any further tests and verifications we need? Of course further patch
> reviews are also welcome.
>

Hi Sawada-san,

I reviewed the latest patch and have following comments:

1. In commit message, word 'slot' is missing:
When the first logical replication is created, the system
automatically increases the effective WAL level to maintain

Instead it should be:
When the first logical replication slot is created, ...

2. In slot.c:
+/*
+ * Returns if there is at least in-use logical replication slot.
+ */

Should we update it to:
Returns true if there is at least one in-use logical replication slot.

3. Due to recent commit [1], we cannot use "sync_replication_slots" =
on when wal_level < logical.
We get following error on standby:
2025-08-25 16:37:04.757 IST [2901542] FATAL:  replication slot
synchronization ("sync_replication_slots" = on) requires "wal_level"
>= "logical"
If we set the wal_level = logical on standby, then this error does not
appear and a slot sync worker is spawned.

With this patch, I think we can allow use of "sync_replication_slots"
= on when wal_level >= replica as standby will be dependent on
effective_wal_level on primary. Thoughts?
I also see that with patch, the use of pg_sync_replication_slots()
works with wal_level = replica.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=12da45742cfd15d9fab151b25400d96a1febcbde

Thanks,
Shlok Kyal



В списке pgsql-hackers по дате отправления: