Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
| От | Masahiko Sawada |
|---|---|
| Тема | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Дата | |
| Msg-id | CAD21AoDHHUw=A3t59=1DUjtWROxcCCkXPsU3Ax1khUNoNm6M8w@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (shveta malik <shveta.malik@gmail.com>) |
| Ответы |
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
|
| Список | pgsql-hackers |
On Thu, Dec 4, 2025 at 7:59 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Dec 5, 2025 at 12:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Dec 4, 2025 at 1:30 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Thu, Dec 4, 2025 at 1:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > >
> > > > I've attached the updated patch that incorporated the review comments
> > > > and is rebased to the current HEAD.
> > > >
> > >
> > > Thanks, a few things:
> > >
> > >
> > > 1)
> > > + The system automatically increases the
> > > + effective WAL level to <literal>logical</literal> when
> > > creating the first
> > > + logical replication slot, and decreases it back to
> > > <literal>replica</literal>
> > > + when dropping the last logical replication slot. The current
> > > effective WAL
> > > + level can be monitored through <xref
> > > linkend="guc-effective-wal-level"/>
> > > + parameter.
> > >
> > > Shouldn't we mention about invalidation of slot along with dropping of
> > > slot? I have suggested this earlier but I think it got missed to be
> > > addressed.
> >
> > Fixed. Sorry I missed it.
> >
> > >
> > > 2)
> > > + else if (sync_replication_slots)
> > > + {
> > > + /*
> > > + * Signal the postmaster to launch the slotsync worker.
> > > + *
> > > + * XXX: For simplicity, we keep the slotsync worker running
> > > + * even after logical decoding is disabled. A future
> > > + * improvement can consider starting and stopping the worker
> > > + * based on logical decoding status change.
> > > + */
> > > + kill(PostmasterPid, SIGUSR1);
> > > + }
> > > + }
> > > +
> > > + /* Update the status on shared memory */
> > > + UpdateLogicalDecodingStatus(logical_decoding, true);
> > >
> > > I see that we have moved 'Update' post slotsync's start attempt. This
> > > leaves a possibility that that slot-sync is started sooner than last
> > > 'Update' call and thus slotsync may exit with:
> > > replication slot synchronization requires "effective_wal_level" >=
> > > "logical" on the primary
> > >
> > > I see that update was prior to the slotsync step in earlier patches.
> > > Why have we moved it to a later stage?
> >
> > You're right. I've reconsidered the operation order and reverted the
> > previous change. On further thought, I think it should not happen to
> > decode the STATUS_CHANGE record because when applying the
> > STATUS_CHANGE record, we disable logical decoding first and then
> > invalidate the slots. There is no window where a logical slot can
> > creep in on the standby, and on the primary it cannot restart after
> > changing wal_level < replica if there is a pre-existing slot. If my
> > understanding is right, we should put a sanity check in xlog_decode().
> >
> > >
> > > 3)
> > > + /* Return if someone already started to enable logical decoding */
> > >
> > > Shall we update:
> > > Return if someone already started to enable logical decoding, or if it
> > > is already enabled.
> >
> > Fixed.
> >
> > I've attached the updated patches. The 0002 patch is a patch to
> > address the above review comments. If it looks good, I'll merge these
> > changes.
> >
>
> Thanks, I will review.
>
> While testing one of the scenarios on v33 patch, I observed a strange
> behaviour. I could not reproduce it again. I am not sure if it is due
> to the patch. Please have a look once.
> The scenario is:
>
> pub: create a slot to enable logical decoding, create table, insert few records
> standby: create slot and try to get-changes, get-changes hit some error:
>
> postgres=# SELECT pg_create_logical_replication_slot('slot13',
> 'pgoutput', false, false, false);
> pg_create_logical_replication_slot
> ------------------------------------
> (slot13,0/0306D448)
> (1 row)
>
> postgres=# select pg_logical_slot_get_binary_changes('slot13', NULL,
> NULL, 'proto_version', '4', 'publication_names', 'pub');
> ERROR: cannot query non-catalog table "pg_class" during logical decoding
>
Hmm, ISTM that the root cause is that we don't synchronously update
the XLogLogicalInfo cache on the standby. A backend on the standby who
started when logical decoding is disabled keeps holding
XLogLogicalInfo = false until the promotion. So even if the startup
process enables logical decoding when applying the STATUS_CHANGE
record, logical decoding is enabled on the standby but the backend
process would start logical decoding with XLogLogicalInfo = false,
resulting in RelationIsAccessibleInLogicalDecoding() returns false. I
missed the fact that we check XLogLogicalInfoActive() even read paths.
Will fix it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления: