Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From Amit Kapila
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAA4eK1+sxFJV0ZpxU0NyL7UDKu3nn5J6Jo7J4cL+yjq4MSGYiw@mail.gmail.com
In reply to Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Nov 18, 2025 at 1:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sun, Nov 16, 2025 at 9:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > In v26-0002-FIXUP-remove-status_change_allowed-flag, by using
> > status_change_inprogress, we ensure that no backend is allowed to
> > toggle the logical_wal/decoding status till startup process marks the
> > recovery state as recovery_done. I am trying to think what problem
> > this part of design prevents. I have considered the following
> > scenarios:
> >
> > Scenario-1:
> > 1. Startup process enables logical_wal and logical_decoding. Writes
> > WAL record for it
> > 2. Backend disables logical_decoding, writes WAL for it, and disables
> > logical_wal.
> > 3. Startup process sets recovery_done and allows wal_writes
> >
> > Say, instead of using status_change_inprogress to prevent doing
> > step-2, if we had used recovery_in_progress kind of flag then how is
> > it possible for backends to create any problem for the current node or
> > cascaded standbys? I think the only way a problem can happen is if we
> > write the WAL to disable_logical decoding after any backend could have
> > written a non-logical WAL information record. Can that happen if we
> > use the recovery_in_progress flag to prevent disable of logical_wal?
> > If so, how?
>
> The main idea of holding status_change_inprogress until the recovery
> end is to prevent concurrent toggling of the logical decoding status. In
> your scenario, IIUC backends cannot write any WAL yet at step-2 since
> it's allowed at step-3. It would end up with a FATAL error actually.
> One alternative is to make processes call LocalSetXLogInsertAllowed()
> so that they can write WAL even during recovery, but I don't use it as
> I'm concerned that it could lead to other problems. On the other hand,
> we cannot let the backend disable logical_decoding and logical_wal
> without a WAL write at step-2 because otherwise the cascaded standby
> won't disable logical decoding.
>

Why can't we postpone disabling logical WAL and decoding to the next
checkpointer cycle when RecoveryInProgress() is true, without
relying on status_change_inprogress? This will lead to a window
where there are no logical slots but effective_wal_level is still
logical. However, such a window can exist even without considering this
problem, because the checkpointer can take some time to disable
logical WAL and decoding.
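
To make the deferral idea concrete, here is a rough toy model in C. It is
not patch code: ToyState, toy_request_disable() and toy_checkpointer_cycle()
are hypothetical names, recovery_in_progress stands in for
RecoveryInProgress(), and re-checking the slot count in the checkpointer is
my assumption about how a pending request would be validated.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not PostgreSQL's actual structures. */
typedef struct ToyState
{
    bool recovery_in_progress;  /* stands in for RecoveryInProgress() */
    bool logical_enabled;       /* effective_wal_level is logical */
    int  n_logical_slots;       /* number of existing logical slots */
    bool disable_pending;       /* request left for the checkpointer */
} ToyState;

/* Called when the last logical slot goes away: only record a request. */
static void
toy_request_disable(ToyState *s)
{
    assert(s->n_logical_slots == 0);
    s->disable_pending = true;
}

/* One checkpointer cycle; returns true if it disabled logical decoding. */
static bool
toy_checkpointer_cycle(ToyState *s)
{
    if (!s->disable_pending)
        return false;

    /* WAL cannot be written yet, so keep the request for a later cycle. */
    if (s->recovery_in_progress)
        return false;

    /* Assumption: a slot created in the meantime cancels the request. */
    if (s->n_logical_slots > 0)
    {
        s->disable_pending = false;
        return false;
    }

    /* A real implementation would write the status-change WAL record here. */
    s->logical_enabled = false;
    s->disable_pending = false;
    return true;
}
```

Between toy_request_disable() and the first cycle that runs after recovery
ends, the model has no logical slots while logical_enabled is still true,
which is the window described above.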

The other problematic case to consider is during promotion: the
startup process has marked logical decoding as disabled but has not yet
set recovery_done. A backend then creates a slot and returns without
marking logical decoding as enabled, because it relies on
RecoveryInProgress(). After that, the startup process sets
recovery_done. Now we have a logical slot present, but logical decoding
is disabled. I think we can simply disallow the creation of a logical
slot in this window (where effective_wal_level is 'replica' and
RecoveryInProgress() is true).
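
As a sketch of the check I have in mind (again a toy model, not patch code):
ToyNode, toy_invariant_holds() and toy_create_logical_slot() are hypothetical
names, the bool return replaces what would be an error report, and the
enable path outside recovery is deliberately simplified to a single
assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not PostgreSQL's actual structures. */
typedef enum ToyWalLevel
{
    TOY_WAL_REPLICA,
    TOY_WAL_LOGICAL
} ToyWalLevel;

typedef struct ToyNode
{
    bool        recovery_in_progress;   /* stands in for RecoveryInProgress() */
    ToyWalLevel effective_wal_level;
    int         n_logical_slots;
} ToyNode;

/* The state we want to rule out: a logical slot while decoding is disabled. */
static bool
toy_invariant_holds(const ToyNode *n)
{
    return n->n_logical_slots == 0 ||
           n->effective_wal_level == TOY_WAL_LOGICAL;
}

/* Returns true if the slot was created, false if creation was refused. */
static bool
toy_create_logical_slot(ToyNode *n)
{
    /* Refuse in the window: level is 'replica' and recovery is not done. */
    if (n->recovery_in_progress &&
        n->effective_wal_level == TOY_WAL_REPLICA)
        return false;

    /* Outside recovery, creating the first slot enables logical decoding. */
    if (n->effective_wal_level == TOY_WAL_REPLICA)
        n->effective_wal_level = TOY_WAL_LOGICAL;

    n->n_logical_slots++;
    assert(toy_invariant_holds(n));
    return true;
}
```

With this check, a request that arrives after the startup process has
disabled logical decoding but before recovery is marked done is refused, so
the model never reaches a state with a slot present and decoding disabled.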

If the above is feasible and sounds reasonable, then we don't even
need the status_change_inprogress flag, at least not during the
startup flow.

--
With Regards,
Amit Kapila.


