Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
| From | Amit Kapila |
|---|---|
| Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date | |
| Msg-id | CAA4eK1+sxFJV0ZpxU0NyL7UDKu3nn5J6Jo7J4cL+yjq4MSGYiw@mail.gmail.com |
| In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Masahiko Sawada <sawada.mshk@gmail.com>) |
| List | pgsql-hackers |
On Tue, Nov 18, 2025 at 1:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sun, Nov 16, 2025 at 9:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > In v26-0002-FIXUP-remove-status_change_allowed-flag, by using
> > status_change_inprogress, we ensure that no backend is allowed to
> > toggle the logical_wal/decoding status till startup process marks the
> > recovery state as recovery_done. I am trying to think what problem
> > this part of design prevents. I have considered the following
> > scenarios:
> >
> > Scenario-1:
> > 1. Startup process enables logical_wal and logical_decoding. Writes
> > WAL record for it
> > 2. Backend disables logical_decoding, writes WAL for it, and disables
> > logical_wal.
> > 3. Startup process sets recovery_done and allows wal_writes
> >
> > Say, instead of using status_change_inprogress to prevent doing
> > step-2, if we had used recovery_in_progress kind of flag then how is
> > it possible for backends to create any problem for the current node or
> > cascaded standbys? I think the only way a problem can happen is if we
> > write the WAL to disable_logical decoding after any backend could have
> > written a non-logical WAL information record. Can that happen if we
> > use the recovery_in_progress flag to prevent disable of logical_wal?
> > If so, how?
>
> The main idea of holding status_change_inprogress until the recovery
> end is to prevent concurrent toggling the logical decoding status. In
> your scenario, IIUC backends cannot write any WAL yet at step-2 since
> it's allowed at step-3. It would end up with a FATAL error actually.
> One alternative is to make processes call LocalSetXLogInsertAllowed()
> so that they can write WAL even during recovery, but I don't use it as
> I'm concerned that it could lead to other problems. On the other hand,
> we cannot let the backend to disable logical_decoding and logical_wal
> without WAL write at step-2 because otherwise the cascaded standby
> won't disable logical decoding.
>

Why can't we postpone disabling logical WAL and decoding to the next
cycle of the checkpointer when RecoveryInProgress() is true, without
relying on status_change_inprogress? This will lead to a window where
there are no logical slots but the effective_wal_level is still
logical. However, that can be true even without considering this
problem, because the checkpointer can take some time to disable
logical WAL and decoding.

The other problematic case to consider is during promotion:
1. The startup process has marked logical decoding as disabled but has
not yet marked recovery as done.
2. A backend creates a logical slot and returns without marking logical
decoding as enabled, because it relies on RecoveryInProgress().
3. The startup process marks recovery as done.

Now we have a logical slot present, but logical decoding is disabled.
I think we can simply disallow the creation of a logical slot in this
window (where effective_wal_level is 'replica' and
RecoveryInProgress() is true).

If the above is feasible and sounds reasonable, then we don't even
need the status_change_inprogress flag, at least not during the
startup flow.

--
With Regards,
Amit Kapila.
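
For readers following the thread, the two rules proposed above can be sketched as a toy state model. This is only an illustration under stated assumptions, not PostgreSQL code and not part of any posted patch: the `Node` class, its fields, and its method names are hypothetical stand-ins for the shared state discussed in the message (RecoveryInProgress(), effective_wal_level, the set of logical slots), and the model ignores locking and WAL records entirely.

```python
from dataclasses import dataclass


class SlotCreationNotAllowed(Exception):
    """Raised when the toy model refuses to create a logical slot."""


@dataclass
class Node:
    # Hypothetical stand-ins for the server state discussed in the thread.
    recovery_in_progress: bool = True
    effective_wal_level: str = "logical"  # "replica" or "logical"
    logical_slots: int = 0
    disable_pending: bool = False

    def create_logical_slot(self) -> None:
        # Proposed rule: refuse creation in the window where logical
        # decoding is disabled and recovery has not finished yet.
        if self.recovery_in_progress and self.effective_wal_level == "replica":
            raise SlotCreationNotAllowed("recovery in progress, wal level is replica")
        self.logical_slots += 1
        if not self.recovery_in_progress:
            # Outside recovery, creating a slot enables logical decoding.
            self.effective_wal_level = "logical"

    def drop_logical_slot(self) -> None:
        self.logical_slots -= 1
        if self.logical_slots == 0:
            # Do not disable here; leave it to the checkpointer.
            self.disable_pending = True

    def checkpointer_cycle(self) -> None:
        # Proposed rule: while recovery is in progress, postpone the
        # disable to a later cycle instead of toggling the status now.
        if self.recovery_in_progress:
            return
        if self.disable_pending and self.logical_slots == 0:
            self.effective_wal_level = "replica"
        self.disable_pending = False

    def finish_recovery(self) -> None:
        self.recovery_in_progress = False
```

In this model, a node that drops its last slot during recovery keeps effective_wal_level at logical until the first checkpointer cycle after recovery ends, which is the window without slots described in the message. A slot creation attempted while the level is replica and recovery is still in progress fails rather than leaving a slot behind with decoding disabled.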