Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
От | Masahiko Sawada |
---|---|
Тема | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Дата | |
Msg-id | CAD21AoAXZmk3os1c4Hf=Gzd82-rH=g6sPGRyhWC5i4eMQTxYrw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Masahiko Sawada <sawada.mshk@gmail.com>) |
Список | pgsql-hackers |
On Fri, Sep 5, 2025 at 3:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Sep 4, 2025 at 11:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > od On Tue, Sep 2, 2025 at 8:11 PM Hayato Kuroda (Fujitsu) > > <kuroda.hayato@fujitsu.com> wrote: > > > > > > Dear Sawada-san, > > > > > > Here are my comments. > > > > > > 01. > > > ``` > > > checkPoint.logicalDecodingEnabled = IsLogicalDecodingEnabled(); > > > ``` > > > > > > Per my analysis, the value is always false here because StartupLogicalDecodingStatus > > > is not called yet. Can we use "false" directly? > > > > I think that it's better to read the shared flag instead of directly > > setting false since LogicalDecodingCtl is already initialized. > > > > > > > > 02. > > > ``` > > > elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt); > > > ``` > > > > > > Here plural form is always used even if the running transaction is only one. > > > How about something like: > > > ``` > > > Number of transactions to wait finishing: %d > > > ``` > > > > Hmm, not sure it improves the message. I think we don't care much > > about plurals in debug messages. And in our convention the main log > > message doesn't start a capital character. > > > > > > > > > > 03. > > > ``` > > > while (RecoveryInProgress()) > > > { > > > pgstat_report_wait_start(WAIT_EVENT_LOGICAL_DECODING_STATUS_CHANGE_DELAY); > > > pg_usleep(100000L); /* wait for 100 msec */ > > > pgstat_report_wait_end(); > > > } > > > ``` > > > > > > I found a stuck case here: if a backend process within the loop and startup waits > > > a signal is processed, both of them can stuck. The backend waits the recovery > > > state to be DONE, and the startup waits all processes handle consume the signal. > > > IIUC we must add CHECK_FOR_INTERRUPTS() or ProcessProcSignalBarrier(). > > > > > > Actual steps: > > > > > > 0. constructed a streaming replication system, which the only primary server had > > > a logical slot. I.e., the effective_wal_level was logical. > > > 1. connected to a standby node > > > 2. attached to the backend process via gdb > > > 3. added a breakpoint at create_logical_replication_slot > > > 4. called pg_create_logical_replication_slot() on the backend. > > > the backend will stop before ReplicationSlotCreate(). > > > 5. from another terminal, attached to the startup process via gdb > > > 6. added a breakpoint at UpdateLogicalDecodingStatusEndOfRecovery() > > > 7. from another terminal, send a promote signal to the standby. > > > The startup will stop at UpdateLogicalDecodingStatusEndOfRecovery() > > > 8. executed steps on startup process, untill delay_status_change was updated > > > and LogicalDecodingControlLock was released. > > > 9. detached from the backend process. It would stop at the loop in > > > start_logical_decoding_status_change(). > > > 10. detached from the startup process. It would wait all processes handled the > > > signal, but the backend won't do. > > > > Good find! I'll fix the problem by adding CHECK_FOR_INTERRUPTS() as > > you suggested. > > I've attached the updated patch that incorporated all comments I got so far. Sorry, I've attached the wrong version. Please find the attached correct one. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Вложения
В списке pgsql-hackers по дате отправления: