Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
От | Masahiko Sawada |
---|---|
Тема | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Дата | |
Msg-id | CAD21AoB4txbyZm=8+pm9-o4XR8dCuzXdTP=3Ymr5k8T6YPjWMg@mail.gmail.com обсуждение исходный текст |
Ответ на | RE: POC: enable logical decoding when wal_level = 'replica' without a server restart ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Ответы |
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
|
Список | pgsql-hackers |
od On Tue, Sep 2, 2025 at 8:11 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Sawada-san, > > Here are my comments. > > 01. > ``` > checkPoint.logicalDecodingEnabled = IsLogicalDecodingEnabled(); > ``` > > Per my analysis, the value is always false here because StartupLogicalDecodingStatus > is not called yet. Can we use "false" directly? I think that it's better to read the shared flag instead of directly setting false since LogicalDecodingCtl is already initialized. > > 02. > ``` > elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt); > ``` > > Here plural form is always used even if the running transaction is only one. > How about something like: > ``` > Number of transactions to wait finishing: %d > ``` Hmm, not sure it improves the message. I think we don't care much about plurals in debug messages. And in our convention the main log message doesn't start a capital character. > > 03. > ``` > while (RecoveryInProgress()) > { > pgstat_report_wait_start(WAIT_EVENT_LOGICAL_DECODING_STATUS_CHANGE_DELAY); > pg_usleep(100000L); /* wait for 100 msec */ > pgstat_report_wait_end(); > } > ``` > > I found a stuck case here: if a backend process within the loop and startup waits > a signal is processed, both of them can stuck. The backend waits the recovery > state to be DONE, and the startup waits all processes handle consume the signal. > IIUC we must add CHECK_FOR_INTERRUPTS() or ProcessProcSignalBarrier(). > > Actual steps: > > 0. constructed a streaming replication system, which the only primary server had > a logical slot. I.e., the effective_wal_level was logical. > 1. connected to a standby node > 2. attached to the backend process via gdb > 3. added a breakpoint at create_logical_replication_slot > 4. called pg_create_logical_replication_slot() on the backend. > the backend will stop before ReplicationSlotCreate(). > 5. from another terminal, attached to the startup process via gdb > 6. added a breakpoint at UpdateLogicalDecodingStatusEndOfRecovery() > 7. from another terminal, send a promote signal to the standby. > The startup will stop at UpdateLogicalDecodingStatusEndOfRecovery() > 8. executed steps on startup process, untill delay_status_change was updated > and LogicalDecodingControlLock was released. > 9. detached from the backend process. It would stop at the loop in > start_logical_decoding_status_change(). > 10. detached from the startup process. It would wait all processes handled the > signal, but the backend won't do. Good find! I'll fix the problem by adding CHECK_FOR_INTERRUPTS() as you suggested. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления: