Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
От | Masahiko Sawada |
---|---|
Тема | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Дата | |
Msg-id | CAD21AoDsOkZQPOp=fm4hYAeHffCcKwm8m2RnvA5h_q39A-GENA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Amit Kapila <amit.kapila16@gmail.com>) |
Список | pgsql-hackers |
On Thu, Sep 11, 2025 at 9:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Sep 11, 2025 at 11:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Sep 10, 2025 at 11:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Sep 6, 2025 at 3:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I've attached the updated patch that incorporated all comments I got so far. > > > > > > > > > > * > > > + /* > > > + * While all processes are using the new status, there could be some > > > + * transactions that might have started with the old status. So wait > > > + * for the running transactions to complete so that logical decoding > > > + * doesn't include transactions that wrote WAL with insufficient > > > + * information. > > > + */ > > > + running = GetRunningTransactionData(); > > > + LWLockRelease(ProcArrayLock); > > > + LWLockRelease(XidGenLock); > > > + > > > + elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt); > > > + > > > + for (int i = 0; i < running->xcnt; i++) > > > + { > > > + TransactionId xid = running->xids[i]; > > > + > > > + if (TransactionIdIsCurrentTransactionId(xid)) > > > + continue; > > > + > > > + XactLockTableWait(xid, NULL, NULL, XLTW_None); > > > + } > > > > > > When building a snapshot during the start of logical decoding, we > > > anyway wait for running transactions to finish via the snapbuild > > > machinery. So, why do we need it here? And if it is needed, can we > > > update the comments to explain why it is required in spite of > > > snapbuild machinery doing similar thing? > > > > Fair point. I don't see any reason we need to wait here. Will remove this step. > > > > We can add a comment there explaining why we don't wait for > in-progress transactions. This will also be important if we miss > anything and later need to handle it similarly. Yes, I'll add comments. > > One thing related to this which needs a discussion is after this > change, it is possible that part of the transaction contains > additional logical_wal_info. I couldn't think of a problem due to this > but users using pg_waldump or other WAL reading utilities could > question this. One possibility is that we always start including > logical_wal_info for the next new transaction but not sure if that is > required. It would be good if other people involved in the discussion > or otherwise could share their opinion on this point. I believe it's safe to write logical information to WAL records even when not strictly required, and it won't be a problem in practice. FYI a similar thing is true for full page writes; full page writes could be included in WAL records even when not strictly required (see UpdateFullPageWrites() for details). > > > > * Is it a good idea to enable/disable decoding for temporary logical > > > slots? The temporary slots are released during ERROR or at session > > > end, is that a good time to do the disable processing that even > > > requires WAL writing. > > > > I think the same is true for slots with RS_EPEMERAL state. Since it > > could confuse users if automatic effective_wal_level change is > > supported only for non-temporary slots, I personally would like not to > > push aside temporary slots. I agree that it might not be a good time > > to disable processing during process shutdown time; in addition to > > requiring WAL record, it also requires waits for concurrent state > > change processings while it holds all interrupts, which could easily > > involve dead-locks. > > > > Yes, all such processing during ERROR and shutdown sounds scary and a > source for problems. > > > It might be worth considering doing the disable > > process in a lazy way. For example, other processes (like > > checkpointer) periodically checks the logical decoding status and > > disables it if necessary. > > > > Yeah, doing lazily sounds reasonable to me. We need to do lazily only > for ERROR cases, otherwise, during a normal drop_slot, it may be okay. > But OTOH, while dropping the slot as a part of subscription drop, it > could be risky because if due to any reason, the disabling took more > time, the subscription drop operation would look like hang or in worse > case, the connection can time out. True. I thought that disabling logical decoding in a synchronous way is more preferable for users since it's guaranteed effective_wal_level gets decreased to 'replica' when drop-slot completes. However, one hypothesis is that users would not be interested in whether effective_wal_level is 'replica' or 'logical' but in being able to create logical slots even when wal_level is set to 'logical'. That is, if we use the lazy disabling approach for all cases, users would have to wait for effective_wal_level to be decreased to 'replica' if they want to check. But if users don't check that often in practice, the lazy approach would be a better way. > For the shutdown sequence, can't we think of resetting effective_wal > after a restart? Does it mean that effective_wal_level keeps 'logical' until the next server starts? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления: