Re: Exit walsender before confirming remote flush in logical replication
От | Andres Freund |
---|---|
Тема | Re: Exit walsender before confirming remote flush in logical replication |
Дата | |
Msg-id | DB1C23E1-938E-4074-8E6E-ECEA7A8A1EB5@anarazel.de обсуждение исходный текст |
Ответ на | Re: Exit walsender before confirming remote flush in logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: Exit walsender before confirming remote flush in logical replication
|
Список | pgsql-hackers |
Hi, On February 5, 2023 8:29:19 PM PST, Amit Kapila <amit.kapila16@gmail.com> wrote: >On Sat, Feb 4, 2023 at 6:31 PM Andres Freund <andres@anarazel.de> wrote: >> >> On 2023-02-02 11:21:54 +0530, Amit Kapila wrote: >> > The main problem we want to solve here is to avoid shutdown failing in >> > case walreceiver/applyworker is busy waiting for some lock or for some >> > other reason as shown in the email [1]. >> >> Isn't handling this part of the job of wal_sender_timeout? >> > >In some cases, it is not clear whether we can handle it by >wal_sender_timeout. Consider a case of a time-delayed replica where >the applyworker will keep sending some response/alive message so that >walsender doesn't timeout in that (during delay) period. In that case, >because walsender won't timeout, the shutdown will fail (with the >failed message) even though it will be complete after the walsender is >able to send all the WAL and shutdown. The time-delayed replica patch >is still under discussion [1]. Also, for large values of >wal_sender_timeout, it will wait till the walsender times out and can >return with a failed message. > >> >> I don't at all agree that it's ok to just stop replicating changes >> because we're blocked on network IO. The patch justifies this with: >> >> > Currently, at shutdown, walsender processes wait to send all pending data and >> > ensure the all data is flushed in remote node. This mechanism was added by >> > 985bd7 for supporting clean switch over, but such use-case cannot be supported >> > for logical replication. This commit remove the blocking in the case. >> >> and at the start of the thread with: >> >> > In case of logical replication, however, we cannot support the use-case that >> > switches the role publisher <-> subscriber. Suppose same case as above, additional >> > transactions are committed while doing step2. To catch up such changes subscriber >> > must receive WALs related with trans, but it cannot be done because subscriber >> > cannot request WALs from the specific position. In the case, we must truncate all >> > data in new subscriber once, and then create new subscription with copy_data >> > = true. >> >> But that seems a too narrow view to me. Imagine you want to decomission >> the current primary, and instead start to use the logical standby as the >> primary. For that you'd obviously want to replicate the last few >> changes. But with the proposed change, that'd be hard to ever achieve. >> > >I think that can still be achieved with the idea being discussed which >is to keep allowing sending the WAL for smart shutdown mode but not >for other modes(fast, immediate). I don't know whether it is a good >idea or not but Kuroda-San has produced a POC patch for it. We can >instead choose to improve our docs related to shutdown to explain a >bit more about the shutdown's interaction with (logical and physical) >replication. As of now, it says: (“Smart” mode disallows new >connections, then waits for all existing clients to disconnect. If the >server is in hot standby, recovery and streaming replication will be >terminated once all clients have disconnected.)[2]. Here, it is not >clear that shutdown will wait for sending and flushing all the WALs. >The information for fast and immediate modes is even lesser which >makes it more difficult to understand what kind of behavior is >expected in those modes. > >[1] - https://commitfest.postgresql.org/42/3581/ >[2] - https://www.postgresql.org/docs/devel/app-pg-ctl.html > Smart shutdown is practically unusable. I don't think it makes sense to tie behavior of walsender to it in any way. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Damir BelyalovДата:
Сообщение: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Следующее
От: Tom LaneДата:
Сообщение: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)