Обсуждение: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
От
Craig Ringer
Дата:
Hi all While adding support for logical decoding on standby I noticed that the walsender doesn't respect SIGUSR1 with PROCSIG_RECOVERY_CONFLICT_DATABASE set. It blindly assumes that it means there's new WAL: WalSndSignals(void) { ... pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */ ... } since all WalSndXLogSendHandler does is set the latch. Handling recovery conflicts in the walsender is neccessary for logical decoding on standby, so that we can replay drop database. All the recovery conflict machinery is currently contained in postgres.c and not used by, or accessible to, the walsender. It actually works to just set procsignal_sigusr1_handler as walsender's SIGUSR1 handler, but I'm not convinced it's safe: Most of the procsignals don't make sense for walsender and could possibly attempts things that use state the walsender doesn't have set up. The comments on procsignal say that callers should tolerate getting the wrong signal due to possible races: * Also, because of race conditions, it's important that all the signals be * defined so that no harm is done if a process mistakenly receives one. (procsignal.h) I'm wondering about adding a new state flag IsWalSender and have RecoveryConflictInterrupt() ignore most conflict reasons if IsWalSender is true. Though it strikes me that during logical decoding on standby, the walsender could quite possibly conflict with other things too, so it'd be better to make it safe to handle all the conflict cases within the walsender. Anyway, this PoC passes regression tests and allows drop database on a standby to succeed when a slot is in-use. Not for commit as-is. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
От
Craig Ringer
Дата:
On 18 November 2016 at 10:57, Craig Ringer <craig@2ndquadrant.com> wrote: > Hi all > > While adding support for logical decoding on standby I noticed that > the walsender doesn't respect > SIGUSR1 with PROCSIG_RECOVERY_CONFLICT_DATABASE set. I have an update on this after further study. Surprisingly I haven't found a way to crash the walsender with it, but as expected it is also not really correct as-is. It's fine for terminating walsender sessions on database drop, or for holding a lock on an object too long. Termination due to conflict with vacuum is more problematic because: * it'll try to terminate a logical decoding walsender even if there's no real conflict, since only normal relation blocks were affected, not user catalogs or system catalogs; and * most of the time its attempts to terminate won't do anything because there's no xact running on the walsender at the time it checks for termination. So that's definitely going to need more work. I'm still interested in hearing comments from experienced folks about whether it's sane to do this at all, rather than have similar duplicate signal handling for the walsender. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > I'm still interested in hearing comments from experienced folks about > whether it's sane to do this at all, rather than have similar > duplicate signal handling for the walsender. Well, I mean, if it's reasonable to share code in a given situation, that is generally better than NOT sharing code... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
От
Kyotaro HORIGUCHI
Дата:
Hello, At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com> > On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > > I'm still interested in hearing comments from experienced folks about > > whether it's sane to do this at all, rather than have similar > > duplicate signal handling for the walsender. > > Well, I mean, if it's reasonable to share code in a given situation, > that is generally better than NOT sharing code... Walsender handles SIGUSR1 completely different way from normal backends. The procsignal_sigusr1_handler is designed to work as the peer of SendProcSignal (not ProcSendSignal:). Walsender is just using a latch to be woken up. It has nothing to do with SendProcSignal. IMHO, I don't think walsender is allowed to just share the backends' handler for a practical reason that pss_signalFlags can harm. If you need to expand the function of SIGUSR1 of walsender, more thought would be needed. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
От
Craig Ringer
Дата:
On 22 November 2016 at 11:35, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello, > > At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com> >> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> > I'm still interested in hearing comments from experienced folks about >> > whether it's sane to do this at all, rather than have similar >> > duplicate signal handling for the walsender. >> >> Well, I mean, if it's reasonable to share code in a given situation, >> that is generally better than NOT sharing code... > > Walsender handles SIGUSR1 completely different way from normal > backends. The procsignal_sigusr1_handler is designed to work as > the peer of SendProcSignal (not ProcSendSignal:). Walsender is > just using a latch to be woken up. It has nothing to do with > SendProcSignal. Indeed, at the moment it doesn't. I'm proposing to change that, since walsenders doing logical decoding on a standby need to be notified of conflicts with recovery, many of which are the same as for normal user backends and bgworkers. The main exception is snapshot conflicts, where logical decoding walsenders don't care about conflicts with specific tables, they want to be terminated if the effective catalog_xmin on the upstream increases past their needed catalog_xmin. They don't care about non-catalog (or non-heap-catalog) tables. So I expect we'd just want to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender and handle conflict with catalog_xmin increases separately. > IMHO, I don't think walsender is allowed to just share the > backends' handler for a practical reason that pss_signalFlags can > harm. I'm not sure I see the problem. The ProcSignalReason argument to RecoveryConflictInterrupt states that: * Also, because of race conditions, it's important that all the signals be* defined so that no harm is done if a processmistakenly receives one. > If you need to expand the function of SIGUSR1 of walsender, more > thought would be needed. Yeah, I definitely don't think it's as simple as just using procsignal_sigusr1_handler as-is. I expect we'd likely create a new global IsWalSender and ignore some RecoveryConflictInterrupt cases when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and probably add a new case for catalog_xmin conflicts that's only acted on when IsWalSender. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
От
Kyotaro HORIGUCHI
Дата:
Hello, At Tue, 22 Nov 2016 12:35:56 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YF9-pojdPukTO7XGi+G1w=aRkv=tV7xBG0OPQdD+xX+-Q@mail.gmail.com> > On 22 November 2016 at 11:35, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > Hello, > > > > At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com> > >> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > >> > I'm still interested in hearing comments from experienced folks about > >> > whether it's sane to do this at all, rather than have similar > >> > duplicate signal handling for the walsender. > >> > >> Well, I mean, if it's reasonable to share code in a given situation, > >> that is generally better than NOT sharing code... > > > > Walsender handles SIGUSR1 completely different way from normal > > backends. The procsignal_sigusr1_handler is designed to work as > > the peer of SendProcSignal (not ProcSendSignal:). Walsender is > > just using a latch to be woken up. It has nothing to do with > > SendProcSignal. > > Indeed, at the moment it doesn't. I'm proposing to change that, since > walsenders doing logical decoding on a standby need to be notified of > conflicts with recovery, many of which are the same as for normal user > backends and bgworkers. > > The main exception is snapshot conflicts, where logical decoding > walsenders don't care about conflicts with specific tables, they want > to be terminated if the effective catalog_xmin on the upstream > increases past their needed catalog_xmin. They don't care about > non-catalog (or non-heap-catalog) tables. So I expect we'd just want > to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender > and handle conflict with catalog_xmin increases separately. Thank you for the explanation. It seems to be reasonable for walsender to have the same mechanism with procsignal_sigusr1_handler. Sharing a handler or having another one is a matter of design. (But sharing it might not be better if walsender handles only two reasons.) > > IMHO, I don't think walsender is allowed to just share the > > backends' handler for a practical reason that pss_signalFlags can > > harm. > > I'm not sure I see the problem. The ProcSignalReason argument to > RecoveryConflictInterrupt states that: > > * Also, because of race conditions, it's important that all the signals be > * defined so that no harm is done if a process mistakenly receives one. It won't cause actual problem, but it is not managed on the *current* walsender. I meant that taking any action based on unmanaged variable seems to be a flaw of design. But anyway no problem if it is redesigned to be used on walsender. > > If you need to expand the function of SIGUSR1 of walsender, more > > thought would be needed. > > Yeah, I definitely don't think it's as simple as just using > procsignal_sigusr1_handler as-is. I expect we'd likely create a new > global IsWalSender and ignore some RecoveryConflictInterrupt cases > when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and > probably add a new case for catalog_xmin conflicts that's only acted > on when IsWalSender. The global is unncecessary if walsender have a different handler from normal backends. If there's at least one or few additional reasons for signal, sharing SendProcSignal and having dedicate handler might be better. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
От
Kyotaro HORIGUCHI
Дата:
No that's not right. At Tue, 22 Nov 2016 17:45:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161122.174534.266086549.horiguchi.kyotaro@lab.ntt.co.jp> > Hello, > > At Tue, 22 Nov 2016 12:35:56 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YF9-pojdPukTO7XGi+G1w=aRkv=tV7xBG0OPQdD+xX+-Q@mail.gmail.com> > > On 22 November 2016 at 11:35, Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > Hello, > > > > > > At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com> > > >> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > > >> > I'm still interested in hearing comments from experienced folks about > > >> > whether it's sane to do this at all, rather than have similar > > >> > duplicate signal handling for the walsender. > > >> > > >> Well, I mean, if it's reasonable to share code in a given situation, > > >> that is generally better than NOT sharing code... > > > > > > Walsender handles SIGUSR1 completely different way from normal > > > backends. The procsignal_sigusr1_handler is designed to work as > > > the peer of SendProcSignal (not ProcSendSignal:). Walsender is > > > just using a latch to be woken up. It has nothing to do with > > > SendProcSignal. > > > > Indeed, at the moment it doesn't. I'm proposing to change that, since > > walsenders doing logical decoding on a standby need to be notified of > > conflicts with recovery, many of which are the same as for normal user > > backends and bgworkers. > > > > The main exception is snapshot conflicts, where logical decoding > > walsenders don't care about conflicts with specific tables, they want > > to be terminated if the effective catalog_xmin on the upstream > > increases past their needed catalog_xmin. They don't care about > > non-catalog (or non-heap-catalog) tables. So I expect we'd just want > > to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender > > and handle conflict with catalog_xmin increases separately. > > Thank you for the explanation. It seems to be reasonable for > walsender to have the same mechanism with > procsignal_sigusr1_handler. Sharing a handler or having another > one is a matter of design. (But sharing it might not be better if > walsender handles only two reasons.) ... > > > If you need to expand the function of SIGUSR1 of walsender, more > > > thought would be needed. > > > > Yeah, I definitely don't think it's as simple as just using > > procsignal_sigusr1_handler as-is. I expect we'd likely create a new > > global IsWalSender and ignore some RecoveryConflictInterrupt cases > > when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and > > probably add a new case for catalog_xmin conflicts that's only acted > > on when IsWalSender. > > The global is unncecessary if walsender have a different handler > from normal backends. If there's at least one or few additional > reasons for signal, sharing SendProcSignal and having dedicate > handler might be better. If no behavior is shared among normal backend and walsender, it would be a good reason not to share the handler function. What you are willing to do seems so. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
От
Craig Ringer
Дата:
On 22 November 2016 at 17:49, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> > Yeah, I definitely don't think it's as simple as just using >> > procsignal_sigusr1_handler as-is. I expect we'd likely create a new >> > global IsWalSender and ignore some RecoveryConflictInterrupt cases >> > when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and >> > probably add a new case for catalog_xmin conflicts that's only acted >> > on when IsWalSender. >> >> The global is unncecessary if walsender have a different handler >> from normal backends. If there's at least one or few additional >> reasons for signal, sharing SendProcSignal and having dedicate >> handler might be better. > > If no behavior is shared among normal backend and walsender, it > would be a good reason not to share the handler function. What > you are willing to do seems so. I've explored this some more, and it looks like using procsignal_sigusr1_handler for handling recovery conflicts in the walsender during logical decoding actually makes a lot of sense. Almost all behaviour is shared, and so far I haven't needed any special cases at all. I needed to add a new recovery signal for conflict with catalog_xmin advance on upstream, but that was it. Many of the cases make no sense for physical walsenders, so it probably makes sense to bail out early if it's a physical walsender, but for a walsender doing logical replication the only one that I don't think makes sense is conflict with snapshot, which won't get sent and is harmless if received. (The comment on it is slightly wrong anyway; it claims it's only used by normal user backends in transactions, but database conflicts are fired even when not in an xact.) -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services