Обсуждение: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

Поиск
Список
Период
Сортировка

Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

От
Craig Ringer
Дата:
Hi all

While adding support for logical decoding on standby I noticed that
the walsender doesn't respect
SIGUSR1 with PROCSIG_RECOVERY_CONFLICT_DATABASE set.

It blindly assumes that it means there's new WAL:

WalSndSignals(void)
{
...
        pqsignal(SIGUSR1, WalSndXLogSendHandler);       /* request WAL
sending */
...
}

since all WalSndXLogSendHandler does is set the latch.

Handling recovery conflicts in the walsender is neccessary for logical
decoding on standby, so that we can replay drop database.

All the recovery conflict machinery is currently contained in
postgres.c and not used by, or accessible to, the walsender. It
actually works to just set procsignal_sigusr1_handler as walsender's
SIGUSR1 handler, but I'm not convinced it's safe:

Most of the procsignals don't make sense for walsender and could
possibly attempts things that use state the walsender doesn't have set
up. The comments on procsignal say that callers should tolerate
getting the wrong signal due to possible races:

 * Also, because of race conditions, it's important that all the signals be
 * defined so that no harm is done if a process mistakenly receives one.

(procsignal.h)

I'm wondering about adding a new state flag IsWalSender and have
RecoveryConflictInterrupt() ignore most conflict reasons if
IsWalSender is true. Though it strikes me that during logical decoding
on standby, the walsender could quite possibly conflict with other
things too, so it'd be better to make it safe to handle all the
conflict cases within the walsender.

Anyway, this PoC passes regression tests and allows drop database on a
standby to succeed when a slot is in-use. Not for commit as-is.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

От
Craig Ringer
Дата:
On 18 November 2016 at 10:57, Craig Ringer <craig@2ndquadrant.com> wrote:
> Hi all
>
> While adding support for logical decoding on standby I noticed that
> the walsender doesn't respect
> SIGUSR1 with PROCSIG_RECOVERY_CONFLICT_DATABASE set.

I have an update on this after further study.

Surprisingly I haven't found a way to crash the walsender with it, but
as expected it is also not really correct as-is.

It's fine for terminating walsender sessions on database drop, or for
holding a lock on an object too long. Termination due to conflict with
vacuum is more problematic because:

* it'll try to terminate a logical decoding walsender even if there's
no real conflict, since only normal relation blocks were affected, not
user catalogs or system catalogs; and

* most of the time its attempts to terminate won't do anything because
there's no xact running on the walsender at the time it checks for
termination.

So that's definitely going to need more work.

I'm still interested in hearing comments from experienced folks about
whether it's sane to do this at all, rather than have similar
duplicate signal handling for the walsender.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

От
Robert Haas
Дата:
On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I'm still interested in hearing comments from experienced folks about
> whether it's sane to do this at all, rather than have similar
> duplicate signal handling for the walsender.

Well, I mean, if it's reasonable to share code in a given situation,
that is generally better than NOT sharing code...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com>
> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > I'm still interested in hearing comments from experienced folks about
> > whether it's sane to do this at all, rather than have similar
> > duplicate signal handling for the walsender.
> 
> Well, I mean, if it's reasonable to share code in a given situation,
> that is generally better than NOT sharing code...

Walsender handles SIGUSR1 completely different way from normal
backends. The procsignal_sigusr1_handler is designed to work as
the peer of SendProcSignal (not ProcSendSignal:). Walsender is
just using a latch to be woken up. It has nothing to do with
SendProcSignal.

IMHO, I don't think walsender is allowed to just share the
backends' handler for a practical reason that pss_signalFlags can
harm.

If you need to expand the function of SIGUSR1 of walsender, more
thought would be needed.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

От
Craig Ringer
Дата:
On 22 November 2016 at 11:35, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com>
>> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> > I'm still interested in hearing comments from experienced folks about
>> > whether it's sane to do this at all, rather than have similar
>> > duplicate signal handling for the walsender.
>>
>> Well, I mean, if it's reasonable to share code in a given situation,
>> that is generally better than NOT sharing code...
>
> Walsender handles SIGUSR1 completely different way from normal
> backends. The procsignal_sigusr1_handler is designed to work as
> the peer of SendProcSignal (not ProcSendSignal:). Walsender is
> just using a latch to be woken up. It has nothing to do with
> SendProcSignal.

Indeed, at the moment it doesn't. I'm proposing to change that, since
walsenders doing logical decoding on a standby need to be notified of
conflicts with recovery, many of which are the same as for normal user
backends and bgworkers.

The main exception is snapshot conflicts, where logical decoding
walsenders don't care about conflicts with specific tables, they want
to be terminated if the effective catalog_xmin on the upstream
increases past their needed catalog_xmin. They don't care about
non-catalog (or non-heap-catalog) tables. So I expect we'd just want
to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender
and handle conflict with catalog_xmin increases separately.

> IMHO, I don't think walsender is allowed to just share the
> backends' handler for a practical reason that pss_signalFlags can
> harm.

I'm not sure I see the problem. The ProcSignalReason argument to
RecoveryConflictInterrupt states that:
* Also, because of race conditions, it's important that all the signals be* defined so that no harm is done if a
processmistakenly receives one.
 

> If you need to expand the function of SIGUSR1 of walsender, more
> thought would be needed.

Yeah, I definitely don't think it's as simple as just using
procsignal_sigusr1_handler as-is. I expect we'd likely create a new
global IsWalSender and ignore some RecoveryConflictInterrupt cases
when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
probably add a new case for catalog_xmin conflicts that's only acted
on when IsWalSender.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Tue, 22 Nov 2016 12:35:56 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in
<CAMsr+YF9-pojdPukTO7XGi+G1w=aRkv=tV7xBG0OPQdD+xX+-Q@mail.gmail.com>
> On 22 November 2016 at 11:35, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Hello,
> >
> > At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com>
> >> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >> > I'm still interested in hearing comments from experienced folks about
> >> > whether it's sane to do this at all, rather than have similar
> >> > duplicate signal handling for the walsender.
> >>
> >> Well, I mean, if it's reasonable to share code in a given situation,
> >> that is generally better than NOT sharing code...
> >
> > Walsender handles SIGUSR1 completely different way from normal
> > backends. The procsignal_sigusr1_handler is designed to work as
> > the peer of SendProcSignal (not ProcSendSignal:). Walsender is
> > just using a latch to be woken up. It has nothing to do with
> > SendProcSignal.
> 
> Indeed, at the moment it doesn't. I'm proposing to change that, since
> walsenders doing logical decoding on a standby need to be notified of
> conflicts with recovery, many of which are the same as for normal user
> backends and bgworkers.
> 
> The main exception is snapshot conflicts, where logical decoding
> walsenders don't care about conflicts with specific tables, they want
> to be terminated if the effective catalog_xmin on the upstream
> increases past their needed catalog_xmin. They don't care about
> non-catalog (or non-heap-catalog) tables. So I expect we'd just want
> to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender
> and handle conflict with catalog_xmin increases separately.

Thank you for the explanation. It seems to be reasonable for
walsender to have the same mechanism with
procsignal_sigusr1_handler. Sharing a handler or having another
one is a matter of design. (But sharing it might not be better if
walsender handles only two reasons.)

> > IMHO, I don't think walsender is allowed to just share the
> > backends' handler for a practical reason that pss_signalFlags can
> > harm.
> 
> I'm not sure I see the problem. The ProcSignalReason argument to
> RecoveryConflictInterrupt states that:
> 
>  * Also, because of race conditions, it's important that all the signals be
>  * defined so that no harm is done if a process mistakenly receives one.

It won't cause actual problem, but it is not managed on the
*current* walsender. I meant that taking any action based on
unmanaged variable seems to be a flaw of design. But anyway no
problem if it is redesigned to be used on walsender.

> > If you need to expand the function of SIGUSR1 of walsender, more
> > thought would be needed.
> 
> Yeah, I definitely don't think it's as simple as just using
> procsignal_sigusr1_handler as-is. I expect we'd likely create a new
> global IsWalSender and ignore some RecoveryConflictInterrupt cases
> when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
> probably add a new case for catalog_xmin conflicts that's only acted
> on when IsWalSender.

The global is unncecessary if walsender have a different handler
from normal backends. If there's at least one or few additional
reasons for signal, sharing SendProcSignal and having dedicate
handler might be better.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

От
Kyotaro HORIGUCHI
Дата:
No that's not right.

At Tue, 22 Nov 2016 17:45:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20161122.174534.266086549.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello,
> 
> At Tue, 22 Nov 2016 12:35:56 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in
<CAMsr+YF9-pojdPukTO7XGi+G1w=aRkv=tV7xBG0OPQdD+xX+-Q@mail.gmail.com>
> > On 22 November 2016 at 11:35, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > > Hello,
> > >
> > > At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com>
> > >> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > >> > I'm still interested in hearing comments from experienced folks about
> > >> > whether it's sane to do this at all, rather than have similar
> > >> > duplicate signal handling for the walsender.
> > >>
> > >> Well, I mean, if it's reasonable to share code in a given situation,
> > >> that is generally better than NOT sharing code...
> > >
> > > Walsender handles SIGUSR1 completely different way from normal
> > > backends. The procsignal_sigusr1_handler is designed to work as
> > > the peer of SendProcSignal (not ProcSendSignal:). Walsender is
> > > just using a latch to be woken up. It has nothing to do with
> > > SendProcSignal.
> > 
> > Indeed, at the moment it doesn't. I'm proposing to change that, since
> > walsenders doing logical decoding on a standby need to be notified of
> > conflicts with recovery, many of which are the same as for normal user
> > backends and bgworkers.
> > 
> > The main exception is snapshot conflicts, where logical decoding
> > walsenders don't care about conflicts with specific tables, they want
> > to be terminated if the effective catalog_xmin on the upstream
> > increases past their needed catalog_xmin. They don't care about
> > non-catalog (or non-heap-catalog) tables. So I expect we'd just want
> > to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender
> > and handle conflict with catalog_xmin increases separately.
> 
> Thank you for the explanation. It seems to be reasonable for
> walsender to have the same mechanism with
> procsignal_sigusr1_handler. Sharing a handler or having another
> one is a matter of design. (But sharing it might not be better if
> walsender handles only two reasons.)
...
> > > If you need to expand the function of SIGUSR1 of walsender, more
> > > thought would be needed.
> > 
> > Yeah, I definitely don't think it's as simple as just using
> > procsignal_sigusr1_handler as-is. I expect we'd likely create a new
> > global IsWalSender and ignore some RecoveryConflictInterrupt cases
> > when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
> > probably add a new case for catalog_xmin conflicts that's only acted
> > on when IsWalSender.
> 
> The global is unncecessary if walsender have a different handler
> from normal backends. If there's at least one or few additional
> reasons for signal, sharing SendProcSignal and having dedicate
> handler might be better.

If no behavior is shared among normal backend and walsender, it
would be a good reason not to share the handler function. What
you are willing to do seems so.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

От
Craig Ringer
Дата:
On 22 November 2016 at 17:49, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

>> > Yeah, I definitely don't think it's as simple as just using
>> > procsignal_sigusr1_handler as-is. I expect we'd likely create a new
>> > global IsWalSender and ignore some RecoveryConflictInterrupt cases
>> > when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
>> > probably add a new case for catalog_xmin conflicts that's only acted
>> > on when IsWalSender.
>>
>> The global is unncecessary if walsender have a different handler
>> from normal backends. If there's at least one or few additional
>> reasons for signal, sharing SendProcSignal and having dedicate
>> handler might be better.
>
> If no behavior is shared among normal backend and walsender, it
> would be a good reason not to share the handler function. What
> you are willing to do seems so.

I've explored this some more, and it looks like using
procsignal_sigusr1_handler for handling recovery conflicts in the
walsender during logical decoding actually makes a lot of sense.
Almost all behaviour is shared, and so far I haven't needed any
special cases at all. I needed to add a new recovery signal for
conflict with catalog_xmin advance on upstream, but that was it.

Many of the cases make no sense for physical walsenders, so it
probably makes sense to bail out early if it's a physical walsender,
but for a walsender doing logical replication the only one that I
don't think makes sense is conflict with snapshot, which won't get
sent and is harmless if received.

(The comment on it is slightly wrong anyway; it claims it's only used
by normal user backends in transactions, but database conflicts are
fired even when not in an xact.)

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services