Обсуждение: Proposal for Signal Detection Refactoring

Поиск
Список
Период
Сортировка

Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:
After the patch we did for the race condition here, one point which was brought up which I thought was quite relevant was the fact that checking global flags feels a little ugly.

I would like to set up a patch which refactors this as follows:

1.  We create an enum of cancellation levels:
    NO_INTERRUPT
    INTERRUPT
    QUERY_CANCEL
    PROCESS_EXIT

2.  We create a macro called PENDING_INTERRUPT_LEVEL() which returns the highest pending interrupt level.

3.  We check on whether the output of this is greater or equal to the level we need to handle and work on that basis.

This allows us to handle cases where we just want to check the interrupt level for return or cleanup purposes.  It does not apply to cases where we need to change interrupt handling temporarily as we do in a couple of cases.  I think it is cleaner than checking the QueryCancelPending and ProcDiePending global flags directly.

So here's a small patch.  I will add it for the next commit fest unless anyone has any reason I shouldn't.

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Вложения

Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Thu, Sep 20, 2018 at 03:08:34PM +0200, Chris Travers wrote:
> So here's a small patch.  I will add it for the next commit fest unless
> anyone has any reason I shouldn't.

-       return InterruptPending && (QueryCancelPending || ProcDiePending);
+       return PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL;

This is pretty similar to lock levels, where it is pretty hard to put a
strict monotone hierarchy when it comes to such interruptions.  The new
code does not seem like an improvment either, as for example in the code
mentioned above, you know directly what are the actions involved, which
is not the case with the new code style.
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Andres Freund
Дата:
On 2018-09-21 13:46:11 +0900, Michael Paquier wrote:
> On Thu, Sep 20, 2018 at 03:08:34PM +0200, Chris Travers wrote:
> > So here's a small patch.  I will add it for the next commit fest unless
> > anyone has any reason I shouldn't.
> 
> -       return InterruptPending && (QueryCancelPending || ProcDiePending);
> +       return PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL;
> 
> This is pretty similar to lock levels, where it is pretty hard to put a
> strict monotone hierarchy when it comes to such interruptions.

+1

Greetings,

Andres Freund


Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Fri, Sep 21, 2018 at 6:46 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Sep 20, 2018 at 03:08:34PM +0200, Chris Travers wrote:
> So here's a small patch.  I will add it for the next commit fest unless
> anyone has any reason I shouldn't.

-       return InterruptPending && (QueryCancelPending || ProcDiePending);
+       return PENDING_INTERRUPT_LEVEL() >= QUERY_CANCEL;

This is pretty similar to lock levels, where it is pretty hard to put a
strict monotone hierarchy when it comes to such interruptions. 

I understand how lock levels don't fit a simple hierarchy but at least when it comes
to what is going to be aborted on a signal, I am having trouble understanding the problem here.

Does anyone have any search terms I should look into the archives regarding this issue?

I will assume this patch will be rejected then.

 
The new
code does not seem like an improvment either, as for example in the code
mentioned above, you know directly what are the actions involved, which
is not the case with the new code style.

The initial motivation for this patch was that it felt a little bit odd to be using global 
boolean flags for this sort of approach and I was concerned that if this approach
proliferated it might cause problems later.  As far as I know my previous patch was
the second use of the current pattern.

So one thing I wonder is if there are ways where abstracting this would make more sense.
 
--
Michael


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Fri, Sep 21, 2018 at 12:35:46PM +0200, Chris Travers wrote:
> I understand how lock levels don't fit a simple hierarchy but at least
> when it comes to what is going to be aborted on a signal, I am having
> trouble understanding the problem here.

It may be possible to come with a clear hierarchy with the current
interruption types in place.  Still I am not sure that the definition
you put behind is completely correct, and I think that we need to
question as well the value of putting such restrictions for future
interruption types because they would need to fit into it.  That's quite
a heavy constraint to live with.  There is such logic with wal_level for
example, which is something I am not completely happy with either...
But this one is a story for another time, and another thread.

Regarding your patch, it seems to me that it does not improve
readability as I mentioned up-thread because you lose sight of what can
be interrupted in a given code path, which is what the current code
shows actually nicely.

There could be value in refactoring things so as all the *Pending flags
of miscadmin.h get stored into one single volatile sig_atomic_t which
uses bit-wise markers, as that's at least 4 bytes because that's stored
as an int for most platforms and can be performed as an atomic operation
safely across signals (If my memory is right;) ).  And this leaves a lot
of room for future flags.
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:
First, thanks for taking the time to write this.  Its very helpful.  Additional thoughts inline.

On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Sep 21, 2018 at 12:35:46PM +0200, Chris Travers wrote:
> I understand how lock levels don't fit a simple hierarchy but at least
> when it comes to what is going to be aborted on a signal, I am having
> trouble understanding the problem here.

It may be possible to come with a clear hierarchy with the current
interruption types in place.  Still I am not sure that the definition
you put behind is completely correct, and I think that we need to
question as well the value of putting such restrictions for future
interruption types because they would need to fit into it.

The future-safety issue is a really good one and it's one reason I kept the infinite loop patch as semantically consistent with the API as I could at the cost of some complexity.

I have another area where I think a patch would be more valuable anyway in terms of refactoring.
 
  That's quite
a heavy constraint to live with.  There is such logic with wal_level for
example, which is something I am not completely happy with either...
But this one is a story for another time, and another thread.

 From a cleanup perspective a concentric circles approach seems like it is correct to me (which would correspond to a hierarchy of interrupts) but I can see that assuming that all pending interrupts would be checked solely for cleanup reasons might be a bad assumption on my part.

Regarding your patch, it seems to me that it does not improve
readability as I mentioned up-thread because you lose sight of what can
be interrupted in a given code path, which is what the current code
shows actually nicely.

So I guess there are two fundamental questions here.

1.  Do we want to move away from checking global flags like this directly?  I think we do because it makes future changes possibly harder and more complex since there is no encapsulation of logic.  But I don't see a point in putting effort into that without consensus.

There could be value in refactoring things so as all the *Pending flags
of miscadmin.h get stored into one single volatile sig_atomic_t which
uses bit-wise markers, as that's at least 4 bytes because that's stored
as an int for most platforms and can be performed as an atomic operation
safely across signals (If my memory is right;) ).  And this leaves a lot
of room for future flags.

Yeah I will look into this.

Thanks again for taking the time to go over the concerns in detail.  It really helps.

Best Wishes,
Chris Travers
--
Michael


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:
I did some more reading.

On Mon, Sep 24, 2018 at 10:15 AM Chris Travers <chris.travers@adjust.com> wrote:
First, thanks for taking the time to write this.  Its very helpful.  Additional thoughts inline.

On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier <michael@paquier.xyz> wrote:

There could be value in refactoring things so as all the *Pending flags
of miscadmin.h get stored into one single volatile sig_atomic_t which
uses bit-wise markers, as that's at least 4 bytes because that's stored
as an int for most platforms and can be performed as an atomic operation
safely across signals (If my memory is right;) ).  And this leaves a lot
of room for future flags.

Yeah I will look into this.


Ok so having looked into this a bit more....

It looks like to be strictly conforming, you can't just use a series of flags because neither C 89 or 99 guarantee that sig_atomic_t is read/write round-trip safe in signal handlers.  In other words, if you read, are pre-empted by another signal, and then write, you may clobber what the other signal handler did to the variable.  So you need atomics, which are C11....

What I would suggest instead at least for an initial approach is:

1.  A struct of volatile bools statically stored
2.  macros for accessing/setting/clearing flags
3.  Consistent use of these macros throughout the codebase.

To make your solution work it looks like we'd need C11 atomics which would be nice and maybe at some point we decide to allow newer feature, or we could wrap this itself in checks for C11 features and provide atomic flags in the future.  It seems that the above solution would strictly comply to C89 and pose no concurrency issues.

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin



--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:

Very minor revision
On Mon, Sep 24, 2018 at 11:45 AM Chris Travers <chris.travers@adjust.com> wrote:

Ok so having looked into this a bit more....

It looks like to be strictly conforming, you can't just use a series of flags because neither C 89 or 99 guarantee that sig_atomic_t is read/write round-trip safe in signal handlers.  In other words, if you read, are pre-empted by another signal, and then write, you may clobber what the other signal handler did to the variable.  So you need atomics, which are C11....

What I would suggest instead at least for an initial approach is:

1.  A struct of volatile bools statically stored

These would be implemented as sig_atomic_t which is defined in C89 but has no atomic operators other than writing the full value.
 
2.  macros for accessing/setting/clearing flags
3.  Consistent use of these macros throughout the codebase.

To make your solution work it looks like we'd need C11 atomics which would be nice and maybe at some point we decide to allow newer feature, or we could wrap this itself in checks for C11 features and provide atomic flags in the future.  It seems that the above solution would strictly comply to C89 and pose no concurrency issues.

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin



--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin



--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Andres Freund
Дата:
On 2018-09-24 11:45:10 +0200, Chris Travers wrote:
> I did some more reading.
> 
> On Mon, Sep 24, 2018 at 10:15 AM Chris Travers <chris.travers@adjust.com>
> wrote:
> 
> > First, thanks for taking the time to write this.  Its very helpful.
> > Additional thoughts inline.
> >
> > On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier <michael@paquier.xyz>
> > wrote:
> >
> >>
> >> There could be value in refactoring things so as all the *Pending flags
> >> of miscadmin.h get stored into one single volatile sig_atomic_t which
> >> uses bit-wise markers, as that's at least 4 bytes because that's stored
> >> as an int for most platforms and can be performed as an atomic operation
> >> safely across signals (If my memory is right;) ).  And this leaves a lot
> >> of room for future flags.
> >>
> >
> > Yeah I will look into this.
> >
> 
> 
> Ok so having looked into this a bit more....
> 
> It looks like to be strictly conforming, you can't just use a series of
> flags because neither C 89 or 99 guarantee that sig_atomic_t is read/write
> round-trip safe in signal handlers.  In other words, if you read, are
> pre-empted by another signal, and then write, you may clobber what the
> other signal handler did to the variable.  So you need atomics, which are
> C11....
> 
> What I would suggest instead at least for an initial approach is:
> 
> 1.  A struct of volatile bools statically stored
> 2.  macros for accessing/setting/clearing flags
> 3.  Consistent use of these macros throughout the codebase.
> 
> To make your solution work it looks like we'd need C11 atomics which would
> be nice and maybe at some point we decide to allow newer feature, or we
> could wrap this itself in checks for C11 features and provide atomic flags
> in the future.  It seems that the above solution would strictly comply to
> C89 and pose no concurrency issues.

It's certainly *NOT* ok to use atomics in signal handlers
indiscriminately, on some hardware / configure option combinations
they're backed by spinlocks (or even semaphores) and thus *NOT*
interrupt save.

This doesn't seem to solve an actual problem, why are we discussing
changing this? What'd be measurably improved, worth the cost of making
backpatching more painful?

Greetings,

Andres Freund


Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Mon, Sep 24, 2018 at 10:06:40AM -0700, Andres Freund wrote:
> This doesn't seem to solve an actual problem, why are we discussing
> changing this? What'd be measurably improved, worth the cost of making
> backpatching more painful?

My point was just to reduce the number of variables used and ease
debugger lookups with what is on the stack.

Anyway, putting the back-patching pain aside, and just for my own
knowledge...  Andres, would it be fine to just use one sig_atomic_t
field which can be set from different code paths?  Say:
typedef enum SignalPendingType {
    PENDING_INTERRUPT,
    PENDING_CANCEL_QUERY,
    PENDING_PROC_DIE,
    PENDING_RELOAD,
    PENDING_SESSION_TIMEOUT
};
extern volatile sig_atomic_t signalPendingFlags;

And then within separate signal handlers things like:
void
StatementCancelHandler(SIGNAL_ARGS)
{
    [...]
    signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
    [...]
}

void
PostgresSigHupHandler(SIGNAL_ARGS)
{
    [...]
    signalPendingFlags |= ConfigReloadPending;
    [...]
}
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Andres Freund
Дата:
On 2018-09-25 08:57:25 +0900, Michael Paquier wrote:
> On Mon, Sep 24, 2018 at 10:06:40AM -0700, Andres Freund wrote:
> > This doesn't seem to solve an actual problem, why are we discussing
> > changing this? What'd be measurably improved, worth the cost of making
> > backpatching more painful?
> 
> My point was just to reduce the number of variables used and ease
> debugger lookups with what is on the stack.

I'm not sure a bitflag really gives you that - before gdb gives you the
plain value, afterwards you need to know the enum values and do bit math
to know.


> Anyway, putting the back-patching pain aside, and just for my own
> knowledge...  Andres, would it be fine to just use one sig_atomic_t
> field which can be set from different code paths?  Say:
> typedef enum SignalPendingType {
>     PENDING_INTERRUPT,
>     PENDING_CANCEL_QUERY,
>     PENDING_PROC_DIE,
>     PENDING_RELOAD,
>     PENDING_SESSION_TIMEOUT
> };

Well, they'd have to different bits...


> extern volatile sig_atomic_t signalPendingFlags;

Note that sig_atomic_t IIRC is only guaranteed to effectively be 8 bit
wide - so you couldn't have that many flags.


Greetings,

Andres Freund


Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Mon, Sep 24, 2018 at 05:23:56PM -0700, Andres Freund wrote:
> On 2018-09-25 08:57:25 +0900, Michael Paquier wrote:
>> Anyway, putting the back-patching pain aside, and just for my own
>> knowledge...  Andres, would it be fine to just use one sig_atomic_t
>> field which can be set from different code paths?  Say:
>> typedef enum SignalPendingType {
>>     PENDING_INTERRUPT,
>>     PENDING_CANCEL_QUERY,
>>     PENDING_PROC_DIE,
>>     PENDING_RELOAD,
>>     PENDING_SESSION_TIMEOUT
>> };
>
> Well, they'd have to different bits...

Sure, I forgot to write the "foo = 1 << 0" and such ;)

>> extern volatile sig_atomic_t signalPendingFlags;
>
> Note that sig_atomic_t IIRC is only guaranteed to effectively be 8 bit
> wide - so you couldn't have that many flags.

I see.  I thought that it mapped at least  int, but C99 tells that it
can be as small as char.  That's indeed quite limited...
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> And then within separate signal handlers things like:
> void
> StatementCancelHandler(SIGNAL_ARGS)
> {
>     [...]
>     signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
>     [...]
> }

AFAICS this still wouldn't work.  The machine code is still going to
look (on many machines) like "load from signalPendingFlags,
OR in some bits, store to signalPendingFlags".  So there's still a
window for another signal handler to interrupt that and store some
bits that will get lost.

You could only fix that by blocking all signal handling during the
handler, which would be expensive and rather pointless.

I do not think that it's readily possible to improve on the current
situation with one sig_atomic_t per flag.

            regards, tom lane


Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Mon, Sep 24, 2018 at 09:03:49PM -0400, Tom Lane wrote:
> You could only fix that by blocking all signal handling during the
> handler, which would be expensive and rather pointless.
>
> I do not think that it's readily possible to improve on the current
> situation with one sig_atomic_t per flag.

Okay, thanks for the confirmation.

At the same time, all the pending flags in miscadmin.h could be switched
to sig_atomic_t if we were to be correct, no?  The counters could be
higher than 256 so that's not really possible.
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> At the same time, all the pending flags in miscadmin.h could be switched
> to sig_atomic_t if we were to be correct, no?  The counters could be
> higher than 256 so that's not really possible. 

Yeah, in principle any global variable touched by a signal handler should
be sig_atomic_t.  I don't know of any modern platform where using "bool"
is unsafe, but per the C standard it could be.  The case that would be
worrisome is if setting the variable requires a load/modify/store, which
does apply to char-sized variables on some ancient platforms.  I think
there's no need to worry for int-sized variables.

            regards, tom lane


Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Mon, Sep 24, 2018 at 09:38:11PM -0400, Tom Lane wrote:
> Yeah, in principle any global variable touched by a signal handler should
> be sig_atomic_t.  I don't know of any modern platform where using "bool"
> is unsafe, but per the C standard it could be.  The case that would be
> worrisome is if setting the variable requires a load/modify/store, which
> does apply to char-sized variables on some ancient platforms.  I think
> there's no need to worry for int-sized variables.

Let's change it then.  ClientConnectionLost needs also to be changed as
miscadmin.h tells that it could be used in a signal handler. What do you
think about the attached?
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> Let's change it then.  ClientConnectionLost needs also to be changed as
> miscadmin.h tells that it could be used in a signal handler. What do you
> think about the attached?

Looks reasonable to me (I've not tested though).

I wonder why ClientConnectionLost isn't PGDLLIMPORT like the rest.

            regards, tom lane


Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Mon, Sep 24, 2018 at 10:39:40PM -0400, Tom Lane wrote:
> I wonder why ClientConnectionLost isn't PGDLLIMPORT like the rest.

Same question here.  As that's kind of a separate discussion, I left it
out.  Now I don't mind changing that at the same time as that's
harmless, and as that's only a patch for HEAD.  PGDLLIMPORT changes
don't get back-patched as well...
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Andres Freund
Дата:
On 2018-09-25 11:50:47 +0900, Michael Paquier wrote:
> PGDLLIMPORT changes don't get back-patched as well...

We've been more aggressive with that lately, and I think that's good. It
just is a unnecessarily portability barrier for extension to be
judicious in applying that.

- Andres


Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Tue, Sep 25, 2018 at 3:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
> And then within separate signal handlers things like:
> void
> StatementCancelHandler(SIGNAL_ARGS)
> {
>     [...]
>     signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
>     [...]
> }

AFAICS this still wouldn't work.  The machine code is still going to
look (on many machines) like "load from signalPendingFlags,
OR in some bits, store to signalPendingFlags".  So there's still a
window for another signal handler to interrupt that and store some
bits that will get lost.

You could only fix that by blocking all signal handling during the
handler, which would be expensive and rather pointless.

I do not think that it's readily possible to improve on the current
situation with one sig_atomic_t per flag.


After a fair bit of reading I think there are ways of doing this in C11 but I don't think those are portable to C99.

In C99 (and, in practice C89, as the C99 committee noted there were no known C89 implementations where reading was unsafe), reading or writing a static sig_atomic_t inside a signal handler is safe, but a round-trip is *not* guaranteed not to clobber.  While I could be wrong, I think it is only in C11 that you have any round-trip operations which are guaranteed not to clobber in the language itself.

Basically we are a long way out to be able to consider these a single value as flags.

However,  what I think one could do is use a struct of volatile sig_atomic_t members and macros for checking/setting.  Simply writing a value is safe in C89 and higher.



                        regards, tom lane


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Mon, Sep 24, 2018 at 7:06 PM Andres Freund <andres@anarazel.de> wrote:
On 2018-09-24 11:45:10 +0200, Chris Travers wrote:
> I did some more reading.
>
> On Mon, Sep 24, 2018 at 10:15 AM Chris Travers <chris.travers@adjust.com>
> wrote:
>
> > First, thanks for taking the time to write this.  Its very helpful.
> > Additional thoughts inline.
> >
> > On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier <michael@paquier.xyz>
> > wrote:
> >
> >>
> >> There could be value in refactoring things so as all the *Pending flags
> >> of miscadmin.h get stored into one single volatile sig_atomic_t which
> >> uses bit-wise markers, as that's at least 4 bytes because that's stored
> >> as an int for most platforms and can be performed as an atomic operation
> >> safely across signals (If my memory is right;) ).  And this leaves a lot
> >> of room for future flags.
> >>
> >
> > Yeah I will look into this.
> >
>
>
> Ok so having looked into this a bit more....
>
> It looks like to be strictly conforming, you can't just use a series of
> flags because neither C 89 or 99 guarantee that sig_atomic_t is read/write
> round-trip safe in signal handlers.  In other words, if you read, are
> pre-empted by another signal, and then write, you may clobber what the
> other signal handler did to the variable.  So you need atomics, which are
> C11....
>
> What I would suggest instead at least for an initial approach is:
>
> 1.  A struct of volatile bools statically stored
> 2.  macros for accessing/setting/clearing flags
> 3.  Consistent use of these macros throughout the codebase.
>
> To make your solution work it looks like we'd need C11 atomics which would
> be nice and maybe at some point we decide to allow newer feature, or we
> could wrap this itself in checks for C11 features and provide atomic flags
> in the future.  It seems that the above solution would strictly comply to
> C89 and pose no concurrency issues.

It's certainly *NOT* ok to use atomics in signal handlers
indiscriminately, on some hardware / configure option combinations
they're backed by spinlocks (or even semaphores) and thus *NOT*
interrupt save.

This doesn't seem to solve an actual problem, why are we discussing
changing this? What'd be measurably improved, worth the cost of making
backpatching more painful?

I am not sure if this is measurable as a problem but I am fairly sure this is tangible and is likely to become more so over time.

Right now, the current approach is to use a series of globally defined single variables for handling interrupt conditions.  Because this has grown organically over time, the sort naming of these variables doesn't have a hard degree of consistency.  Consequently, if one has code which needs to check why it was interrupted, this is both a fragile process, and one which imposes a dependency directly on internals.  We now have two different areas in the code which do handling of interrupt conditions, and two more places that non-intrusively check for whether certain kinds of interrupt conditions are pending.

My sense is that the number of places which have to be aware of these sort of things is likely to grow in the future.  Therefore I think it would be a good idea sooner rather than later to ensure that the both the structures and the mechanisms and interfaces are forward-looking, and make stability guarantees we can hold well into the future without burdening code maintenance much.  So maybe there is some short-term pain in back porting but the goal is to make sure that long-term backporting is easier, and that checking pending interrupt status is safer.

I am not sold on using bit flags here.  I don't think they are C89 or 99 safe (maybe in C11 with caveats).

My suspicion is that when I get to a second attempt at this problem, it will look much closer to what we have now, just better encapsulated.
--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Tom Lane
Дата:
Chris Travers <chris.travers@adjust.com> writes:
> However,  what I think one could do is use a struct of volatile
> sig_atomic_t members and macros for checking/setting.  Simply writing a
> value is safe in C89 and higher.

Yeah, we could group those flags in a struct, but what's the point?

            regards, tom lane


Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Mon, Sep 24, 2018 at 09:27:35PM -0700, Andres Freund wrote:
> On 2018-09-25 11:50:47 +0900, Michael Paquier wrote:
>> PGDLLIMPORT changes don't get back-patched as well...
>
> We've been more aggressive with that lately, and I think that's good. It
> just is a unnecessarily portability barrier for extension to be
> judicious in applying that.

Agreed.  Are there any objections with the proposal of changing the
interruption pending flags in miscadmin.h to use sig_atomic_t?
ClientConnectionLost would gain PGDLLIMPORT at the same time.
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Wed, Sep 26, 2018 at 12:41 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Sep 24, 2018 at 09:27:35PM -0700, Andres Freund wrote:
> On 2018-09-25 11:50:47 +0900, Michael Paquier wrote:
>> PGDLLIMPORT changes don't get back-patched as well...
>
> We've been more aggressive with that lately, and I think that's good. It
> just is a unnecessarily portability barrier for extension to be
> judicious in applying that.

Agreed.  Are there any objections with the proposal of changing the
interruption pending flags in miscadmin.h to use sig_atomic_t?
ClientConnectionLost would gain PGDLLIMPORT at the same time.

I am strongly in favor of doing this.
 
--
Michael


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Tue, Sep 25, 2018 at 3:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Travers <chris.travers@adjust.com> writes:
> However,  what I think one could do is use a struct of volatile
> sig_atomic_t members and macros for checking/setting.  Simply writing a
> value is safe in C89 and higher.

Yeah, we could group those flags in a struct, but what's the point?

This was one of two things I noticed in my previous patch on interrupts and loops where I wasn't sure what the best practice in our code is.

If we don't want to make this change, then would there be any objection to me writing up a README describing the flags, and best practices in terms of checking them in our code based on the current places we use them?  If the current approach will be unlikely to change in the future, then at least we can document that the way I went about things is consistent with current best practices so next time someone doesn't really wonder.
 

                        regards, tom lane


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Wed, Sep 26, 2018 at 09:50:00AM +0200, Chris Travers wrote:
> On Wed, Sep 26, 2018 at 12:41 AM Michael Paquier <michael@paquier.xyz>
> wrote:
>> Agreed.  Are there any objections with the proposal of changing the
>> interruption pending flags in miscadmin.h to use sig_atomic_t?
>> ClientConnectionLost would gain PGDLLIMPORT at the same time.
>
> I am strongly in favor of doing this.

Okay, done.
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Wed, Sep 26, 2018 at 9:54 AM Chris Travers <chris.travers@adjust.com> wrote:


On Tue, Sep 25, 2018 at 3:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Travers <chris.travers@adjust.com> writes:
> However,  what I think one could do is use a struct of volatile
> sig_atomic_t members and macros for checking/setting.  Simply writing a
> value is safe in C89 and higher.

Yeah, we could group those flags in a struct, but what's the point?

This was one of two things I noticed in my previous patch on interrupts and loops where I wasn't sure what the best practice in our code is.

If we don't want to make this change, then would there be any objection to me writing up a README describing the flags, and best practices in terms of checking them in our code based on the current places we use them?  If the current approach will be unlikely to change in the future, then at least we can document that the way I went about things is consistent with current best practices so next time someone doesn't really wonder.

Attaching a first draft of a readme.  Feedback welcome.  I noticed further that we used to document signals and what they did with carious processes but that this was removed after 7.0, probably due to complexity reasons.
 

                        regards, tom lane


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin



--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Вложения

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:
Moving this to documentation due to a general consensus that abstracting this is not necessarily worth it.

If we don't want to refactor and abstract this, it is worth documenting the design as to how things work now so that
otherswho face bugs can consult docs instead of trying to determine acceptable or recommended practices by looking at
thecode alone. 

Re: Proposal for Signal Detection Refactoring

От
Peter Eisentraut
Дата:
On 25/09/2018 02:23, Andres Freund wrote:
>> My point was just to reduce the number of variables used and ease
>> debugger lookups with what is on the stack.
> I'm not sure a bitflag really gives you that - before gdb gives you the
> plain value, afterwards you need to know the enum values and do bit math
> to know.

You could use an actual C bit field, to make debugging easier.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proposal for Signal Detection Refactoring

От
Peter Eisentraut
Дата:
On 01/10/2018 14:00, Chris Travers wrote:
> 
> 
> On Wed, Sep 26, 2018 at 9:54 AM Chris Travers <chris.travers@adjust.com
> <mailto:chris.travers@adjust.com>> wrote:
> 
> 
> 
>     On Tue, Sep 25, 2018 at 3:23 PM Tom Lane <tgl@sss.pgh.pa.us
>     <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>         Chris Travers <chris.travers@adjust.com
>         <mailto:chris.travers@adjust.com>> writes:
>         > However,  what I think one could do is use a struct of volatile
>         > sig_atomic_t members and macros for checking/setting.  Simply
>         writing a
>         > value is safe in C89 and higher.
> 
>         Yeah, we could group those flags in a struct, but what's the point?
> 
> 
>     This was one of two things I noticed in my previous patch on
>     interrupts and loops where I wasn't sure what the best practice in
>     our code is.
> 
>     If we don't want to make this change, then would there be any
>     objection to me writing up a README describing the flags, and best
>     practices in terms of checking them in our code based on the current
>     places we use them?  If the current approach will be unlikely to
>     change in the future, then at least we can document that the way I
>     went about things is consistent with current best practices so next
>     time someone doesn't really wonder.
> 
> 
> Attaching a first draft of a readme.  Feedback welcome.  I noticed
> further that we used to document signals and what they did with carious
> processes but that this was removed after 7.0, probably due to
> complexity reasons.

diff --git a/src/backend/utils/init/README b/src/backend/utils/init/README
new file mode 100644
index 0000000000..0472687f53
--- /dev/null
+++ b/src/backend/utils/init/README
@@ -0,0 +1,96 @@
+src/backend/utils/misc/README

Not only are the directory names mismatched here, but I wonder whether
this file would be long in either of these directories.

+Implementation Notes on Globals and Signal/Event Handling
+=========================================================
+
+The approch to signal handling in PostgreSQL is designed to strictly
conform
+with the C89 standard and designed to run with as few platform
assumptions as
+possible.

We use C99 now, so why would be design this to be conforming to C89?


More generally, I'd like this material to be code comments.  It's the
kind of stuff that gets outdated before long if it's kept separate.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proposal for Signal Detection Refactoring

От
Andres Freund
Дата:

On October 9, 2018 6:58:18 AM PDT, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>On 25/09/2018 02:23, Andres Freund wrote:
>>> My point was just to reduce the number of variables used and ease
>>> debugger lookups with what is on the stack.
>> I'm not sure a bitflag really gives you that - before gdb gives you
>the
>> plain value, afterwards you need to know the enum values and do bit
>math
>> to know.
>
>You could use an actual C bit field, to make debugging easier.

See Tom's reply to this suggestion...

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Tue, Oct 9, 2018 at 4:04 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 01/10/2018 14:00, Chris Travers wrote:
>
>
> On Wed, Sep 26, 2018 at 9:54 AM Chris Travers <chris.travers@adjust.com
> <mailto:chris.travers@adjust.com>> wrote:
>
>
>
>     On Tue, Sep 25, 2018 at 3:23 PM Tom Lane <tgl@sss.pgh.pa.us
>     <mailto:tgl@sss.pgh.pa.us>> wrote:
>
>         Chris Travers <chris.travers@adjust.com
>         <mailto:chris.travers@adjust.com>> writes:
>         > However,  what I think one could do is use a struct of volatile
>         > sig_atomic_t members and macros for checking/setting.  Simply
>         writing a
>         > value is safe in C89 and higher.
>
>         Yeah, we could group those flags in a struct, but what's the point? 
>
>
>     This was one of two things I noticed in my previous patch on
>     interrupts and loops where I wasn't sure what the best practice in
>     our code is.
>
>     If we don't want to make this change, then would there be any
>     objection to me writing up a README describing the flags, and best
>     practices in terms of checking them in our code based on the current
>     places we use them?  If the current approach will be unlikely to
>     change in the future, then at least we can document that the way I
>     went about things is consistent with current best practices so next
>     time someone doesn't really wonder.
>
>
> Attaching a first draft of a readme.  Feedback welcome.  I noticed
> further that we used to document signals and what they did with carious
> processes but that this was removed after 7.0, probably due to
> complexity reasons.

diff --git a/src/backend/utils/init/README b/src/backend/utils/init/README
new file mode 100644
index 0000000000..0472687f53
--- /dev/null
+++ b/src/backend/utils/init/README
@@ -0,0 +1,96 @@
+src/backend/utils/misc/README

Not only are the directory names mismatched here, but I wonder whether
this file would be long in either of these directories.

I am open to suggestions here.  The file was put where the signal stuff was.  Maybe moving it up and calling it signals.README? 

+Implementation Notes on Globals and Signal/Event Handling
+=========================================================
+
+The approch to signal handling in PostgreSQL is designed to strictly
conform
+with the C89 standard and designed to run with as few platform
assumptions as
+possible.

We use C99 now, so why would be design this to be conforming to C89?

Can or worms alert:  C89 and C99 are functionally the same here.  C99 changed the spec in part because every known C89 implementation was compliant in this area with the C99 spec.

I am fine with bumping that up to C99. 


More generally, I'd like this material to be code comments.  It's the
kind of stuff that gets outdated before long if it's kept separate.

The problem is that code comments are not going to be good places to document "how do I check for pending actions?"  That could be moved to the main SGML I guess..... 

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Dmitry Dolgov
Дата:
> On Wed, Oct 10, 2018 at 7:10 PM Chris Travers <chris.travers@adjust.com> wrote:
>
>> More generally, I'd like this material to be code comments.  It's the
>> kind of stuff that gets outdated before long if it's kept separate.
>
> The problem is that code comments are not going to be good places to document "how do I check for pending actions?"
Thatcould be moved to the main SGML I guess.....
 

I aggree with Peter here, for me it also feels more natural to have this
information as code commentaries - at least if I would search for it that would
be my first thought. As for "how do I..." part, I think there are alreasy
similar commentaries in the code, which makes sense - this kind of questions
usually appear when you're reading/writing some code.

It doesn't look like there is much left to do in this discussion, but for now
I'll move it to the next CF.


Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:
On Thu, Nov 29, 2018 at 8:46 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Wed, Oct 10, 2018 at 7:10 PM Chris Travers <chris.travers@adjust.com> wrote:
>
>> More generally, I'd like this material to be code comments.  It's the
>> kind of stuff that gets outdated before long if it's kept separate.
>
> The problem is that code comments are not going to be good places to document "how do I check for pending actions?"  That could be moved to the main SGML I guess.....

I aggree with Peter here, for me it also feels more natural to have this
information as code commentaries - at least if I would search for it that would
be my first thought. As for "how do I..." part, I think there are alreasy
similar commentaries in the code, which makes sense - this kind of questions
usually appear when you're reading/writing some code.

It doesn't look like there is much left to do in this discussion, but for now
I'll move it to the next CF.

Ok will move this into code comments if there are no objections. 


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:
Latest patch, simpler but answers the key questions as code comments

On Mon, Dec 3, 2018 at 6:56 AM Chris Travers <chris.travers@adjust.com> wrote:
On Thu, Nov 29, 2018 at 8:46 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Wed, Oct 10, 2018 at 7:10 PM Chris Travers <chris.travers@adjust.com> wrote:
>
>> More generally, I'd like this material to be code comments.  It's the
>> kind of stuff that gets outdated before long if it's kept separate.
>
> The problem is that code comments are not going to be good places to document "how do I check for pending actions?"  That could be moved to the main SGML I guess.....

I aggree with Peter here, for me it also feels more natural to have this
information as code commentaries - at least if I would search for it that would
be my first thought. As for "how do I..." part, I think there are alreasy
similar commentaries in the code, which makes sense - this kind of questions
usually appear when you're reading/writing some code.

It doesn't look like there is much left to do in this discussion, but for now
I'll move it to the next CF.

Ok will move this into code comments if there are no objections. 


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin



--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Вложения

Re: Proposal for Signal Detection Refactoring

От
Andres Freund
Дата:
Hi,

On 2019-01-17 10:50:56 +0100, Chris Travers wrote:
> diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
> index c6939779b9..5ed715589e 100644
> --- a/src/backend/utils/init/globals.c
> +++ b/src/backend/utils/init/globals.c
> @@ -27,12 +27,35 @@
>  
>  ProtocolVersion FrontendProtocol;
>  
> +/*
> + * Signal Handling variables here are set in signal handlers and can be
> + * checked by code to determine the implications of calling
> + * CHECK_FOR_INTERRUPTS(). While all are supported, in practice
> + * InterruptPending, QueryCancelPending, and ProcDiePending appear to be
> + * sufficient for almost all use cases.
> + */

Especially the latter part of comment seems like a bad idea.


> +/* Whether CHECK_FOR_INTERRUPTS will do anything */
>  volatile sig_atomic_t InterruptPending = false;

That's not actually a correct description. CFI is dependent on other
things than just InterruptPending, namely HOLD_INTERRUPTS() (even though
it's inside ProcessInterrupts()). It also always does more on windows.
I think the variable name is at least as good a description as the
comment you added.


> +/* Whether we are planning on cancelling the current query */
>  volatile sig_atomic_t QueryCancelPending = false;

> +/* Whether we are planning to exit the current process when safe to do so.*/
>  volatile sig_atomic_t ProcDiePending = false;
> +
> +/* Whether we have detected a lost client connection */
>  volatile sig_atomic_t ClientConnectionLost = false;
> +
> +/*
> + * Whether the process is dying because idle transaction timeout has been
> + * reached.
> + */
>  volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
> +
> +/* Whether we will reload the configuration at next opportunity */
>  volatile sig_atomic_t ConfigReloadPending = false;
> +

These comments all seem to add no information above the variable name.


I'm not quite understanding what this patch is intended to solve, sorry.

Greetings,

Andres Freund


Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Thu, Jan 17, 2019 at 6:38 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2019-01-17 10:50:56 +0100, Chris Travers wrote:
> diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
> index c6939779b9..5ed715589e 100644
> --- a/src/backend/utils/init/globals.c
> +++ b/src/backend/utils/init/globals.c
> @@ -27,12 +27,35 @@

>  ProtocolVersion FrontendProtocol;

> +/*
> + * Signal Handling variables here are set in signal handlers and can be
> + * checked by code to determine the implications of calling
> + * CHECK_FOR_INTERRUPTS(). While all are supported, in practice
> + * InterruptPending, QueryCancelPending, and ProcDiePending appear to be
> + * sufficient for almost all use cases.
> + */

Especially the latter part of comment seems like a bad idea.


> +/* Whether CHECK_FOR_INTERRUPTS will do anything */
>  volatile sig_atomic_t InterruptPending = false;

That's not actually a correct description. CFI is dependent on other
things than just InterruptPending, namely HOLD_INTERRUPTS() (even though
it's inside ProcessInterrupts()). It also always does more on windows.
I think the variable name is at least as good a description as the
comment you added.


> +/* Whether we are planning on cancelling the current query */
>  volatile sig_atomic_t QueryCancelPending = false;

> +/* Whether we are planning to exit the current process when safe to do so.*/
>  volatile sig_atomic_t ProcDiePending = false;
> +
> +/* Whether we have detected a lost client connection */
>  volatile sig_atomic_t ClientConnectionLost = false;
> +
> +/*
> + * Whether the process is dying because idle transaction timeout has been
> + * reached.
> + */
>  volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
> +
> +/* Whether we will reload the configuration at next opportunity */
>  volatile sig_atomic_t ConfigReloadPending = false;
> +

These comments all seem to add no information above the variable name.


I'm not quite understanding what this patch is intended to solve, sorry.

I would like to see the fact that checking these global fields is the correct way of understanding what CHECK_FOR_INTERRUPTS will do before you do it as per the two or three places in the code where this is already done.

I don't like reasoning about acceptable coding practices based on "someone has already committed something like this before."  I found that difficult when I was trying to fix the race condition and thought this would help others in similar cases.

Keep in mind that C extensions might use internal functions etc. perhaps in ways which are different from what the main source code does.  Therefore what is safe to do, and what we are not willing to guarantee as safe should probably be documented. 

Greetings,

Andres Freund


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Andres Freund
Дата:
Hi,

On 2018-10-09 16:04:35 +0200, Peter Eisentraut wrote:
> More generally, I'd like this material to be code comments.  It's the
> kind of stuff that gets outdated before long if it's kept separate.

I'm not sure I buy this here - we don't have (but perhaps should?) a
convenient location for an overview comment around this. There's no
"signal_handling.c" where it'd clearly belong - given the lack of a
clear point to look to, I don't think a README.SIGNAL_HANDLING would get
out-of-date more quickly than code comments in mildly related place (say
postgres.c or miscinit.c) would get out of date at a different pace.

Greetings,

Andres Freund


Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:


On Thu, Jan 17, 2019 at 7:08 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-10-09 16:04:35 +0200, Peter Eisentraut wrote:
> More generally, I'd like this material to be code comments.  It's the
> kind of stuff that gets outdated before long if it's kept separate.

I'm not sure I buy this here - we don't have (but perhaps should?) a
convenient location for an overview comment around this. There's no
"signal_handling.c" where it'd clearly belong - given the lack of a
clear point to look to, I don't think a README.SIGNAL_HANDLING would get
out-of-date more quickly than code comments in mildly related place (say
postgres.c or miscinit.c) would get out of date at a different pace.

The other point is that "This is the way to do it" is a little easier when in a README rather than in code comments sine those might be scattered in a bunch of different places. 

Greetings,

Andres Freund


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:
attached is a new signal handing patch.  Path is corrected and moved.  The documentation is sightly streamlined in some places and expanded in others.

Feedback requested.  
--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Вложения

Re: Proposal for Signal Detection Refactoring

От
Michael Paquier
Дата:
On Wed, Jan 23, 2019 at 11:55:09AM +0100, Chris Travers wrote:
> attached is a new signal handing patch.  Path is corrected and moved.  The
> documentation is sightly streamlined in some places and expanded in others.

This has not been reviewed, so moved to next CF.
--
Michael

Вложения

Re: Proposal for Signal Detection Refactoring

От
Andres Freund
Дата:
Hi,

On 2019-01-23 11:55:09 +0100, Chris Travers wrote:
> +Implementation Notes on Globals and Signal/Event Handling
> +=========================================================
> +
> +The approch to signal handling in PostgreSQL is designed to strictly conform
> +with the C89 standard and designed to run with as few platform assumptions as
> +possible.

I'm not clear as to what that means. For one we don't target C99
anymore, for another a lot of the signal handling stuff we do isn't
defined by C99, but versions of posix.


> +The primary constraint in signal handling is that things are interruptable.
> +This means that the signal handler may be interrupted at any point and that
> +execution may return to it at a later point in time.

That seems wrong. The primary problem is that *non* signal handler code
can be interrupted at ay time. Sure signal handlers interrupting each
other is a thing, but comparatively that doesn't add a ton of
complexity.


> +How PostgreSQL Handles Signals
> +------------------------------
> +
> +Most signals (except SIGSEGV, and SIGKILL) are blocked by PostgreSQL
> +during process startup.  Certain signals are given specific meaning and
> +trapped by signal handlers.  The primary signal handlers of the backends, 
> +located in src/backend/tcop/postgres.c, typically just write to variables of
> +sig_atomic_t (as documented below) and return control back to the main code.

Note that the other absolutely crucial thing is to save/restore errno.

I don't really think that signals handlers "return control back to the
main code".


> +An exception is made for SIGQUIT which is used by the postmaster to terminate
> +backend sessions quickly when another backend dies so that the postmaster
> +may re-initialize shared memory and otherwise return to a known-good state.
> +
> +The signals are then checked later when the CHECK_FOR_INTERRUPTS() macro is

s/signals/variables/


> +called.  This macro conditionally calls CheckPendingInterrupts in 
> +src/backend/tcop/postgres.c if InterruptPending is set.  This allows for
> +query cancellation and process termination to be done, under ordiary cases,
> +in a timely and orderly way, without posing problems for shared resources such
> +as shard memory and semaphores.

s/shard/shared/


> +CHECK_FOR_INTERRUPTS() may be called only when there is no more cleanup to do
> +because the query or process may be aborted.  However query cancellation
> +requests and SIGTERM signals will not be processed until the next time this is
> +called.

I don't think that's correct. Most resources can be cleaned up at
transaction abort.


> +Checking and Handling Interrupts
> +--------------------------------
> +
> +CHECK_FOR_SIGNALS() may cause a non-local exit because the function it wraps
> +utilizes PostgreSQL's built-in exception framework (ereport) to abort queries
> +and can call exit() to exit a orocess.

You mean CHECK_FOR_INTERRUPTS?


> +For this reason, it is imperative that CHECK_FOR_INTERRUPTS() is not called in
> +places that require additional cleanup, such as dynamic shared memory, temporary
> +files, or any other shared resource.  Instead CHECK_FOR_INTERRUPTS() should be
> +called at the next point when it is safe to do so.

That's largely wrong, there's cleanup mechanisms fo rmost of the above.


Greetings,

Andres Freund


Re: Proposal for Signal Detection Refactoring

От
Chris Travers
Дата:

Here's a new patch.  No rush on it.  I am moving it to next commitfest anyway because as code documentation I think this is a low priority late in the release cycle.

The changes  mostly address Andres's feedback above.

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

Вложения

Re: Proposal for Signal Detection Refactoring

От
Tom Lane
Дата:
Chris Travers <chris.travers@adjust.com> writes:
> Here's a new patch.  No rush on it.  I am moving it to next commitfest
> anyway because as code documentation I think this is a low priority late in
> the release cycle.
> The changes  mostly address Andres's feedback above.

I took a look through this version.  (BTW, please add a version number
to the patchfile name in future, so people can keep track of which
one we're talking about.)

+src/backend/utils/misc/README.SIGNAL_HANDLING

Hmm ... I'm not very sure of a good place to put this, but utils/misc/
seems kinda random, because there's no code at all in there that's
particularly related to this topic.  One alternate suggestion is
utils/init/, which is at least related by virtue of the signal flags
being defined in globals.c.  Another idea is src/backend/tcop/, since
postgres.c is where the rubber meets the road for these issues,
at least in regular backends.

Also, "README.SIGNAL_HANDLING" seems both overly long and not very
consistent with the naming of other README files.  The ones that aren't
just "README" are

README.git
doc/src/sgml/README.links
src/backend/access/heap/README.HOT
src/backend/access/heap/README.tuplock
src/backend/access/transam/README.parallel
src/backend/libpq/README.SSL
src/backend/statistics/README.dependencies
src/backend/statistics/README.mcv
src/backend/storage/lmgr/README-SSI
src/backend/storage/lmgr/README.barrier
src/interfaces/ecpg/README.dynSQL
src/interfaces/ecpg/preproc/README.parser

so there's a pretty clear preference for "README.lowercasesomething".
Hence I suggest "README.signals".

+Implementation Notes on Globals and Signal/Event Handling
+=========================================================

I'd drop "Globals and", it's not very relevant here; it's surely not
the lead keyword.  Maybe "Signal Handling in Postgres" would be even
more apropos.

+The approch to signal handling in PostgreSQL is designed to strictly conform
+with the C89 standard and designed to run with as few platform assumptions as
+possible.

s/approch/approach/, s/conform with/conform to/, s/run with/make/ (or just
drop "and designed..." entirely; C spec compliance is as much as we need
to say).

+Most signals (except SIGSEGV, and SIGKILL) are blocked by PostgreSQL
+during process startup.

The parenthetical remark is wrong, or at least incomplete; there's other
stuff we don't block such as SIGBUS.  I think I'd skip any attempt to
list them here and just say "Most blockable signals are blocked ...".
(Also, reading down, I note you return to this topic where you mention
SIGILL.  Perhaps better to enlarge on never-blocked signals at that
point.)

+An exception is made for SIGQUIT which is used by the postmaster to terminate
+backend sessions quickly when another backend dies so that the postmaster
+may re-initialize shared memory and otherwise return to a known-good state.

It'd likely be worth the verbiage to enlarge on this.  There is a clear
risk of the SIGQUIT handler malfunctioning because it interrupts something
that's not safe to interrupt.  We live with this on the theory that we're
just trying to kill the process anyway, so corruption of its internal
state is tolerable; plus the postmaster has SIGKILL-after-a-timeout logic
in case a child's SIGQUIT handler gets completely stuck.

+CHECK_FOR_INTERRUPTS() may be called only when there is no more cleanup to do
+because the query or process may be aborted.

This seems pretty wrong or at least confusing.  Maybe better
"CHECK_FOR_INTERRUPTS must be called only in places where it is safe to
invoke transaction abort and/or process termination.  Because Postgres
generally expects backend subsystems to maintain enough state to clean
up after themselves at transaction abort, this is a weak restriction.
However, functions that execute only straight-line code are allowed to
put persistent variables into transiently inconsistent states; that's
safe exactly as long as no CHECK_FOR_INTERRUPTS is executed while the
process state is unsafe for transaction abort".

+SIGHUP  - Reload config file (Postmaster Only)

Nope, that's not postmaster-only.  Backends and most other child
processes also re-read the config file on SIGHUP.

+SIGINT  - (Postmaster) Fast shutdown, (Backend) Cancel Query
+SIGQUIT - (Postmaster) Immediate Shutdown, (Backend) Exit Immediately
+SIGTERM - Terminate backend gracefully

IIRC, SIGTERM is also Smart Shutdown in the postmaster; why isn't
this documented similarly to SIGINT/SIGQUIT?

+SIGUSR1 - Pending IPC or LWLock event

Probably better to explain this as a general purpose signal multiplexor,
because that's what it is.  LWLock etc are specific use-cases.

+SIGINFO - Parent died
+SIGPWR  - Parent died (alternative)

I was totally confused by this to start with, until grepping reminded me
that POSTMASTER_DEATH_SIGNAL is implememented as one or the other of
them.  It might be better to use that name in this documentation, and
then explain that under the hood it is SIGINFO, SIGPWR, or possibly
some other otherwise-unused signal.  "Parent died" seems inappropriately
general, too, because it's specifically the postmaster we care about.

+A few other signals, such as SIGUSR2 may have different meanings in different
+worker processes.  In the case of SIGUSR2, we use it to promote a postmaster,
+but also to exit a checkpointer or to stop replication.  In general CPU-based
+interrupts (such as SIGILL) are not caught or handled by PostgreSQL.

Not sure if we want such an incomplete listing of SIGUSR2 uses as that.
Also, the point about SIGILL surely should be a separate para?

+CHECK_FOR_INTERRUPTS() may cause a non-local exit because the function it wraps
+utilizes PostgreSQL's built-in exception framework (ereport) to abort queries
+and can call exit() to exit a orocess. CHECK_FOR_INTERRUPTS() MUST NOT be called
+in places where the current function assumes that cleanup of shared resources
+may be required, or where other problems with non-local exits are foreseen in the
+case of ordinary usage of the function.  In cases where expectations of normal
+safety is suspended (critical sections and the like) there is a mechanism for
+holding off interrupts, as documented where the HOLD_INTERRUPTS() macro is defined.

s/orocess/process/, also this seems mighty repetitive with what was
written earlier.  Maybe best for the earlier text to be very minimal
and just say use of CHECK_FOR_INTERRUPTS is explained below.

+Because InterruptPending will always be set when QueryCancelPending and
+ProcDiePending are set, checking it first is a useful operation where the number
+of possible calls are very high.  In cases where the number of calls are lower,
+this micro-optimization may be omitted.

This para seems pretty pointless.  Are there any such places?

            regards, tom lane



Re: Proposal for Signal Detection Refactoring

От
Thomas Munro
Дата:
On Tue, Jul 30, 2019 at 6:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Chris Travers <chris.travers@adjust.com> writes:
> > Here's a new patch.  No rush on it.  I am moving it to next commitfest
> > anyway because as code documentation I think this is a low priority late in
> > the release cycle.
> > The changes  mostly address Andres's feedback above.
>
> I took a look through this version.  (BTW, please add a version number
> to the patchfile name in future, so people can keep track of which
> one we're talking about.)
>
> [review]

I've moved this to the September 'fest, where it's "Waiting on Author".

-- 
Thomas Munro
https://enterprisedb.com



Re: Proposal for Signal Detection Refactoring

От
Craig Ringer
Дата:
On Wed, 6 Mar 2019 at 17:38, Chris Travers <chris.travers@adjust.com> wrote:

Here's a new patch.  No rush on it.  I am moving it to next commitfest anyway because as code documentation I think this is a low priority late in the release cycle.

While you're looking at signal detection changes I suggest making sure you get them right for the contribs that use explicit signal handling, like src/test/modules/worker_spi/ .

I'm actually pretty sure worker_spi is incorrect as it stands in the current codebase. It defines its own worker_spi_sighup and worker_spi_sigterm handlers. worker_spi_sigterm() sets a static bool got_sighup that's scoped to worker_spi.c . Importantly it does NOT set ipc.c's InterruptPending or ProcDiePending so the CHECK_FOR_INTERRUPTS() macro will not notice when a backend running worker_spi gets a SIGTERM. So long as worker_spi's own main loop is serviced regularly that's fine, but it means worker_spi won't react to signals at all if it's busy in a query or somewhere else in postgres code.

It's also worth paying attention to the walsender, which has its own signal handling, and pretty much anywhere else that has a pqsignal(...) call that isn't SIG_IGN, die, quickdie, startup_die or procsignal_sigusr1_handler:

    git grep -P 'pqsignal\(SIG(INT|HUP|TERM|QUIT|USR1|USR2), (?!die|startup_die|quickdie|procsignal_sigusr1_handler|SIG_IGN)' src/backend/


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Proposal for Signal Detection Refactoring

От
Alvaro Herrera
Дата:
On 2019-Mar-06, Chris Travers wrote:

> Here's a new patch.  No rush on it.  I am moving it to next commitfest
> anyway because as code documentation I think this is a low priority late in
> the release cycle.

Chris, are you submitting an updated patch?  There's some unaddressed
feedback from Tom, which looks like it wouldn't take much time to
address.  There are also some comments from Craig, but they're not
directly related as he's suggesting to fix code that does not follow
what you're documenting; so while it'd be a good idea to fix it, it
doesn't get in the way of your patch.  I won't stop you if you want to
post patches for that, though.

Either way, please keep things moving.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services