Thread: Spurious standby query cancellations


Spurious standby query cancellations

From: Jeff Janes
In ResolveRecoveryConflictWithLock, there is the comment:

    /*
     * If blowing away everybody with conflicting locks doesn't work, after
     * the first two attempts then we just start blowing everybody away until
     * it does work.


But what it actually does is something different from that.

At least under some conditions, it will wait patiently for all initially conflicting locks to go away.  If that doesn't work twice (because new lockers joined while we were waiting for the old ones to go away), then it will wait patiently for all transactions throughout the system to go away, even if they don't conflict, perhaps not even realizing that the lock has become free in the meantime.

Then when its patience runs out, it kills everything on the system.  But it never tried to blow away just the conflicting locks; instead it just tried to wait them out.

The fact that trying to wait them out didn't work (twice) doesn't mean that killing them wouldn't have worked.

I think that it was intended that num_attempts would get incremented only once WaitExceedsMaxStandbyDelay becomes true, but that is not what happens with the current code.

Currently WaitExceedsMaxStandbyDelay has only one caller.  I think we should take the sleep code out of that function and move it into the existing call site, and then have ResolveRecoveryConflictWithLock check WaitExceedsMaxStandbyDelay before incrementing num_attempts.
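
To make that concrete, here is a rough sketch of the restructuring I have in mind (untested, written against the current standby.c):

    static bool
    WaitExceedsMaxStandbyDelay(void)
    {
        TimestampTz ltime = GetStandbyLimitTime();

        /* Now a pure check: have we passed max_standby_*_delay? */
        return (ltime != 0 && GetCurrentTimestamp() >= ltime);
    }

The pg_usleep()-with-backoff currently inside that function would move to its one call site in ResolveRecoveryConflictWithVirtualXIDs, and the outer loop in ResolveRecoveryConflictWithLock would then become something like:

    while (!lock_acquired)
    {
        /* Escalate only once the standby delay has actually expired,
         * not merely after two rounds of waiting for lock holders. */
        if (WaitExceedsMaxStandbyDelay())
            num_attempts++;

        if (num_attempts < 2)
            backends = GetLockConflicts(&locktag, AccessExclusiveLock);
        else
            backends = GetConflictingVirtualXIDs(InvalidTransactionId,
                                                 InvalidOid);

        ResolveRecoveryConflictWithVirtualXIDs(backends,
                                               PROCSIG_RECOVERY_CONFLICT_LOCK);

        if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true,
                                false) != LOCKACQUIRE_NOT_AVAIL)
            lock_acquired = true;
    }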

Cheers,

Jeff
Attachments

Re: Spurious standby query cancellations

From: Simon Riggs
On 27 August 2015 at 22:55, Jeff Janes <jeff.janes@gmail.com> wrote:
In ResolveRecoveryConflictWithLock, there is the comment:

    /*
     * If blowing away everybody with conflicting locks doesn't work, after
     * the first two attempts then we just start blowing everybody away until
     * it does work.


But what it actually does is something different from that.

At least under some conditions, it will wait patiently for all initially conflicting locks to go away.  If that doesn't work twice (because new lockers joined while we were waiting for the old ones to go away), then it will wait patiently for all transactions throughout the system to go away, even if they don't conflict, perhaps not even realizing that the lock has become free in the meantime.

Then when its patience runs out, it kills everything on the system.  But it never tried to blow away just the conflicting locks; instead it just tried to wait them out.

The fact that trying to wait them out didn't work (twice) doesn't mean that killing them wouldn't have worked.

I think that it was intended that num_attempts would get incremented only once WaitExceedsMaxStandbyDelay becomes true, but that is not what happens with the current code.

Currently WaitExceedsMaxStandbyDelay has only one caller.  I think we should take the sleep code out of that function and move it into the existing call site, and then have ResolveRecoveryConflictWithLock check WaitExceedsMaxStandbyDelay before incrementing num_attempts.

I agree there is a problem in the current code; thank you for spotting it.

I also agree that the proposed solution would work-ish, if we are suggesting backpatching this.

It's now possible to fix this by putting a lock wait on the actual lock request, which wasn't available when I first wrote that, hence the crappy wait loop. Using the timeout handler would now be the preferred way to solve this. We can backpatch that to 9.3 if needed, which is where timeout handlers were introduced.

There's an example of how to use lock waits further down in ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing it that way?
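
For reference, the waiting logic there is shaped like this (condensed; see standby.c for the full version):

    ltime = GetStandbyLimitTime();

    if (ltime == 0)
    {
        /* Not limited, so just watch for deadlocks */
        enable_timeout_after(STANDBY_DEADLOCK_TIMEOUT, DeadlockTimeout);
    }
    else if (GetCurrentTimestamp() >= ltime)
    {
        /* Already past the limit, so take action straight away */
        SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
    }
    else
    {
        /* Wake up at ltime, and check for deadlocks along the way */
        EnableTimeoutParams timeouts[2];

        timeouts[0].id = STANDBY_DEADLOCK_TIMEOUT;
        timeouts[0].type = TMPARAM_AFTER;
        timeouts[0].delay_ms = DeadlockTimeout;
        timeouts[1].id = STANDBY_TIMEOUT;
        timeouts[1].type = TMPARAM_AT;
        timeouts[1].fin_time = ltime;
        enable_timeouts(timeouts, 2);
    }

    /* Sleep until signaled by UnpinBuffer() or until a timeout fires */
    ProcWaitForSignal();

    disable_all_timeouts(false);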


We probably also need to resurrect my earlier patch to avoid logging AccessExclusiveLocks on temp tables.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Spurious standby query cancellations

From: Jeff Janes
On Fri, Sep 4, 2015 at 3:25 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 27 August 2015 at 22:55, Jeff Janes <jeff.janes@gmail.com> wrote:
In ResolveRecoveryConflictWithLock, there is the comment:

    /*
     * If blowing away everybody with conflicting locks doesn't work, after
     * the first two attempts then we just start blowing everybody away until
     * it does work.


But what it actually does is something different from that.

At least under some conditions, it will wait patiently for all initially conflicting locks to go away.  If that doesn't work twice (because new lockers joined while we were waiting for the old ones to go away), then it will wait patiently for all transactions throughout the system to go away, even if they don't conflict, perhaps not even realizing that the lock has become free in the meantime.

Then when its patience runs out, it kills everything on the system.  But it never tried to blow away just the conflicting locks; instead it just tried to wait them out.

The fact that trying to wait them out didn't work (twice) doesn't mean that killing them wouldn't have worked.

I think that it was intended that num_attempts would get incremented only once WaitExceedsMaxStandbyDelay becomes true, but that is not what happens with the current code.

Currently WaitExceedsMaxStandbyDelay has only one caller.  I think we should take the sleep code out of that function and move it into the existing call site, and then have ResolveRecoveryConflictWithLock check WaitExceedsMaxStandbyDelay before incrementing num_attempts.

I agree there is a problem in the current code; thank you for spotting it.

I also agree that the proposed solution would work-ish, if we are suggesting backpatching this.

I wasn't specifically thinking of backpatching it, but if it doesn't operate the way you intended it to, it might make sense to do that.

I stumbled on this while experimentally looking into a different issue (the ProcArrayLock contention issue in HeapTupleSatisfiesMVCC that was recently ameliorated in 8a7d0701814, specifically the variant of that issue reported in "[PERFORM] Strange query stalls on replica in 9.3.9").  I haven't heard of any complaints from the field about too-frequent query cancellations, but I also don't have my "ear to the ground" in that area.
 

It's now possible to fix this by putting a lock wait on the actual lock request, which wasn't available when I first wrote that, hence the crappy wait loop. Using the timeout handler would now be the preferred way to solve this. We can backpatch that to 9.3 if needed, which is where timeout handlers were introduced.

There's an example of how to use lock waits further down in ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing it that way?

It looks like this will take some major surgery.  The heavyweight lock manager doesn't play well with others when it comes to timeouts other than its own.  LockBufferForCleanup is a simple retry loop, but the lock manager is much more complicated than that.

Here are some approaches I can think of:

Approach one: replace LockAcquireExtended's dontWait boolean with a "waitUntil" timeout value.  Queue the lock like ordinary (non-recovery-backend) lock requests are queued, so that new requestors are not granted new locks which conflict with what we want.  If the timeout expires without the lock being acquired, dequeue ourselves and clean up, and then return LOCKACQUIRE_NOT_AVAIL.  (A rough sketch of this API change appears after this list.)  I think something would have to be done about deadlocks as well, since I don't think the regular deadlock detector works in the standby startup process.

Approach two: same as one, but when the timeout expires we invoke a callback to terminate the blocking processes without dequeueing ourselves.  That way no one can sneak in and grab the lock while we are doing the terminations, so we are guaranteed to succeed after one round of terminations, and there is no need to terminate innocent bystanders.  I think this is my preferred solution.  (But I haven't had much luck implementing it beyond the hand-wavy stage.)

Approach three: ask for the lock with dontWait as we do now, but if we don't get the lock then somehow arrange to be signalled when the lock becomes free, without being queued for it.  Then we would have to retry, with no guarantee the retry would be successful.  Functionally this would be no different from the simple patch I gave, except the polling loop is much more efficient.  You would still have to terminate processes if a stream of new requestors takes the lock on a constantly overlapping basis.
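
To illustrate, the API change for approach one might look something like this (purely a hypothetical sketch, not working code):

    /* Hypothetical: the dontWait boolean becomes a deadline.  waitUntil == 0
     * means wait forever; a deadline already in the past degenerates to the
     * old dontWait = true behavior. */
    extern LockAcquireResult LockAcquireExtended(const LOCKTAG *locktag,
                                                 LOCKMODE lockmode,
                                                 bool sessionLock,
                                                 TimestampTz waitUntil,
                                                 bool reportMemoryError);

    /* Hypothetical, for approach two: invoked when the deadline passes, while
     * we still hold our place in the wait queue, so no one can slip in
     * between the terminations and our acquisition. */
    typedef void (*LockTimeoutCallback) (const LOCKTAG *locktag,
                                         LOCKMODE lockmode);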


We probably also need to resurrect my earlier patch to avoid logging AccessExclusiveLocks on temp tables.

I couldn't find that; can you provide a link?

Another source of frequent AccessExclusiveLocks is vacuum truncation attempts.  If a table has some occupied pages right at the end which are marked all-visible, then the forward scan doesn't visit them.  Since it didn't visit them, it assumes they might be empty and truncatable, and so takes the AccessExclusiveLock, only to immediately see that the last page is not empty.  Perhaps the forward scan could be special-cased to never skip the final block of a table.  That way, if it is not empty, the truncation is abandoned before taking the AccessExclusiveLock.
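
Something along these lines in the page-skipping logic, perhaps (a hypothetical sketch only; skip_this_page is a made-up name):

    /*
     * Never skip the table's final block, even if the visibility map says
     * it is all-visible.  Then the last page gets counted in
     * nonempty_pages, and a doomed truncation attempt is abandoned before
     * taking AccessExclusiveLock rather than after.
     */
    if (blkno == nblocks - 1)
        skip_this_page = false;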

Cheers,

Jeff

Re: Spurious standby query cancellations

From: Simon Riggs
On 14 September 2015 at 12:00, Jeff Janes <jeff.janes@gmail.com> wrote:
 
It's now possible to fix this by putting a lock wait on the actual lock request, which wasn't available when I first wrote that, hence the crappy wait loop. Using the timeout handler would now be the preferred way to solve this. We can backpatch that to 9.3 if needed, which is where timeout handlers were introduced.

There's an example of how to use lock waits further down in ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing it that way?

It looks like this will take some major surgery.  The heavyweight lock manager doesn't play well with others when it comes to timeouts other than its own.  LockBufferForCleanup is a simple retry loop, but the lock manager is much more complicated than that.

Not sure I understand this objection. I can't see a reason that my proposal wouldn't work.
 

We probably also need to resurrect my earlier patch to avoid logging AccessExclusiveLocks on temp tables.

I couldn't find that; can you provide a link?

 
Another source of frequent AccessExclusiveLocks is vacuum truncation attempts.  If a table has some occupied pages right at the end which are marked all-visible, then the forward scan doesn't visit them.  Since it didn't visit them, it assumes they might be empty and truncatable, and so takes the AccessExclusiveLock, only to immediately see that the last page is not empty.  Perhaps the forward scan could be special-cased to never skip the final block of a table.  That way, if it is not empty, the truncation is abandoned before taking the AccessExclusiveLock.

We can probably get rid of that lock, but it will require lots of smoke and mirrors to persuade scans that they don't actually need to scan as far as they thought they did, because a VACUUM has checked and it knows those blocks do not contain tuples visible to the scan. Which sounds a little hairy.

I think Andres is working on putting an end-of-data watermark in shared memory rather than using the end of the physical file, which sounds like a better plan.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Spurious standby query cancellations

From: Jeff Janes
On Wed, Sep 16, 2015 at 2:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 14 September 2015 at 12:00, Jeff Janes <jeff.janes@gmail.com> wrote:
 
It's now possible to fix this by putting a lock wait on the actual lock request, which wasn't available when I first wrote that, hence the crappy wait loop. Using the timeout handler would now be the preferred way to solve this. We can backpatch that to 9.3 if needed, which is where timeout handlers were introduced.

There's an example of how to use lock waits further down in ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing it that way?

It looks like this will take some major surgery.  The heavyweight lock manager doesn't play well with others when it comes to timeouts other than its own.  LockBufferForCleanup is a simple retry loop, but the lock manager is much more complicated than that.

Not sure I understand this objection. I can't see a reason that my proposal wouldn't work.

On further thought, neither do I.  The attached patch inverts ResolveRecoveryConflictWithLock so that it is called back from the lmgr code, like the ResolveRecoveryConflictWithBufferPin code.  It does not try to cancel the conflicting lock holders from the signal handler; rather, it just loops an extra time and cancels the transactions on the next call.

It looks like deadlock detection is adequately handled by the normal lmgr code in the backends of the other parties to the deadlock, so I didn't add a timeout for deadlock detection purposes.
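
In outline, the reworked function is shaped roughly like this (abridged; STANDBY_LOCK_TIMEOUT is a new timeout id the patch adds):

    void
    ResolveRecoveryConflictWithLock(LOCKTAG locktag)
    {
        TimestampTz ltime = GetStandbyLimitTime();

        Assert(InHotStandby);

        if (ltime != 0 && GetCurrentTimestamp() >= ltime)
        {
            /* Already past the standby delay: cancel the current holders
             * of conflicting locks, and only them. */
            VirtualTransactionId *backends;

            backends = GetLockConflicts(&locktag, AccessExclusiveLock);
            ResolveRecoveryConflictWithVirtualXIDs(backends,
                                                   PROCSIG_RECOVERY_CONFLICT_LOCK);
        }
        else
        {
            /* Otherwise arrange to wake when the delay expires; lmgr
             * calls back into this function if the timeout fires. */
            EnableTimeoutParams timeouts[1];

            timeouts[0].id = STANDBY_LOCK_TIMEOUT;
            timeouts[0].type = TMPARAM_AT;
            timeouts[0].fin_time = ltime;
            enable_timeouts(timeouts, 1);
        }

        /* Sleep in the lock's wait queue until the lock is released */
        ProcWaitForSignal();

        /* Clear any timeout requests established above */
        disable_all_timeouts(false);
    }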

Cheers,

Jeff
Attachments

Re: Spurious standby query cancellations

From: Michael Paquier
On Thu, Sep 24, 2015 at 3:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Sep 16, 2015 at 2:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> On 14 September 2015 at 12:00, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>>>>
>>>> It's now possible to fix this by putting a lock wait on the actual lock
>>>> request, which wasn't available when I first wrote that, hence the crappy
>>>> wait loop. Using the timeout handler would now be the preferred way to solve
>>>> this. We can backpatch that to 9.3 if needed, where they were introduced.
>>>>
>>>> There's an example of how to use lock waits further down in
>>>> ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing it
>>>> that way?
>>>
>>>
>>> It looks like this will take some major surgery.  The heavyweight lock
>>> manager doesn't play well with others when it comes to timeouts other than
>>> its own.  LockBufferForCleanup is a simple retry loop, but the lock manager
>>> is much more complicated than that.
>>
>>
>> Not sure I understand this objection. I can't see a reason that my
>> proposal wouldn't work.
>
>
> On further thought, neither do I.  The attached patch inverts
> ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
> it is like ResolveRecoveryConflictWithBufferPin code.  It does not try to
> cancel the conflicting lock holders from the signal handler, rather it just
> loops an extra time and cancels the transactions on the next call.
>
> It looks like the deadlock detection is adequately handled within normal
> lmgr code within the back-ends of the other parties to the deadlock, so I
> didn't do a timeout for deadlock detection purposes.

Patch moved to next CF because of a lack of reviews. Simon is
registered as reviewer, hence I guess that the ball is on his side of
the field.
-- 
Michael



Re: Spurious standby query cancellations

From: Jeff Janes
On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> On further thought, neither do I.  The attached patch inverts
> ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
> it is like ResolveRecoveryConflictWithBufferPin code.  It does not try to
> cancel the conflicting lock holders from the signal handler, rather it just
> loops an extra time and cancels the transactions on the next call.
>
> It looks like the deadlock detection is adequately handled within normal
> lmgr code within the back-ends of the other parties to the deadlock, so I
> didn't do a timeout for deadlock detection purposes.

I was testing whether this still applies to git HEAD.  And it doesn't,
due to the refactoring of lock.h into lockdefs.h.  (And apparently it
never actually did, because that refactoring happened long before I
wrote this patch.  I guess I must have done this work against
9_5_STABLE.)

standby.h cannot include lock.h, because standby.h is included
indirectly by pg_xlogdump.  But it has to get LOCKTAG, which is only
defined in lock.h.

Does this mean that standby.h also needs to get parts spun off into a
new standbydef.h that can be included from front-end code?

standby.h doesn't need to know the internals of LOCKTAG.  It just
needs to declare a function that receives it as an opaque pointer.  I
don't know if that info helps resolve the situation, though.
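
For what it's worth, the opaque-pointer declaration could be as simple as this (hypothetical sketch):

    /* In standby.h: forward-declare the struct so that the full definition
     * in storage/lock.h is needed only by backend code, not by pg_xlogdump. */
    struct LOCKTAG;

    extern void ResolveRecoveryConflictWithLock(const struct LOCKTAG *locktag);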

Cheers,

Jeff



Re: Spurious standby query cancellations

From: Jeff Janes
On Wed, Dec 23, 2015 at 9:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> On further thought, neither do I.  The attached patch inverts
>> ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
>> it is like ResolveRecoveryConflictWithBufferPin code.  It does not try to
>> cancel the conflicting lock holders from the signal handler, rather it just
>> loops an extra time and cancels the transactions on the next call.
>>
>> It looks like the deadlock detection is adequately handled within normal
>> lmgr code within the back-ends of the other parties to the deadlock, so I
>> didn't do a timeout for deadlock detection purposes.
>
> I was testing that this still applies to git HEAD.  And it doesn't due
> to the refactoring of lock.h into lockdefs.h.  (And apparently it
> never actually did, because that refactoring happened long before I
> wrote this patch.  I guess I must have done this work against
> 9_5_STABLE.)
>
> standby.h cannot include lock.h because standby.h is included
> indirectly by pg_xlogdump.  But it has to get LOCKTAG which is only in
> lock.h.
>
> Does this mean that standby.h also needs to get parts spun off into a
> new standbydef.h that can be included from front-end code?


That is how I've done it.

The lock cancel patch applies over the header split patch.

Cheers,

Jeff

Attachments

Re: Spurious standby query cancellations

From: Simon Riggs
On 24 December 2015 at 20:15, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Dec 23, 2015 at 9:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> On further thought, neither do I.  The attached patch inverts
>> ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
>> it is like ResolveRecoveryConflictWithBufferPin code.  It does not try to
>> cancel the conflicting lock holders from the signal handler, rather it just
>> loops an extra time and cancels the transactions on the next call.
>>
>> It looks like the deadlock detection is adequately handled within normal
>> lmgr code within the back-ends of the other parties to the deadlock, so I
>> didn't do a timeout for deadlock detection purposes.
>
> I was testing that this still applies to git HEAD.  And it doesn't due
> to the refactoring of lock.h into lockdefs.h.  (And apparently it
> never actually did, because that refactoring happened long before I
> wrote this patch.  I guess I must have done this work against
> 9_5_STABLE.)
>
> standby.h cannot include lock.h because standby.h is included
> indirectly by pg_xlogdump.  But it has to get LOCKTAG which is only in
> lock.h.
>
> Does this mean that standby.h also needs to get parts spun off into a
> new standbydef.h that can be included from front-end code?


That is how I've done it.

The lock cancel patch applies over the header split patch.

This looks good to me, apart from some WhitespaceCrime.

Header split applied, will test and apply the main patch this week.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Spurious standby query cancellations

From: Alvaro Herrera
Simon Riggs wrote:

> This looks good to me, apart from some WhitespaceCrime.
> 
> Header split applied, will test and apply the main patch this week.

Since the patch already appears to have a committer's attention, it's
okay to move it to the next commitfest; if Simon happens to commit ahead
of time, that would be a great outcome too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Spurious standby query cancellations

From: Amit Kapila
On Fri, Dec 25, 2015 at 1:45 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> On Wed, Dec 23, 2015 at 9:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> >>
> >> On further thought, neither do I.  The attached patch inverts
> >> ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
> >> it is like ResolveRecoveryConflictWithBufferPin code.  It does not try to
> >> cancel the conflicting lock holders from the signal handler, rather it just
> >> loops an extra time and cancels the transactions on the next call.
> >>
> >> It looks like the deadlock detection is adequately handled within normal
> >> lmgr code within the back-ends of the other parties to the deadlock, so I
> >> didn't do a timeout for deadlock detection purposes.
> >
> > I was testing that this still applies to git HEAD.  And it doesn't due
> > to the refactoring of lock.h into lockdefs.h.  (And apparently it
> > never actually did, because that refactoring happened long before I
> > wrote this patch.  I guess I must have done this work against
> > 9_5_STABLE.)
> >
> > standby.h cannot include lock.h because standby.h is included
> > indirectly by pg_xlogdump.  But it has to get LOCKTAG which is only in
> > lock.h.
> >
> > Does this mean that standby.h also needs to get parts spun off into a
> > new standbydef.h that can be included from front-end code?
>
>
> That is how I've done it.
>
> The lock cancel patch applies over the header split patch.
>

Review Comments -

1.

  /*
   * If somebody wakes us between LWLockRelease and WaitLatch, the latch
--- 1081,1103 ----
   * If LockTimeout is set, also enable the timeout for that.  We can save a
   * few cycles by enabling both timeout sources in one call.
   */
!     if (!InHotStandby)
      {
!         if (LockTimeout > 0)
!         {
!             EnableTimeoutParams timeouts[2];
!
!             timeouts[0].id = DEADLOCK_TIMEOUT;
!             timeouts[0].type = TMPARAM_AFTER;
!             timeouts[0].delay_ms = DeadlockTimeout;
!             timeouts[1].id = LOCK_TIMEOUT;
!             timeouts[1].type = TMPARAM_AFTER;
!             timeouts[1].delay_ms = LockTimeout;
!             enable_timeouts(timeouts, 2);
!         }
!         else
!             enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
      }

If you are enabling the timeout only in non-hot-standby mode, then later
in the same function it should be disabled under the same condition;
otherwise it will lead to an assertion failure.
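
That is, the cleanup later in ProcSleep() presumably needs the same guard, something like:

    if (!InHotStandby)
    {
        if (LockTimeout > 0)
        {
            DisableTimeoutParams timeouts[2];

            timeouts[0].id = DEADLOCK_TIMEOUT;
            timeouts[0].keep_indicator = false;
            timeouts[1].id = LOCK_TIMEOUT;
            timeouts[1].keep_indicator = true;
            disable_timeouts(timeouts, 2);
        }
        else
            disable_timeout(DEADLOCK_TIMEOUT, false);
    }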

2.

+     /*
+      * Clear any timeout requests established above.  We assume here that the
+      * Startup process doesn't have any other outstanding timeouts than what
+      * this function
+      * uses.  If that stops being true, we could cancel the timeouts
+      * individually, but that'd be slower.
+      */

Comment seems to be misaligned.

3.

+ void
+ StandbyLockTimeoutHandler(void)
+ {
+     /* forget any pending STANDBY_DEADLOCK_TIMEOUT request */
+     disable_timeout(STANDBY_DEADLOCK_TIMEOUT, false);
+ }

Is there any reason to disable deadlock timeout when it is not
enabled by ResolveRecoveryConflictWithLock()?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Spurious standby query cancellations

From: Simon Riggs
On 24 December 2015 at 20:15, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Dec 23, 2015 at 9:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Sep 23, 2015 at 11:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> On further thought, neither do I.  The attached patch inverts
>> ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
>> it is like ResolveRecoveryConflictWithBufferPin code.  It does not try to
>> cancel the conflicting lock holders from the signal handler, rather it just
>> loops an extra time and cancels the transactions on the next call.
>>
>> It looks like the deadlock detection is adequately handled within normal
>> lmgr code within the back-ends of the other parties to the deadlock, so I
>> didn't do a timeout for deadlock detection purposes.
>
 
That is how I've done it.

It's taken me a while to figure this out.

My testing showed a bug in disable_timeout(), which turned out to be a double-disable, which I've fixed. I'll submit a separate patch to add some diagnostics in case such cases show up again, which could happen now that we have user-defined timeouts.

What surprises me is that I can't see how this patch could ever have worked as submitted when run on an assert-enabled build.

If you want this backpatched, please submit versions that apply cleanly and test them. I'm less inclined to do that myself; I just regard this as an improvement.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services