Обсуждение: pg_promote() can cause busy loop
Hi, I found small issue in pg_promote(). If postmaster dies while pg_promote() is waiting for the standby promotion to finish, pg_promote() can cause busy loop. This happens because pg_promote() does nothing when WaitLatch() detects the postmaster death event. I think that pg_promote() should bail out of the loop immediately in that case. Attached is the patch for the fix. Regards, -- Fujii Masao
Вложения
On Thu, Sep 05, 2019 at 09:46:26AM +0900, Fujii Masao wrote:
> I found small issue in pg_promote(). If postmaster dies
> while pg_promote() is waiting for the standby promotion to finish,
> pg_promote() can cause busy loop. This happens because
> pg_promote() does nothing when WaitLatch() detects
> the postmaster death event. I think that pg_promote()
> should bail out of the loop immediately in that case.
>
> Attached is the patch for the fix.
Indeed, this is not correct.
- ereport(WARNING,
- (errmsg("server did not promote within %d seconds",
- wait_seconds)));
+ if (i >= WAITS_PER_SECOND * wait_seconds)
+ ereport(WARNING,
+ (errmsg("server did not promote within %d seconds", wait_seconds)));
Would it make more sense to issue a warning mentioning the postmaster
death and then return PG_RETURN_BOOL(false) instead of breaking out of
the loop? It could be confusing to warn about a timeout if the
postmaster died in parallel, and we know the actual reason why the
promotion did not happen in this case.
--
Michael
Вложения
On Thu, Sep 5, 2019 at 10:26 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 05, 2019 at 09:46:26AM +0900, Fujii Masao wrote:
> > I found small issue in pg_promote(). If postmaster dies
> > while pg_promote() is waiting for the standby promotion to finish,
> > pg_promote() can cause busy loop. This happens because
> > pg_promote() does nothing when WaitLatch() detects
> > the postmaster death event. I think that pg_promote()
> > should bail out of the loop immediately in that case.
> >
> > Attached is the patch for the fix.
>
> Indeed, this is not correct.
>
> - ereport(WARNING,
> - (errmsg("server did not promote within %d seconds",
> - wait_seconds)));
> + if (i >= WAITS_PER_SECOND * wait_seconds)
> + ereport(WARNING,
> + (errmsg("server did not promote within %d seconds", wait_seconds)));
>
> Would it make more sense to issue a warning mentioning the postmaster
> death and then return PG_RETURN_BOOL(false) instead of breaking out of
> the loop? It could be confusing to warn about a timeout if the
> postmaster died in parallel, and we know the actual reason why the
> promotion did not happen in this case.
It's ok to use PG_RETURN_BOOL(false) instead of breaking out of the loop
in that case. Which would make the code simpler.
But I don't think it's worth warning about postmaster death here
because a backend will emit FATAL message like "terminating connection
due to unexpected postmaster exit" in secure_read() after
pg_promote() returns false.
Regards,
--
Fujii Masao
On Thu, Sep 05, 2019 at 10:53:19AM +0900, Fujii Masao wrote: > It's ok to use PG_RETURN_BOOL(false) instead of breaking out of the loop > in that case. Which would make the code simpler. Okay. I would have done so FWIW. > But I don't think it's worth warning about postmaster death here > because a backend will emit FATAL message like "terminating connection > due to unexpected postmaster exit" in secure_read() after > pg_promote() returns false. Good point, that could be equally confusing. -- Michael
Вложения
On Thu, Sep 5, 2019 at 11:10 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Sep 05, 2019 at 10:53:19AM +0900, Fujii Masao wrote: > > It's ok to use PG_RETURN_BOOL(false) instead of breaking out of the loop > > in that case. Which would make the code simpler. > > Okay. I would have done so FWIW. > > > But I don't think it's worth warning about postmaster death here > > because a backend will emit FATAL message like "terminating connection > > due to unexpected postmaster exit" in secure_read() after > > pg_promote() returns false. > > Good point, that could be equally confusing. So, barring any objection, I will commit the attached patch. Regards, -- Fujii Masao
Вложения
On Thu, Sep 05, 2019 at 04:03:22PM +0900, Fujii Masao wrote: > So, barring any objection, I will commit the attached patch. LGTM. Thanks! -- Michael
Вложения
On Thu, Sep 5, 2019 at 4:52 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Sep 05, 2019 at 04:03:22PM +0900, Fujii Masao wrote: > > So, barring any objection, I will commit the attached patch. > > LGTM. Thanks! Committed. Thanks! Regards, -- Fujii Masao