Обсуждение: pg_promote() can cause busy loop

Поиск
Список
Период
Сортировка

pg_promote() can cause busy loop

От
Fujii Masao
Дата:
Hi,

I found small issue in pg_promote(). If postmaster dies
while pg_promote() is waiting for the standby promotion to finish,
pg_promote() can cause busy loop. This happens because
pg_promote() does nothing when WaitLatch() detects
the postmaster death event. I think that pg_promote()
should bail out of the loop immediately in that case.

Attached is the patch for the fix.

Regards,

-- 
Fujii Masao

Вложения

Re: pg_promote() can cause busy loop

От
Michael Paquier
Дата:
On Thu, Sep 05, 2019 at 09:46:26AM +0900, Fujii Masao wrote:
> I found small issue in pg_promote(). If postmaster dies
> while pg_promote() is waiting for the standby promotion to finish,
> pg_promote() can cause busy loop. This happens because
> pg_promote() does nothing when WaitLatch() detects
> the postmaster death event. I think that pg_promote()
> should bail out of the loop immediately in that case.
>
> Attached is the patch for the fix.

Indeed, this is not correct.

-   ereport(WARNING,
-           (errmsg("server did not promote within %d seconds",
-           wait_seconds)));
+   if (i >= WAITS_PER_SECOND * wait_seconds)
+       ereport(WARNING,
+               (errmsg("server did not promote within %d seconds", wait_seconds)));

Would it make more sense to issue a warning mentioning the postmaster
death and then return PG_RETURN_BOOL(false) instead of breaking out of
the loop?  It could be confusing to warn about a timeout if the
postmaster died in parallel, and we know the actual reason why the
promotion did not happen in this case.
--
Michael

Вложения

Re: pg_promote() can cause busy loop

От
Fujii Masao
Дата:
On Thu, Sep 5, 2019 at 10:26 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 05, 2019 at 09:46:26AM +0900, Fujii Masao wrote:
> > I found small issue in pg_promote(). If postmaster dies
> > while pg_promote() is waiting for the standby promotion to finish,
> > pg_promote() can cause busy loop. This happens because
> > pg_promote() does nothing when WaitLatch() detects
> > the postmaster death event. I think that pg_promote()
> > should bail out of the loop immediately in that case.
> >
> > Attached is the patch for the fix.
>
> Indeed, this is not correct.
>
> -   ereport(WARNING,
> -           (errmsg("server did not promote within %d seconds",
> -           wait_seconds)));
> +   if (i >= WAITS_PER_SECOND * wait_seconds)
> +       ereport(WARNING,
> +               (errmsg("server did not promote within %d seconds", wait_seconds)));
>
> Would it make more sense to issue a warning mentioning the postmaster
> death and then return PG_RETURN_BOOL(false) instead of breaking out of
> the loop?  It could be confusing to warn about a timeout if the
> postmaster died in parallel, and we know the actual reason why the
> promotion did not happen in this case.

It's ok to use PG_RETURN_BOOL(false) instead of breaking out of the loop
in that case. Which would make the code simpler.

But I don't think it's worth warning about postmaster death here
because a backend will emit FATAL message like "terminating connection
due to unexpected postmaster exit" in secure_read() after
pg_promote() returns false.

Regards,

-- 
Fujii Masao



Re: pg_promote() can cause busy loop

От
Michael Paquier
Дата:
On Thu, Sep 05, 2019 at 10:53:19AM +0900, Fujii Masao wrote:
> It's ok to use PG_RETURN_BOOL(false) instead of breaking out of the loop
> in that case. Which would make the code simpler.

Okay.  I would have done so FWIW.

> But I don't think it's worth warning about postmaster death here
> because a backend will emit FATAL message like "terminating connection
> due to unexpected postmaster exit" in secure_read() after
> pg_promote() returns false.

Good point, that could be equally confusing.
--
Michael

Вложения

Re: pg_promote() can cause busy loop

От
Fujii Masao
Дата:
On Thu, Sep 5, 2019 at 11:10 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 05, 2019 at 10:53:19AM +0900, Fujii Masao wrote:
> > It's ok to use PG_RETURN_BOOL(false) instead of breaking out of the loop
> > in that case. Which would make the code simpler.
>
> Okay.  I would have done so FWIW.
>
> > But I don't think it's worth warning about postmaster death here
> > because a backend will emit FATAL message like "terminating connection
> > due to unexpected postmaster exit" in secure_read() after
> > pg_promote() returns false.
>
> Good point, that could be equally confusing.

So, barring any objection, I will commit the attached patch.

Regards,

-- 
Fujii Masao

Вложения

Re: pg_promote() can cause busy loop

От
Michael Paquier
Дата:
On Thu, Sep 05, 2019 at 04:03:22PM +0900, Fujii Masao wrote:
> So, barring any objection, I will commit the attached patch.

LGTM.  Thanks!
--
Michael

Вложения

Re: pg_promote() can cause busy loop

От
Fujii Masao
Дата:
On Thu, Sep 5, 2019 at 4:52 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 05, 2019 at 04:03:22PM +0900, Fujii Masao wrote:
> > So, barring any objection, I will commit the attached patch.
>
> LGTM.  Thanks!

Committed. Thanks!

Regards,

-- 
Fujii Masao