Thread: BUG #16817: kill process cause postmaster hang

BUG #16817: kill process cause postmaster hang

From

PG Bug reporting form

Date:

January 11, 2021, 15:10:42

The following bug has been logged on the website:

Bug reference:      16817
Logged by:          Bo Chen
Email address:      bchen90@163.com
PostgreSQL version: 11.8
Operating system:   euleros v2r7 x86_64
Description:

Hi hackers

    Recently we encountered a problem: after killing the walwriter, we
expected the database to recover normally, but it did not (the postmaster
hangs in the 'wait dead end' state, and the archiver doesn't exit).
    After analyzing this problem, we found it could be a long-standing bug:
the archiver uses 'system' to call the configured archive command. For
'system', the Linux programmer's manual says: 'During execution of the
command, SIGCHLD will be blocked, and SIGINT and SIGQUIT will be ignored'.

    So, when a child crashes, we send SIGQUIT to the archiver only once. If
the archiver is executing 'system' at that moment, SIGQUIT is ignored, and
the postmaster hangs in the 'wait dead end' state.
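
The behavior described above can be reproduced outside PostgreSQL. The following is a minimal sketch, not PostgreSQL code: a forked child stands in for the archiver and runs a command through os.system() (which wraps C system()), while the parent stands in for the postmaster and sends a single SIGQUIT. The function name, the 'sleep' command, the delay, and the exit code 42 are assumptions chosen for illustration.

```python
import os
import signal
import time


def sigquit_during_system(command="sleep 1.5", delay=0.5):
    """Fork a child that runs `command` via os.system(), send it one SIGQUIT
    while the command is running, and return the child's wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: hypothetical stand-in for the archiver. While system() is
        # executing the command, SIGINT and SIGQUIT are ignored in the caller.
        os.system(command)
        os._exit(42)  # reached only if SIGQUIT did not terminate the child
    time.sleep(delay)             # give the child time to enter system()
    os.kill(pid, signal.SIGQUIT)  # signal the child only, not its process group
    _, status = os.waitpid(pid, 0)
    return status


status = sigquit_during_system()
print("child exited normally:", os.WIFEXITED(status))
```

If the child was already inside system() when the signal arrived, it exits with its own status after the command finishes instead of being terminated by SIGQUIT. The fixed delay is a simplification; a signal delivered before the child enters system() would terminate it as usual.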

    As a workaround, we now send SIGUSR2 to the archiver after SIGQUIT in
HandleChildCrash. Is there any other solution?

   regards, ChenBo

Re: BUG #16817: kill process cause postmaster hang

From

Tom Lane

Date:

January 11, 2021, 15:55:30

PG Bug reporting form <noreply@postgresql.org> writes:
>     Recently we encountered a problem: after killing the walwriter, we
> expected the database to recover normally, but it did not (the postmaster
> hangs in the 'wait dead end' state, and the archiver doesn't exit).
>     After analyzing this problem, we found it could be a long-standing bug:
> the archiver uses 'system' to call the configured archive command. For
> 'system', the Linux programmer's manual says: 'During execution of the
> command, SIGCHLD will be blocked, and SIGINT and SIGQUIT will be ignored'.

>     So, when a child crashes, we send SIGQUIT to the archiver only once. If
> the archiver is executing 'system' at that moment, SIGQUIT is ignored, and
> the postmaster hangs in the 'wait dead end' state.

Not sure I believe this: why wouldn't the SIGKILL-after-5-seconds logic
get us out of that situation?

            regards, tom lane

Re: BUG #16817: kill process cause postmaster hang

From

bchen90

Date:

January 25, 2021, 01:01:04

Hi, Tom

    Thanks for your reply. Could you elaborate on the
"SIGKILL-after-5-seconds logic"?


regards, chenbo



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html

Re: BUG #16817: kill process cause postmaster hang

From

Andy Fan

Date:

January 25, 2021, 01:29:52

On Mon, Jan 25, 2021 at 9:01 AM bchen90 <bchen90@163.com> wrote:

> Hi, Tom
>
> Thanks for your reply. Could you elaborate on the
> "SIGKILL-after-5-seconds logic"?
>
> regards, chenbo

82233ce7ea42d6ba519aaec63008aff49da6c7af should be the commit Tom was
talking about.

commit 82233ce7ea42d6ba519aaec63008aff49da6c7af
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri Jun 28 17:20:53 2013 -0400

Send SIGKILL to children if they don't die quickly in immediate shutdown

On immediate shutdown, or during a restart-after-crash sequence,
postmaster used to send SIGQUIT (and then abandon ship if shutdown); but
this is not a good strategy if backends don't die because of that
signal. (This might happen, for example, if a backend gets tangled
trying to malloc() due to gettext(), as in an example illustrated by
MauMau.) This causes problems when later trying to restart the server,
because some processes are still attached to the shared memory segment.

Instead of just abandoning such backends to their fates, we now have
postmaster hang around for a little while longer, send a SIGKILL after
some reasonable waiting period, and then exit. This makes immediate
shutdown more reliable.

There is disagreement on whether it's best for postmaster to exit after
sending SIGKILL, or to stick around until all children have reported
death. If this controversy is resolved differently than what this patch
implements, it's an easy change to make.

Bug reported by MauMau in message 20DAEA8949EC4E2289C6E8E58560DEC0@maumau

MauMau and Álvaro Herrera
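
The escalation described in the commit message (SIGQUIT first, SIGKILL after a waiting period) can be illustrated with a small standalone sketch. This is not the postmaster.c implementation; the helper names stop_child and spawn_stubborn_child, the polling interval, and the 0.5-second grace period are assumptions made for the demonstration.

```python
import os
import signal
import time


def stop_child(pid, grace_secs):
    """Send SIGQUIT, wait up to grace_secs, then escalate to SIGKILL.
    Returns the child's wait status."""
    os.kill(pid, signal.SIGQUIT)
    deadline = time.monotonic() + grace_secs
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return status         # the child honored SIGQUIT in time
        time.sleep(0.05)
    os.kill(pid, signal.SIGKILL)  # cannot be caught, blocked, or ignored
    _, status = os.waitpid(pid, 0)
    return status


def spawn_stubborn_child():
    """Hypothetical stand-in for a process that ignores SIGQUIT."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(ready_r)
        signal.signal(signal.SIGQUIT, signal.SIG_IGN)
        os.write(ready_w, b"x")   # tell the parent SIGQUIT is now ignored
        time.sleep(60)
        os._exit(0)
    os.close(ready_w)
    os.read(ready_r, 1)           # block until the child is ready
    os.close(ready_r)
    return pid


status = stop_child(spawn_stubborn_child(), grace_secs=0.5)
print("terminated by SIGKILL:",
      os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL)
```

A child that ignores SIGQUIT, as a process inside system() does, still goes away once the grace period expires, which is why Tom expected the hang not to persist.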

Best Regards

Andy Fan (https://www.aliyun.com/)

Re: BUG #16817: kill process cause postmaster hang

From

Michael Paquier

Date:

January 25, 2021, 01:35:01

On Sun, Jan 24, 2021 at 06:01:04PM -0700, bchen90 wrote:
>     Thanks for your reply. Could you elaborate on the
> "SIGKILL-after-5-seconds logic"?

You are looking for the changes related to this command, as of
postmaster.c:
git grep SIGKILL_CHILDREN_AFTER_SECS
--
Michael

Attachments

signature.asc
