Thread: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

From
Tom Lane
Date:
Perform an immediate shutdown if the postmaster.pid file is removed.

The postmaster now checks every minute or so (worst case, at most two
minutes) that postmaster.pid is still there and still contains its own PID.
If not, it performs an immediate shutdown, as though it had received
SIGQUIT.
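
[Editor's illustration] The periodic check described above can be sketched in C roughly as follows. This is a minimal sketch, not the code from the patch: the function name lock_file_still_valid, the buffer size, and the choice to treat errors other than a missing file as transient are all assumptions made for illustration. It only relies on the PID being the first line of postmaster.pid.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): return true if the lock
 * file at "path" still exists and its first line is "expected_pid".
 */
bool
lock_file_still_valid(const char *path, long expected_pid)
{
    char    buf[64];
    ssize_t len;
    int     fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        /* File (or its directory) is gone: the lock is no longer ours. */
        if (errno == ENOENT || errno == ENOTDIR)
            return false;
        /* Assumption: any other open() failure is transient, not fatal. */
        return true;
    }

    len = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (len < 0)
        return true;            /* read error: assumed transient, as above */
    buf[len] = '\0';

    /* Parse the leading PID; an empty file parses as 0 and fails the check. */
    return strtol(buf, NULL, 10) == expected_pid;
}
```

A caller would invoke such a helper about once a minute from its main loop and, on a false result, start the same shutdown path it uses for SIGQUIT.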

The original goal behind this change was to ensure that failed buildfarm
runs would get fully cleaned up, even if the test scripts had left a
postmaster running, which is not an infrequent occurrence.  When the
buildfarm script removes a test postmaster's $PGDATA directory, its next
check on postmaster.pid will fail and cause it to exit.  Previously, manual
intervention was often needed to get rid of such orphaned postmasters,
since they'd block new test postmasters from obtaining the expected socket
address.

However, by checking postmaster.pid and not something else, we can provide
additional robustness: manual removal of postmaster.pid is a frequent DBA
mistake, and now we can at least limit the damage that will ensue if a new
postmaster is started while the old one is still alive.

Back-patch to all supported branches, since we won't get the desired
improvement in buildfarm reliability otherwise.

Branch
------
REL9_3_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/31bc563b9be306623c5e9a52816b432945fa6df9

Modified Files
--------------
src/backend/postmaster/postmaster.c |   52 ++++++++++++++++++++------
src/backend/utils/init/miscinit.c   |   70 +++++++++++++++++++++++++++++++++++
src/include/miscadmin.h             |    1 +
3 files changed, 112 insertions(+), 11 deletions(-)


Re: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

From
Thom Brown
Date:
On 6 October 2015 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Perform an immediate shutdown if the postmaster.pid file is removed.
>
> The postmaster now checks every minute or so (worst case, at most two
> minutes) that postmaster.pid is still there and still contains its own PID.
> If not, it performs an immediate shutdown, as though it had received
> SIGQUIT.
>
> The original goal behind this change was to ensure that failed buildfarm
> runs would get fully cleaned up, even if the test scripts had left a
> postmaster running, which is not an infrequent occurrence.  When the
> buildfarm script removes a test postmaster's $PGDATA directory, its next
> check on postmaster.pid will fail and cause it to exit.  Previously, manual
> intervention was often needed to get rid of such orphaned postmasters,
> since they'd block new test postmasters from obtaining the expected socket
> address.
>
> However, by checking postmaster.pid and not something else, we can provide
> additional robustness: manual removal of postmaster.pid is a frequent DBA
> mistake, and now we can at least limit the damage that will ensue if a new
> postmaster is started while the old one is still alive.
>
> Back-patch to all supported branches, since we won't get the desired
> improvement in buildfarm reliability otherwise.
>
> Branch
> ------
> REL9_3_STABLE
>
> Details
> -------
> http://git.postgresql.org/pg/commitdiff/31bc563b9be306623c5e9a52816b432945fa6df9
>
> Modified Files
> --------------
> src/backend/postmaster/postmaster.c |   52 ++++++++++++++++++++------
> src/backend/utils/init/miscinit.c   |   70 +++++++++++++++++++++++++++++++++++
> src/include/miscadmin.h             |    1 +
> 3 files changed, 112 insertions(+), 11 deletions(-)

The log contains misleading output following the removal of the pid file:

2015-10-09 15:39:32 BST [31507]: [4-1] user=,db=,client= LOG:  could
not open file "postmaster.pid": No such file or directory
2015-10-09 15:39:32 BST [31507]: [5-1] user=,db=,client= LOG:
performing immediate shutdown because data directory lock file is
invalid
2015-10-09 15:39:32 BST [31507]: [6-1] user=,db=,client= LOG:
received immediate shutdown request
2015-10-09 15:39:32 BST [31556]: [1-1] user=,db=,client= WARNING:
terminating connection because of crash of another server process
2015-10-09 15:39:32 BST [31556]: [2-1] user=,db=,client= DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
2015-10-09 15:39:32 BST [31556]: [3-1] user=,db=,client= HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.

Is this anything we need to worry about?

--
Thom


Re: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

From
Tom Lane
Date:
Thom Brown <thom@linux.com> writes:
> The log contains a misleading output following the removal of the pid file:

> 2015-10-09 15:39:32 BST [31507]: [4-1] user=,db=,client= LOG:  could
> not open file "postmaster.pid": No such file or directory
> 2015-10-09 15:39:32 BST [31507]: [5-1] user=,db=,client= LOG:
> performing immediate shutdown because data directory lock file is
> invalid
> 2015-10-09 15:39:32 BST [31507]: [6-1] user=,db=,client= LOG:
> received immediate shutdown request
> 2015-10-09 15:39:32 BST [31556]: [1-1] user=,db=,client= WARNING:
> terminating connection because of crash of another server process
> 2015-10-09 15:39:32 BST [31556]: [2-1] user=,db=,client= DETAIL:  The
> postmaster has commanded this server process to roll back the current
> transaction and exit, because another server process exited abnormally
> and possibly corrupted shared memory.
> 2015-10-09 15:39:32 BST [31556]: [3-1] user=,db=,client= HINT:  In a
> moment you should be able to reconnect to the database and repeat your
> command.

Looks as-expected to me.  We're forcing a panic stop.

            regards, tom lane


Re: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Thom Brown <thom@linux.com> writes:
> > The log contains a misleading output following the removal of the pid file:
>
> > 2015-10-09 15:39:32 BST [31507]: [4-1] user=,db=,client= LOG:  could
> > not open file "postmaster.pid": No such file or directory
> > 2015-10-09 15:39:32 BST [31507]: [5-1] user=,db=,client= LOG:
> > performing immediate shutdown because data directory lock file is
> > invalid
> > 2015-10-09 15:39:32 BST [31507]: [6-1] user=,db=,client= LOG:
> > received immediate shutdown request
> > 2015-10-09 15:39:32 BST [31556]: [1-1] user=,db=,client= WARNING:
> > terminating connection because of crash of another server process
> > 2015-10-09 15:39:32 BST [31556]: [2-1] user=,db=,client= DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server process exited abnormally
> > and possibly corrupted shared memory.
> > 2015-10-09 15:39:32 BST [31556]: [3-1] user=,db=,client= HINT:  In a
> > moment you should be able to reconnect to the database and repeat your
> > command.
>
> Looks as-expected to me.  We're forcing a panic stop.

I think he's complaining that the final HINT is misleading.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Looks as-expected to me.  We're forcing a panic stop.

> I think he's complaining that the final HINT is misleading.

Well, all the particular backend knows is that it got SIGQUIT.
Maybe we should rewrite the message text for that entirely, but
that didn't seem in-scope for this patch.

            regards, tom lane