Re: [RFC] Should we fix postmaster to avoid slow shutdown?

From Robert Haas
Subject Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Date
Msg-id CA+TgmobkffkFeV5zQeQST=xpZpMVAYMfQkUnqg6PUMDMO6FLRg@mail.gmail.com
In response to Re: [RFC] Should we fix postmaster to avoid slow shutdown?  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Responses Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Re: [RFC] Should we fix postmaster to avoid slow shutdown?
List pgsql-hackers
On Sun, Nov 20, 2016 at 10:20 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> The reasons why I proposed this patch are:
>
> * It happened in a highly mission-critical production system of a customer who uses 9.2.
>
> * 9.4's solution is not perfect, because it wastes 5 seconds anyway, which is unexpected for users.  The customer's requirement includes failover within 30 seconds, so 5 seconds can be seen as a risk.
>
> Plus, I'm worried about the possibility that the SIGKILLed process wouldn't disappear if it's writing to a network storage like NFS.
>
> * And first of all, the immediate shutdown should shut the server down immediately without doing anything heavy, as the name means.

So there are two questions here:

1. Should we try to avoid having the stats collector write a stats
file during an immediate shutdown?  The file will be removed anyway
during crash recovery, so writing it is pointless.  I think you are
right that 9.4's solution here is not perfect, because of the 5 second
delay, and also because if the stats collector is stuck inside the
kernel trying to write to the OS, it may be in a non-interruptible
wait state where even SIGKILL has no immediate effect.  Anyway, it's
stupid even from a performance point of view to waste time writing a
file that we're just going to nuke.

2. Should we close listen sockets sooner during an immediate shutdown?
I agree with Tom and Peter that this isn't a good idea.  People
expect the sockets not to go away until the end - e.g. they use
PQping() to test the server status, or they connect just to see what
error they get - and the fact that a client application could
hypothetically generate such a relentless stream of connection
attempts that the dead-end backends thereby created slow down shutdown
is not in my mind a sufficient reason to change the behavior.
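
To make the reasoning in point 1 concrete, here is a minimal sketch in Python (not PostgreSQL's actual C code) of the decision being discussed: skip the stats file on an immediate shutdown, write it otherwise. The names ShutdownMode and finish_stats_collector, the JSON file format, and the ".tmp" suffix are all hypothetical and chosen only for illustration.

```python
import json
import os
from enum import Enum


class ShutdownMode(Enum):
    """Hypothetical stand-in for the server's shutdown modes."""
    SMART = "smart"
    FAST = "fast"
    IMMEDIATE = "immediate"


def finish_stats_collector(mode, stats, stats_path):
    """Persist stats on shutdown unless the shutdown is immediate.

    Returns True if the file was written, False if the write was skipped.
    """
    if mode is ShutdownMode.IMMEDIATE:
        # Assumption taken from the discussion: crash recovery removes
        # the stats file anyway, so writing it here is wasted I/O and
        # risks blocking in the kernel on slow storage.
        return False

    # Write to a temporary file first, then rename it into place, so a
    # reader never sees a partially written stats file.
    tmp_path = stats_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(stats, f)
    os.replace(tmp_path, stats_path)
    return True
```

With this shape, an immediate shutdown returns without touching the disk at all, which avoids both the 5 second wait and the case where a process blocked in a write does not respond promptly to SIGKILL.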

So I think 001 should proceed and 002 should be rejected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


