Re: [RFC] Should we fix postmaster to avoid slow shutdown?

From Robert Haas
Subject Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Msg-id CA+TgmoYb7mFYthxj9dJAjZbXu0gy6NeFLB8u83Ao26VrKGM6zg@mail.gmail.com
In reply to Re: [RFC] Should we fix postmaster to avoid slow shutdown?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Nov 22, 2016 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I agree.  However, in many cases, the major cost of a fast shutdown is
>> getting the dirty data already in the operating system buffers down to
>> disk, not in writing out shared_buffers itself.  The latter is
>> probably a single-digit number of gigabytes, or maybe double-digit.
>> The former might be a lot more, and the write of the pgstat file may
>> back up behind it.  I've seen cases where an 8kB buffered write from
>> Postgres takes tens of seconds to complete because the OS buffer cache
>> is already saturated with dirty data, and the stats files could easily
>> be a lot more than that.
>
> I think this is mostly FUD, because we don't fsync the stats files.  Maybe
> we should, but we don't today.  So even if we have managed to get the
> system into a state where physical writes are heavily backlogged, that's
> not a reason to assume that the stats collector will be unable to do its
> thing promptly.  All it has to do is push a relatively small amount of
> data into kernel buffers.

I don't believe that's automatically fast, if we're bumping up against
dirty_ratio.  However, suppose you're right.  Then what prompted the
original complaint?  The OP said "The problem here is that postmaster
took as long as 15 seconds to terminate after it had detected a
crashed backend."  It clearly WASN'T an indefinite hang as might have
occurred with the malloc-lock problem for which we implemented the
SIGKILL stuff.  So something during shutdown took a long time, but not
forever.  There's no convincing evidence I've seen that it has to have
been this particular thing, but I find it plausible, because normal
backends bail out without doing much of anything, and here we have a
process that is trying to continue doing work after having received
SIGQUIT.  If not this, then what?
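
One way to check the disagreement above empirically is to time a single 8kB buffered write with no fsync while the machine is under heavy dirty-page pressure. The following is only a minimal sketch, assuming Linux and Python 3; `time_buffered_write` and `read_dirty_ratio` are hypothetical helper names made up for illustration and are not part of PostgreSQL or of the stats collector.

```python
import os
import tempfile
import time

BLOCK_SIZE = 8192  # 8kB, the write size mentioned in the discussion


def time_buffered_write(path, size=BLOCK_SIZE):
    """Time one write() of `size` bytes into `path`, without any fsync.

    The data only has to reach kernel buffers, so this is normally fast.
    The kernel may still throttle the writer when too many pages are dirty.
    """
    data = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.monotonic()
        written = os.write(fd, data)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    if written != size:
        raise OSError(f"short write: {written} of {size} bytes")
    return elapsed


def read_dirty_ratio():
    """Return vm.dirty_ratio as an int, or None if it cannot be read.

    Assumption: a Linux /proc layout. Other systems simply get None.
    """
    try:
        with open("/proc/sys/vm/dirty_ratio") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        probe = os.path.join(tmpdir, "probe.tmp")
        seconds = time_buffered_write(probe)
        print(f"vm.dirty_ratio: {read_dirty_ratio()}")
        print(f"8kB buffered write took {seconds:.6f} s")
```

On an idle machine the write should return almost immediately. To say anything about the case discussed here, the probe file would have to live on the same filesystem as the data directory, and the measurement would have to be repeated while something else is generating a large backlog of dirty data.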

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


