Re: stress test for parallel workers

From Thomas Munro
Subject Re: stress test for parallel workers
Msg-id CA+hUKG+C6uPF1cNZkW8xg+NAgorW9Q5DQusGGCAj+K8sb8m_aQ@mail.gmail.com
In reply to Re: stress test for parallel workers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Oct 12, 2019 at 9:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-10-11 14:56:41 -0400, Tom Lane wrote:
> >> ... So it's really hard to explain
> >> that as anything except a kernel bug: sometimes, the kernel
> >> doesn't give us as much stack as it promised it would.  And the
> >> machine is not loaded enough for there to be any rational
> >> resource-exhaustion excuse for that.
>
> > Linux expands stack space only on demand, thus it's possible to run out
> > of stack space while there ought to be stack space. Unfortunately that
> > happens during a stack expansion, which means there's no easy place to
> > report that.  I've seen this be hit in production on busy machines.
>
> As I said, this machine doesn't seem busy enough for that to be a
> tenable excuse; there's nobody but me logged in, and the buildfarm
> critter isn't running.

Yeah.  As I speculated in the other thread[1], the straightforward
can't-allocate-any-more-space-but-no-other-way-to-tell-you-that case,
ie, the explanation that doesn't involve a bug in Linux or PostgreSQL,
seems unlikely unless we also see other more obvious signs of
occasional overcommit problems (ie not during stack expansion) on
those hosts, doesn't it?  How likely is it that this 1-2MB of stack
space is the straw that breaks the camel's back, every time?

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ_MkqdEH-LmmebhNLSFeyWwvYVXfPaz3A2_p27EQfZwA%40mail.gmail.com
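
For anyone who wants to look at the gap between promised and already-mapped
stack on a Linux host, here is a minimal Python sketch (not from the thread).
It compares the soft RLIMIT_STACK with the VmStk line of /proc/self/status.
The helper names parse_vmstk_kb and stack_headroom_kb are hypothetical, and
the script only reports numbers; it cannot tell whether a later stack
expansion would succeed.

```python
import resource


def parse_vmstk_kb(status_text):
    """Return the VmStk value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmStk:"):
            # The line looks like "VmStk:      132 kB".
            return int(line.split()[1])
    return None


def stack_headroom_kb(status_text, soft_limit_bytes):
    """Stack the limit allows but the kernel has not mapped yet, in kB.

    Returns None when VmStk is missing or the limit is unlimited.
    """
    used_kb = parse_vmstk_kb(status_text)
    if used_kb is None or soft_limit_bytes == resource.RLIM_INFINITY:
        return None
    return soft_limit_bytes // 1024 - used_kb


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    try:
        with open("/proc/self/status") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("soft stack limit (bytes):", soft)
    print("VmStk (kB):", parse_vmstk_kb(text))
    print("headroom (kB):", stack_headroom_kb(text, soft))
```

The headroom figure is the part of the stack that still has to be obtained
on demand, which is where the failure discussed above would occur.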


