Re: [GENERAL] startup process stuck in recovery

From Simon Riggs
Subject Re: [GENERAL] startup process stuck in recovery
Date
Msg-id CANP8+jJhvwhngTaoT1yEmi1YD-uPDEsr=vbHz_yM+=y-NZgn=g@mail.gmail.com
In response to Re: [GENERAL] startup process stuck in recovery  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [GENERAL] startup process stuck in recovery  (Christophe Pettus <xof@thebuild.com>)
List pgsql-general
On 10 October 2017 at 21:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> What I see is that, given this particular test case, the backend
> process on the master never holds more than a few locks at a time.
> Each time we abort a subtransaction, the AE lock it was holding
> on the temp table it created gets dropped.  However ... on the
> standby server, pre v10, the replay process attempts to take all
> 12000 of those AE locks at once.  This is not a great plan.

The standby doesn't take the locks "at once"; they are added just as
they arrive. The locks are held by the topxid, so by design they are
not released at subxid abort, and so they are held concurrently.

> v10 and HEAD avoid the problem because the standby server doesn't
> take locks (any at all, AFAICS).  I suppose this must be a
> consequence of commit 9b013dc238c, though I'm not sure exactly how.

Locks are still taken, but in 9b013dc238c we just avoid trying to
release locks when transactions don't have any.

> Anyway, it's pretty scary that it's so easy to run the replay process
> out of shared memory pre-v10.  I wonder if we should consider
> backpatching that fix.  Any situation where the replay process takes
> more locks concurrently than were ever held on the master is surely
> very bad news.

v10 improves on this specific point because we perform lock release at
subxid abort.
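
As a rough illustration of the difference, here is a toy model in
Python. It is not PostgreSQL code: the event names and the peak_locks()
and workload() helpers are made up for this sketch, and it only counts
how many locks would be held concurrently during replay under the two
policies described above (held until the top-level transaction ends,
versus released at subxid abort).

```python
def peak_locks(events, release_on_subxid_abort):
    """Return the peak number of locks held concurrently during replay.

    events is a list of (kind, subxid) tuples, where kind is one of
    "lock", "abort_sub" or "end_top". All names here are hypothetical.
    """
    locks_by_subxid = {}  # subxid -> locks recorded for that subxid
    total = 0             # locks currently held
    peak = 0              # highest value of total seen so far
    for kind, subxid in events:
        if kind == "lock":
            locks_by_subxid[subxid] = locks_by_subxid.get(subxid, 0) + 1
            total += 1
        elif kind == "abort_sub":
            # Only the second policy drops locks at subxid abort.
            if release_on_subxid_abort:
                total -= locks_by_subxid.pop(subxid, 0)
        elif kind == "end_top":
            # End of the top-level transaction releases everything.
            locks_by_subxid.clear()
            total = 0
        peak = max(peak, total)
    return peak


def workload(n):
    """n subtransactions, each taking one lock and then aborting."""
    events = []
    for subxid in range(1, n + 1):
        events.append(("lock", subxid))
        events.append(("abort_sub", subxid))
    events.append(("end_top", 0))
    return events


events = workload(12000)
held_until_top_end = peak_locks(events, release_on_subxid_abort=False)
released_at_abort = peak_locks(events, release_on_subxid_abort=True)
print(held_until_top_end, released_at_abort)
```

With 12000 subtransactions, the first policy peaks at 12000 locks held
at once and the second at 1, which is the gap being discussed here.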

Various cases have been reported over time and this has been improving
steadily in each release.

It isn't "easy" to run the replay process out of memory, since that
clearly doesn't happen much, but yes, there are some pessimal use
cases that don't work well. The use case described seems highly
unrealistic and certainly amenable to being rewritten.

Backpatching some of those fixes is quite risky, IMHO.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
