Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up
Дата
Msg-id 20150127081746.GI4655@awork2.anarazel.de
обсуждение исходный текст
Ответ на Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
On 2015-01-27 16:23:53 +0900, Michael Paquier wrote:
> On Tue, Jan 27, 2015 at 6:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Unfortunately that Assert()s when there's a lock conflict because
> > e.g. another backend is currently connecting. That's because ProcSleep()
> > does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and
> > there's no deadlock timeout (or lock timeout) handler registered.

> Yes, that could logically happen if there is a lock conflicting as
> RowExclusiveLock or lower lock can be taken in recovery.

I don't this specific lock (it's a object, not relation lock) can easily
be taken directly by a user except during authentication.

> > [...]
> > afaics, that should work? Not pretty, but probably easier than starting
> > to reason about the deadlock detector in the startup process.

> Wouldn't it be cleaner to simply register a dedicated handler in
> StartupXlog instead, obviously something else than DEADLOCK_TIMEOUT as
> it is reserved for backend operations? For back-branches, we may even
> consider using DEADLOCK_TIMEOUT..

What would that timeout handler actually do? Two problems:
a) We really don't want the startup process be killed/error out as that  likely means a postmaster restart. So the
defaultdeadlock detector  strategy is something that's really useless for us.
 
b) Pretty much all other (if not actually all other) heavyweight lock  acquisitions in the startup process acquire
locksusing dontWait =  true - which means that the deadlock detector isn't actually  run. That's essentially fine
becausewe simply kill everything in our  way. C.f. StandbyAcquireAccessExclusiveLock() et al.
 

There's a dedicated 'deadlock detector' like infrastructure around
ResolveRecoveryConflictWithBufferPin(), but it deals with a class of
deadlocks that's not handled in the deadlock detector anyway.


I think this isn't particularly pretty, but it seems to be working well
enough, and changing it would be pretty invasive. So keeping in line
with all that code seems to be easier.

> > We probably should also add a Assert(!InRecovery || sessionLock) to
> > LockAcquireExtended() - these kind of problems are otherwise hard to
> > find in a developer setting.

> So this means that locks other than session ones cannot be taken while
> a node is in recovery, but RowExclusiveLock can be taken while in
> recovery. Don't we have a problem with this assertion then?

Note that InRecovery doesn't mean what you probably think it means:
/** Are we doing recovery from XLOG?** This is only ever true in the startup process; it should be read as meaning*
"thisprocess is replaying WAL records", rather than "the system is in* recovery mode".  It should be examined primarily
byfunctions that need* to act differently when called from a WAL redo function (e.g., to skip WAL* logging).  To check
whetherthe system is in recovery regardless of which* process you're running in, use RecoveryInProgress() but only
aftershared* memory startup and lock initialization.*/
 
bool        InRecovery = false;

The assertion actually should be even stricter:
Assert(InRecovery || (sessionLock && dontWait));
i.e. we never acquire a heavyweight lock in the startup process unless
it's a session lock (as we don't have resource managers/a xact to track
locks) and we don't wait (as we don't have the deadlock detector
infrastructure set up).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Using 128-bit integers for sum, avg and statistics aggregates
Следующее
От: Andreas Karlsson
Дата:
Сообщение: Re: Using 128-bit integers for sum, avg and statistics aggregates