pgsql: Fix longstanding recursion hazard in sinval message processing.

From: Tom Lane
Subject: pgsql: Fix longstanding recursion hazard in sinval message processing.
Msg-id: E1fyOsM-00050A-JT@gemulon.postgresql.org
List: pgsql-committers
Fix longstanding recursion hazard in sinval message processing.

LockRelationOid and sibling routines supposed that, if our session already
holds the lock they were asked to acquire, they could skip calling
AcceptInvalidationMessages on the grounds that we must have already read
any remote sinval messages issued against the relation being locked.
This is normally true, but there's a critical special case where it's not:
processing inside AcceptInvalidationMessages might attempt to access system
relations, resulting in a recursive call to acquire a relation lock.

Hence, if the outer call had acquired that same system catalog lock, we'd
fall through, despite the possibility that there's an as-yet-unread sinval
message for that system catalog.  This could, for example, result in
failure to access a system catalog or index that had just been processed
by VACUUM FULL.  This is the explanation for buildfarm failures we've been
seeing intermittently for the past three months.  The bug is far older
than that, but commits a54e1f158 et al added a new recursion case within
AcceptInvalidationMessages that is apparently easier to hit than any
previous case.

To fix this, we must not skip calling AcceptInvalidationMessages until
we have *finished* a call to it since acquiring a relation lock, not
merely acquired the lock.  (There's already adequate logic inside
AcceptInvalidationMessages to deal with being called recursively.)
Fortunately, we can implement that at trivial cost, by adding a flag
to LOCALLOCK hashtable entries that tracks whether we know we have
completed such a call.
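
To illustrate the reasoning above, here is a minimal, self-contained sketch of the hazard and of the flag-based fix. It is a toy model, not the PostgreSQL code: DemoLocalLock, DemoState, demo_lock_catalog, demo_accept_messages and demo_run are hypothetical names invented for this example, and the "sinval queue" is just a counter of unread messages. The only idea taken from the text is that the skip decision is based on "have we finished reading messages since acquiring the lock" rather than "do we already hold the lock".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a LOCALLOCK hashtable entry. */
typedef struct DemoLocalLock
{
    int  nLocks;        /* times this session holds the lock */
    bool lockCleared;   /* finished reading messages since acquiring? */
} DemoLocalLock;

/* Hypothetical model of one backend's state. */
typedef struct DemoState
{
    DemoLocalLock catalog;  /* lock on a single system catalog */
    int  pending;           /* unread invalidation messages */
    int  depth;             /* recursion depth of message processing */
    bool track_cleared;     /* true: flag-based rule, false: old rule */
    int  unread_after_inner; /* unread count when the first recursive
                              * lock call returned, -1 if none yet */
} DemoState;

static void demo_lock_catalog(DemoState *st);

/* Toy message processing: handling a message at the outermost level
 * needs the catalog, which triggers a recursive lock request. */
static void demo_accept_messages(DemoState *st)
{
    st->depth++;
    while (st->pending > 0)
    {
        st->pending--;                      /* consume one message */
        if (st->depth == 1)
        {
            demo_lock_catalog(st);          /* recursive catalog access */
            if (st->unread_after_inner < 0)
                st->unread_after_inner = st->pending;
        }
    }
    st->depth--;
}

/* Toy lock acquisition followed by the decision to read messages. */
static void demo_lock_catalog(DemoState *st)
{
    bool already_held = (st->catalog.nLocks > 0);

    if (!already_held)
        st->catalog.lockCleared = false;    /* fresh lock: not cleared yet */
    st->catalog.nLocks++;

    if (st->track_cleared)
    {
        if (st->catalog.lockCleared)
            return;                         /* safe to skip */
    }
    else if (already_held)
        return;                             /* old rule: skips too early */

    demo_accept_messages(st);
    st->catalog.lockCleared = true;         /* a full read has completed */
}

/* Runs the scenario with two unread messages and reports how many were
 * still unread when the recursive lock request returned. */
static int demo_run(bool track_cleared)
{
    DemoState st = { {0, false}, 2, 0, track_cleared, -1 };

    demo_lock_catalog(&st);
    assert(st.depth == 0);
    return st.unread_after_inner;
}
```

In this model, demo_run(false) returns 1: the recursive request found the lock already held and returned with a message still unread. demo_run(true) returns 0, because the flag is not yet set while the outer call is still in progress, so the recursive request reads the remaining message before returning.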

There is an API hazard added by this patch for external callers of
LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD,
it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR
into thinking the lock wasn't already held.  This should be a fail-soft
condition, though, unless something very bizarre is being done in
response to the test.
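
As a hedged illustration of that hazard, the sketch below uses a local enum whose DEMO_-prefixed values are assumed to correspond to the LockAcquire result codes (the real declarations live in src/include/storage/lock.h and are not reproduced here). The two helper functions are hypothetical caller-side checks written for this example, not part of the patch.

```c
#include <stdbool.h>

/* Local stand-in for the LockAcquire result codes; the names are
 * assumptions modelled on the codes mentioned in the text. */
typedef enum
{
    DEMO_LOCKACQUIRE_NOT_AVAIL,
    DEMO_LOCKACQUIRE_OK,
    DEMO_LOCKACQUIRE_ALREADY_HELD,
    DEMO_LOCKACQUIRE_ALREADY_CLEAR      /* the new return code */
} DemoLockAcquireResult;

/* Pre-patch style check: treats the new code as "not already held". */
static bool demo_fragile_already_held(DemoLockAcquireResult res)
{
    return res == DEMO_LOCKACQUIRE_ALREADY_HELD;
}

/* Check that accepts either "already held" code. */
static bool demo_robust_already_held(DemoLockAcquireResult res)
{
    return res == DEMO_LOCKACQUIRE_ALREADY_HELD ||
           res == DEMO_LOCKACQUIRE_ALREADY_CLEAR;
}
```

An external caller that only compares against the older code would take its "lock was newly acquired" path when the new code is returned, which is the fail-soft misreading described above.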

Also, I added an output argument to LockAcquireExtended, on the
assumption that outside code is unlikely to call it, given the very
limited usefulness of its additional functionality.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us

Branch
------
REL9_3_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/95e9f928ce5e831ceef725b988d937dacae15026

Modified Files
--------------
src/backend/storage/ipc/standby.c |  8 +++---
src/backend/storage/lmgr/lmgr.c   | 38 +++++++++++++++++++++------
src/backend/storage/lmgr/lock.c   | 55 ++++++++++++++++++++++++++++++++++++---
src/include/storage/lock.h        |  8 ++++--
4 files changed, 92 insertions(+), 17 deletions(-)
