Discussion: Adding some error context for lock wait failures
I noted a complaint [1] about how hard it is to debug unforeseen
lock-timeout failures: we give no details about what we were
waiting for. It's not hard to improve that situation, at least
to the extent of printing numeric locktag details similar to what
you get in deadlock reports. (It'd be nice to give object names,
but just as with deadlocks, incurring any additional lock
acquisitions here seems too scary.) The attached patch will
produce reports like
regression=# begin;
BEGIN
regression=*# lock table tenk1;
^CCancel request sent
ERROR: canceling statement due to user request
CONTEXT: waiting for AccessExclusiveLock on relation 77382 of database 77348
regression=!# abort;
ROLLBACK
regression=# set lock_timeout TO '1s';
SET
regression=# begin;
BEGIN
regression=*# lock table tenk1;
ERROR: canceling statement due to lock timeout
CONTEXT: waiting for AccessExclusiveLock on relation 77382 of database 77348
and then the user can manually look up the object's identity.
Thoughts?
regards, tom lane
[1] https://www.postgresql.org/message-id/CAKE1AiY17RgcKCFba1N6Sz6SjHqSqvq%2BcfKWBfyKFEjT-L%2Bqkg%40mail.gmail.com
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 62f3471448e..01b752830fb 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -415,6 +415,7 @@ static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
static void FinishStrongLockAcquire(void);
static ProcWaitStatus WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
+static void waitonlock_error_callback(void *arg);
static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
@@ -1931,6 +1932,7 @@ static ProcWaitStatus
WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
{
ProcWaitStatus result;
+ ErrorContextCallback waiterrcontext;
TRACE_POSTGRESQL_LOCK_WAIT_START(locallock->tag.lock.locktag_field1,
locallock->tag.lock.locktag_field2,
@@ -1939,6 +1941,12 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
locallock->tag.lock.locktag_type,
locallock->tag.mode);
+ /* Setup error traceback support for ereport() */
+ waiterrcontext.callback = waitonlock_error_callback;
+ waiterrcontext.arg = (void *) locallock;
+ waiterrcontext.previous = error_context_stack;
+ error_context_stack = &waiterrcontext;
+
/* adjust the process title to indicate that it's waiting */
set_ps_display_suffix("waiting");
@@ -1990,6 +1998,8 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
/* reset ps display to remove the suffix */
set_ps_display_remove_suffix();
+ error_context_stack = waiterrcontext.previous;
+
TRACE_POSTGRESQL_LOCK_WAIT_DONE(locallock->tag.lock.locktag_field1,
locallock->tag.lock.locktag_field2,
locallock->tag.lock.locktag_field3,
@@ -2000,6 +2010,28 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
return result;
}
+/*
+ * error context callback for failures in WaitOnLock
+ *
+ * We report which lock was being waited on, in the same style used in
+ * deadlock reports. This helps with lock timeout errors in particular.
+ */
+static void
+waitonlock_error_callback(void *arg)
+{
+ LOCALLOCK *locallock = (LOCALLOCK *) arg;
+ const LOCKTAG *tag = &locallock->tag.lock;
+ LOCKMODE mode = locallock->tag.mode;
+ StringInfoData locktagbuf;
+
+ initStringInfo(&locktagbuf);
+ DescribeLockTag(&locktagbuf, tag);
+
+ errcontext("waiting for %s on %s",
+ GetLockmodeName(tag->locktag_lockmethodid, mode),
+ locktagbuf.data);
+}
+
/*
* Remove a proc from the wait-queue it is on (caller must know it is on one).
* This is only used when the proc has failed to get the lock, so we set its
Hi,
On Jul 11, 2025 at 01:06 +0800, Tom Lane <tgl@sss.pgh.pa.us>, wrote:
I noted a complaint [1] about how hard it is to debug unforeseen
lock-timeout failures: we give no details about what we were
waiting for. It's not hard to improve that situation, at least
to the extent of printing numeric locktag details similar to what
you get in deadlock reports. (It'd be nice to give object names,
but just as with deadlocks, incurring any additional lock
acquisitions here seems too scary.) The attached patch will
produce reports like
regression=# begin;
BEGIN
regression=*# lock table tenk1;
^CCancel request sent
ERROR: canceling statement due to user request
CONTEXT: waiting for AccessExclusiveLock on relation 77382 of database 77348
regression=!# abort;
ROLLBACK
regression=# set lock_timeout TO '1s';
SET
regression=# begin;
BEGIN
regression=*# lock table tenk1;
ERROR: canceling statement due to lock timeout
CONTEXT: waiting for AccessExclusiveLock on relation 77382 of database 77348
and then the user can manually look up the object's identity.
Thoughts?
regards, tom lane
[1] https://www.postgresql.org/message-id/CAKE1AiY17RgcKCFba1N6Sz6SjHqSqvq%2BcfKWBfyKFEjT-L%2Bqkg%40mail.gmail.com
```
}
PG_CATCH();
{
/* In this path, awaitedLock remains set until LockErrorCleanup */
/* reset ps display to remove the suffix */
set_ps_display_remove_suffix();
/* and propagate the error */
PG_RE_THROW();
}
PG_END_TRY();
/*
* We no longer want LockErrorCleanup to do anything.
*/
awaitedLock = NULL;
/* reset ps display to remove the suffix */
set_ps_display_remove_suffix();
error_context_stack = waiterrcontext.previous;
```
Do we need to rollback error_context_stack to the previous state if we enter the branch for PG_CATCH()?
--
Zhang Mingli
HashData
Zhang Mingli <zmlpostgres@gmail.com> writes:
> Do we need to rollback error_context_stack to the previous state if we enter the branch for PG_CATCH()?
No. The PG_TRY mechanism itself deals with that: the next outer
level of PG_TRY will restore error_context_stack to what it had
been. If this were not so, most other places that add an
error_context_stack entry would also be broken.
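For reference, here is a simplified paraphrase of the PG_TRY/PG_CATCH/PG_END_TRY
expansion in src/include/utils/elog.h (illustrative only; the real macros carry
some extra re-throw bookkeeping), showing where the restore happens:
```
/*
 * Simplified paraphrase of the PG_TRY machinery (not the exact source):
 * both the catch path and the normal exit path restore error_context_stack
 * from the value saved on entry, so a context callback pushed inside the
 * try block is popped automatically when an error is caught further out.
 */
#define PG_TRY() \
	do { \
		sigjmp_buf *_save_exception_stack = PG_exception_stack; \
		ErrorContextCallback *_save_context_stack = error_context_stack; \
		sigjmp_buf _local_sigjmp_buf; \
		if (sigsetjmp(_local_sigjmp_buf, 0) == 0) \
		{ \
			PG_exception_stack = &_local_sigjmp_buf

#define PG_CATCH() \
		} \
		else \
		{ \
			PG_exception_stack = _save_exception_stack; \
			error_context_stack = _save_context_stack	/* <-- restored here */

#define PG_END_TRY() \
		} \
		PG_exception_stack = _save_exception_stack; \
		error_context_stack = _save_context_stack; \
	} while (0)
```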
regards, tom lane
Hi,
On Jul 11, 2025 at 10:53 +0800, Tom Lane <tgl@sss.pgh.pa.us>, wrote:
Zhang Mingli <zmlpostgres@gmail.com> writes:
> Do we need to rollback error_context_stack to the previous state if we enter the branch for PG_CATCH()?
No. The PG_TRY mechanism itself deals with that: the next outer
level of PG_TRY will restore error_context_stack to what it had
been. If this were not so, most other places that add an
error_context_stack entry would also be broken.
Got it. LGTM.
--
Zhang Mingli
HashData
On Thu, Jul 10, 2025 at 10:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I noted a complaint [1] about how hard it is to debug unforeseen
> lock-timeout failures: we give no details about what we were
> waiting for. It's not hard to improve that situation, at least
> to the extent of printing numeric locktag details similar to what
> you get in deadlock reports. (It'd be nice to give object names,
> but just as with deadlocks, incurring any additional lock
> acquisitions here seems too scary.) The attached patch will
> produce reports like
>
> regression=# begin;
> BEGIN
> regression=*# lock table tenk1;
> ^CCancel request sent
> ERROR: canceling statement due to user request
> CONTEXT: waiting for AccessExclusiveLock on relation 77382 of database 77348
> regression=!# abort;
> ROLLBACK
> regression=# set lock_timeout TO '1s';
> SET
> regression=# begin;
> BEGIN
> regression=*# lock table tenk1;
> ERROR: canceling statement due to lock timeout
> CONTEXT: waiting for AccessExclusiveLock on relation 77382 of database 77348
>
> and then the user can manually look up the object's identity.
>
This seems to be quite useful to me, initially I thought if we can
print the relation and database name then this could be even better
but it might be a bad idea to fetch the object name while we are in
error callback. And anyway deadlock error is also reported in the same
format so this makes sense.
--
Regards,
Dilip Kumar
Google
Hi,
On Jul 11, 2025 at 11:36 +0800, Dilip Kumar <dilipbalaut@gmail.com>, wrote:
This seems to be quite useful to me, initially I thought if we can
print the relation and database name then this could be even better
but it might be a bad idea to fetch the object name while we are in
error callback.
May be confused if there were tables with same names under different schemas.
--
Zhang Mingli
HashData
On Fri, Jul 11, 2025 at 9:34 AM Zhang Mingli <zmlpostgres@gmail.com> wrote:
>
> On Jul 11, 2025 at 11:36 +0800, Dilip Kumar <dilipbalaut@gmail.com>, wrote:
>
> This seems to be quite useful to me, initially I thought if we can
> print the relation and database name then this could be even better
> but it might be a bad idea to fetch the object name while we are in
> error callback.
>
> May be confused if there were tables with same names under different schemas.
If that's the only issue we can print schema qualified name, but I
think the problem is in error callback we just have lock tag
information which only have OIDs and we don't look up the
relcaches/sys table from the error path.
--
Regards,
Dilip Kumar
Google
Dilip Kumar <dilipbalaut@gmail.com> writes:
> On Fri, Jul 11, 2025 at 9:34 AM Zhang Mingli <zmlpostgres@gmail.com> wrote:
>> May be confused if there were tables with same names under different schemas.
> If that's the only issue we can print schema qualified name, but I
> think the problem is in error callback we just have lock tag
> information which only have OIDs and we don't look up the
> relcaches/sys table from the error path.
Yeah. In an ideal world we'd look up the OID references and print
the object names. But trying to do catalog lookups in an
already-failed transaction seems way too dangerous, just as we've
judged it to be for deadlock reports.
Hearing no comments beyond that one, pushed.
regards, tom lane
Hi,
On 2025-08-29 15:46:33 -0400, Tom Lane wrote:
> Hearing no comments beyond that one, pushed.
valgrind complains that there's a memory leak here:
==374853== 1,024 bytes in 1 blocks are definitely lost in loss record 1,257 of 1,459
==374853== at 0xFD902A: palloc (mcxt.c:1389)
==374853== by 0x101A3D6: initStringInfoInternal (stringinfo.c:45)
==374853== by 0x101A46B: initStringInfo (stringinfo.c:99)
==374853== by 0xD8CF32: waitonlock_error_callback (lock.c:2027)
==374853== by 0xF916E2: errfinish (elog.c:510)
==374853== by 0xDA2076: ProcSleep (proc.c:1621)
==374853== by 0xD8CE85: WaitOnLock (lock.c:1979)
==374853== by 0xD8B9D8: LockAcquireExtended (lock.c:1221)
==374853== by 0xD8ACDA: LockAcquire (lock.c:814)
==374853== by 0xD93364: VirtualXactLock (lock.c:4844)
==374853== by 0xA4337E: WaitForOlderSnapshots (indexcmds.c:492)
==374853== by 0xA4A6DF: ReindexRelationConcurrently (indexcmds.c:4216)
==374853== by 0xA480B4: ReindexIndex (indexcmds.c:2956)
==374853== by 0xA47F12: ExecReindex (indexcmds.c:2885)
==374853== by 0xDBE60F: ProcessUtilitySlow (utility.c:1561)
==374853== by 0xDBCEA9: standard_ProcessUtility (utility.c:1060)
==374853== by 0xDBBB6A: ProcessUtility (utility.c:523)
==374853== by 0xDBA2D8: PortalRunUtility (pquery.c:1153)
==374853== by 0xDBA566: PortalRunMulti (pquery.c:1310)
==374853== by 0xDB99EB: PortalRun (pquery.c:788)
==374853==
{
<insert_a_suppression_name_here>
Memcheck:Leak
match-leak-kinds: definite
fun:palloc
fun:initStringInfoInternal
fun:initStringInfo
fun:waitonlock_error_callback
fun:errfinish
fun:ProcSleep
fun:WaitOnLock
fun:LockAcquireExtended
fun:LockAcquire
fun:VirtualXactLock
fun:WaitForOlderSnapshots
fun:ReindexRelationConcurrently
fun:ReindexIndex
fun:ExecReindex
fun:ProcessUtilitySlow
fun:standard_ProcessUtility
fun:ProcessUtility
fun:PortalRunUtility
fun:PortalRunMulti
fun:PortalRun
}
I suspect that the scope of the leak is somewhat bounded, as ErrorContext will
be reset after errors. However it won't be reset if there aren't ever any
errors...
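(For context, the reset after an error happens via FlushErrorState(), roughly
as below -- a paraphrase of elog.c, not the exact source -- which is why the
leak only persists in sessions that never hit a real ERROR:)
```
/*
 * Paraphrase of FlushErrorState() in elog.c (illustrative): transaction
 * abort after a genuine ERROR ends up here, discarding everything in
 * ErrorContext, including anything a context callback leaked into it.
 */
void
FlushErrorState(void)
{
	/* Reset the error-data stack to empty */
	errordata_stack_depth = -1;
	recursion_depth = 0;
	/* Delete all data in ErrorContext */
	MemoryContextReset(ErrorContext);
}
```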
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> valgrind complains that there's a memory leak here:
> ==374853== 1,024 bytes in 1 blocks are definitely lost in loss record 1,257 of 1,459
> ==374853== at 0xFD902A: palloc (mcxt.c:1389)
> ==374853== by 0x101A3D6: initStringInfoInternal (stringinfo.c:45)
> ==374853== by 0x101A46B: initStringInfo (stringinfo.c:99)
> ==374853== by 0xD8CF32: waitonlock_error_callback (lock.c:2027)
> ==374853== by 0xF916E2: errfinish (elog.c:510)
Hmm, that is interesting -- I'd expect error cleanup to deal with
that. Did you happen to notice the exact repro case? It's surely
easy enough to add a pfree, but I don't believe that other errcontext
callbacks are any more careful than this one.
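(The pfree band-aid would look roughly like this in waitonlock_error_callback --
a hypothetical sketch for illustration, not the fix that was ultimately adopted:)
```
	/*
	 * Hypothetical narrow fix (not what was committed): free the
	 * StringInfo buffer before leaving the callback.  errcontext()
	 * has already copied the text, so nothing else points at it.
	 */
	initStringInfo(&locktagbuf);
	DescribeLockTag(&locktagbuf, tag);

	errcontext("waiting for %s on %s",
			   GetLockmodeName(tag->locktag_lockmethodid, mode),
			   locktagbuf.data);

	pfree(locktagbuf.data);
```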
regards, tom lane
Hi,
On 2025-10-09 11:22:39 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > valgrind complains that there's a memory leak here:
> > ==374853== 1,024 bytes in 1 blocks are definitely lost in loss record 1,257 of 1,459
> > ==374853== at 0xFD902A: palloc (mcxt.c:1389)
> > ==374853== by 0x101A3D6: initStringInfoInternal (stringinfo.c:45)
> > ==374853== by 0x101A46B: initStringInfo (stringinfo.c:99)
> > ==374853== by 0xD8CF32: waitonlock_error_callback (lock.c:2027)
> > ==374853== by 0xF916E2: errfinish (elog.c:510)
>
> Hmm, that is interesting -- I'd expect error cleanup to deal with
> that. Did you happen to notice the exact repro case? It's surely
> easy enough to add a pfree, but I don't believe that other errcontext
> callbacks are any more careful than this one.
I think the difference to most other cases is that this is just an
informational message, so there simply isn't any error cleanup. It's possible
we should change that, as you say it's not hard to imagine other error
contexts called in < ERROR cases also leaking...
As for a repro, it seems the basic case of
A: CREATE TABLE foo();
A: BEGIN; LOCK TABLE foo;
B: BEGIN; LOCK TABLE foo;
A: SELECT pg_sleep(2);
A: COMMIT;
B: \c
triggers it. Of course the backtrace is slightly different in that case
compared to the case that I reported upthread.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2025-10-09 11:22:39 -0400, Tom Lane wrote:
>> Hmm, that is interesting -- I'd expect error cleanup to deal with
>> that. Did you happen to notice the exact repro case? It's surely
>> easy enough to add a pfree, but I don't believe that other errcontext
>> callbacks are any more careful than this one.
> I think the difference to most other cases is that this is just an
> informational message, so there simply isn't any error cleanup. It's possible
> we should change that, as you say it's not hard to imagine other error
> contexts called in < ERROR cases also leaking...
Yeah. I see that errfinish does FreeErrorDataContents in the
non-ERROR code path, but of course that does nothing for random
leakages during error processing. I'm tempted to have it do
MemoryContextReset(ErrorContext) if we are at stack depth zero.
That'd be unsafe during nested error processing, but there
should not be anything of interest leftover once we're out
of the nest.
regards, tom lane
I wrote:
> Yeah. I see that errfinish does FreeErrorDataContents in the
> non-ERROR code path, but of course that does nothing for random
> leakages during error processing. I'm tempted to have it do
> MemoryContextReset(ErrorContext) if we are at stack depth zero.
> That'd be unsafe during nested error processing, but there
> should not be anything of interest leftover once we're out
> of the nest.
Concretely, like the attached. This passes check-world, but
I can't test it under valgrind because I'm hitting the same
CREATE DATABASE failure skink is reporting.
I wonder if we should back-patch this. In principle, a backend
that emits a long series of non-error log messages or client
notice messages could accumulate indefinitely much leakage in
ErrorContext. The lack of field reports suggests that maybe
there weren't any such leaks up to now, but that seems unduly
optimistic.
regards, tom lane
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index b7b9692f8c8..648d2d2e70c 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -542,11 +542,20 @@ errfinish(const char *filename, int lineno, const char *funcname)
/* Emit the message to the right places */
EmitErrorReport();
- /* Now free up subsidiary data attached to stack entry, and release it */
- FreeErrorDataContents(edata);
- errordata_stack_depth--;
+ /*
+ * If this is the outermost recursion level, we can clean up by resetting
+ * ErrorContext altogether (compare FlushErrorState), which is good
+ * because it cleans up any random leakages that might have occurred in
+ * places such as context callback functions. If we're nested, we can
+ * only safely remove the subsidiary data of the current stack entry.
+ */
+ if (errordata_stack_depth == 0 && recursion_depth == 1)
+ MemoryContextReset(ErrorContext);
+ else
+ FreeErrorDataContents(edata);
- /* Exit error-handling context */
+ /* Release stack entry and exit error-handling context */
+ errordata_stack_depth--;
MemoryContextSwitchTo(oldcontext);
recursion_depth--;
Hi,
On 2025-10-09 12:50:53 -0400, Tom Lane wrote:
> I wrote:
> > Yeah. I see that errfinish does FreeErrorDataContents in the
> > non-ERROR code path, but of course that does nothing for random
> > leakages during error processing. I'm tempted to have it do
> > MemoryContextReset(ErrorContext) if we are at stack depth zero.
> > That'd be unsafe during nested error processing, but there
> > should not be anything of interest leftover once we're out
> > of the nest.
>
> Concretely, like the attached. This passes check-world, but
> I can't test it under valgrind because I'm hitting the same
> CREATE DATABASE failure skink is reporting.
Sorry, was working on a fix when life rudely intervened. Here's a quick
temporary fix:
diff --git i/src/backend/storage/buffer/bufmgr.c w/src/backend/storage/buffer/bufmgr.c
index d69e08ae61e..51c21e2ee06 100644
--- i/src/backend/storage/buffer/bufmgr.c
+++ w/src/backend/storage/buffer/bufmgr.c
@@ -3325,6 +3325,9 @@ TrackNewBufferPin(Buffer buf)
ref->refcount++;
ResourceOwnerRememberBuffer(CurrentResourceOwner, buf);
+
+ VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(GetBufferDescriptor(buf - 1)), BLCKSZ);
+
}
#define ST_SORT sort_checkpoint_bufferids
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2025-10-09 12:50:53 -0400, Tom Lane wrote:
>> Concretely, like the attached. This passes check-world, but
>> I can't test it under valgrind because I'm hitting the same
>> CREATE DATABASE failure skink is reporting.
> Sorry, was working on a fix when life rudely intervened. Here's a quick
> temporary fix:
Thanks. With that, I've confirmed that this change suppresses the
leak report in your example, and it also gets through the core
regression tests under valgrind (though I didn't run leak checking
for that). That's enough to convince me that the fix is OK.
Do you have an opinion on whether to back-patch? I'm leaning
in the direction of doing so, but it could be argued that it's
too much risk for a problem that we only know for sure exists
in master.
regards, tom lane
Hi,
On 2025-10-09 13:33:44 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2025-10-09 12:50:53 -0400, Tom Lane wrote:
> >> Concretely, like the attached. This passes check-world, but
> >> I can't test it under valgrind because I'm hitting the same
> >> CREATE DATABASE failure skink is reporting.
>
> > Sorry, was working on a fix when life rudely intervened. Here's a quick
> > temporary fix:
>
> Thanks. With that, I've confirmed that this change suppresses the
> leak report in your example, and it also gets through the core
> regression tests under valgrind (though I didn't run leak checking
> for that). That's enough to convince me that the fix is OK.
There are a few places that do
ereport(...);
/* Flush any strings created in ErrorContext */
FlushErrorState();
That'd be obsoleted by this change, right?
I also looked through other mentions of ErrorContext to see if there are any
issues, didn't find anything.
There are some independently worrisome functions I noticed while looking
around for problems. E.g. initTrie() and GetConnection() catching errors
without doing a transaction rollback seems decidedly not great. But that's
unrelated, just noticed it while grepping around.
> Do you have an opinion on whether to back-patch? I'm leaning
> in the direction of doing so, but it could be argued that it's
> too much risk for a problem that we only know for sure exists
> in master.
I'm a bit worried about it causing problems, although I don't have an actual
theory as to how. So I'd slightly lean towards not backpatching. But it's just
a vague gut feeling.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
> There are a few places that do
> ereport(...);
> /* Flush any strings created in ErrorContext */
> FlushErrorState();
> That'd be obsoleted by this change, right?
Oh, I see them, all in guc.c. Yeah, we should get rid of those;
they seem not too safe anyway given that they're unconditional.
>> Do you have an opinion on whether to back-patch?
> I'm a bit worried about it causing problems, although I don't have an actual
> theory as to how. So I'd slightly lean towards not backpatching. But it's just
> a vague gut feeling.
Fair enough. We can always back-patch later if someone exhibits
an actual problem in the field.
regards, tom lane
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> There are a few places that do
>> ereport(...);
>> /* Flush any strings created in ErrorContext */
>> FlushErrorState();
>> That'd be obsoleted by this change, right?
> Oh, I see them, all in guc.c. Yeah, we should get rid of those;
> they seem not too safe anyway given that they're unconditional.
Oh, I take that back: we need to keep those, because what they
are for is to clean up strings created by GUC_check_errdetail
and friends, which will happen before the ereport call. The
case where it's problematic is if the error logging level is
high enough that errstart decides there's nothing to do: then
we won't reach errfinish and that cleanup won't happen.
Conceivably we could deal with that scenario by having errstart
do the MemoryContextReset if it takes the no-op path, but I find
that a bit scary. Besides, that path is supposed to be fast.
So we'd better keep those calls in guc.c, but I'll change their
comments ...
regards, tom lane