On 02/02/2015 03:50 PM, Heikki Linnakangas wrote:
> This only happens with assertions enabled. The culprit is commit
> f88d4cfc9d417dac2ee41a8f5e593898e56fd2bd, which added the 'ctid'
> argument to XactLockTableWait. check_exclusion_constraint calls
> index_endscan() just before XactLockTableWait, but that free's the
> memory the ctid points to.
>
> The fix for this particular instance is trivial: copy the ctid to a
> local variable before calling index_endscan. However, looking at the
> other XactLockTableWait() and MultiXactIdWait() calls, there are more
> questionable pointers being passed. Most point to heap tuples on disk
> pages, after releasing the lock on the page, although not the pin. The
> one in EvalPlanQualFetch releases the pin too.
>
> I'll write up a patch to change those call sites to use local variables.
> Hopefully it's trivial enough to still include in 9.4.1, although time
> is really running out..
I'll commit the attached fix shortly, so please shout quickly if you see
a problem with this.
Aside from the potential for segfaults with assertions, I think the
calls passed incorrect ctid anyway. For example:
> --- a/src/backend/executor/execUtils.c
> +++ b/src/backend/executor/execUtils.c
> @@ -1307,7 +1307,7 @@ retry:
> if (TransactionIdIsValid(xwait))
> {
> index_endscan(index_scan);
> - XactLockTableWait(xwait, heap, &tup->t_data->t_ctid,
> + XactLockTableWait(xwait, heap, &tup->t_self,
> XLTW_RecheckExclusionConstr);
> goto retry;
> }
We don't really want to pass the heap tuple's ctid field. If the tuple
was updated (again) in the same transaction, the one that's still
in-progress, that points to the *next* tuple in the chain, but the error
message says "while checking exclusion constraint on tuple (%u,%u) in
relation \"%s\"". We should be passing the TID of the tuple itself, not
the ctid value in the tuple's header. The attached patch fixes that too.
- Heikki