Re: Are ctid chaining loops safe without relation size checks?

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Are ctid chaining loops safe without relation size checks?
Дата
Msg-id 20190515214401.k3z4npoisek3qfgq@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Are ctid chaining loops safe without relation size checks?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi,

On 2019-05-15 15:09:34 -0400, Tom Lane wrote:
> (If we're not checking liveness of x_max before following the link,
> we'd have trouble ...)

I don't think we do everywhere - e.g. in heap_get_latest_tid() case that
made me think about this there's only this as an xmax based loop
termination:

        /*
         * After following a t_ctid link, we might arrive at an unrelated
         * tuple.  Check for XMIN match.
         */
        if (TransactionIdIsValid(priorXmax) &&
            !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(tp.t_data)))
        {
            UnlockReleaseBuffer(buffer);
            break;
        }

but that's after we already followed the link, and read the page (and it
obviously isn't a liveliness check).

Additionally, note that heap_get_latest_tid() does *not* terminate when
it finds a visible tuple:

        /*
         * Check tuple visibility; if visible, set it as the new result
         * candidate.
         */
        valid = HeapTupleSatisfiesVisibility(&tp, snapshot, buffer);
        CheckForSerializableConflictOut(valid, relation, &tp, buffer, snapshot);
        if (valid)
            *tid = ctid;

it just continues to the next tuple version, if any.

EvalPlanQualFetch() in <= 11 and heapam_tuple_lock() in master and
heap_lock_updated_tuple_rec() don't have the problem that xmax might
suddenly abort while following the chain, because they have code like:

    /*
     * If tuple is being updated by other transaction then we
     * have to wait for its commit/abort, or die trying.
     */
    if (TransactionIdIsValid(SnapshotDirty.xmax))
    {
        ReleaseBuffer(buffer);
        switch (wait_policy)
        {
            case LockWaitBlock:
                XactLockTableWait(SnapshotDirty.xmax,
                                  relation, &tuple->t_self,
                                  XLTW_FetchUpdated);
                break;
            case LockWaitSkip:
                if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
                    /* skip instead of waiting */
                    return TM_WouldBlock;
                break;
            case LockWaitError:
                if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
                    ereport(ERROR,
                            (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                             errmsg("could not obtain lock on row in relation \"%s\"",
                                    RelationGetRelationName(relation))));
                break;
        }
        continue;   /* loop back to repeat heap_fetch */
    }

but heap_get_latest_tid() doesn't have that logic.


So I think the problem is just that heap_get_latest_tid() is missing
this type of check. The reason for which presumably is this piece of
intended functionality:

 * Actually, this gets the latest version that is visible according to
 * the passed snapshot.  You can pass SnapshotDirty to get the very latest,
 * possibly uncommitted version.

which means that neither can it block when xmax is still running, nor
can it terminate when HeapTupleSatisfiesVisibility() returns true.
There's no core code using a !mvcc snapshot however.




Because it's relevant for other work we've talked about for v13, and for
a potential fix:

> Andres Freund <andres@anarazel.de> writes:
> > Which I dutifully rewrote. But I'm actually not sure it's safe at all
> > for heap to rely on t_ctid links to be valid. What prevents a ctid link
> > to point to a page that's since been truncated away?
>
> Nothing, but when would the issue come up?  The updated tuple must be
> newer than the one pointing at it, so if it's dead then the one pointing
> at it must be too, no?

Well, the current tuple might not be dead, it might be
UPDATE_IN_PROGRESS when start following the ctid chain. By the time we
get to the next tuple, that UPDATE might have rolled back, vacuum came
along, removed the new version of thew tuple (which then becomes DEAD,
not RECENTLY_DEAD) and then truncated the relation.  Currently that's
not possible in the nodeTidscan.c case, because we'll have a lock
preventing truncations. But if we were to allow truncations without an
AEL, that'd be different.


I went through a few possible ways to fix this:

1) Break out of heap_get_latest_tid()/ loop, if the ctid to be chained
   to is bigger than the block length. It can't be visible by any
   definition except SnapshotAny, and we could just disallow that. As
   there's a lock on the relation heap_get_latest_tid() is operating on,
   we can rely on that value not getting too outdated.

   But I don't think that's correct, because the newest version of the
   tuple *actually* might be beyond the end of the table at the
   beginning of the scan - we *do* allow extension of the table while
   somebody holds a lock on the table after all.

   I also don't like adding more assumptions that depend on preventing
   truncations while any other lock is held.

2) Just disallow SnapshotDirty/SnapshotAny for heap_get_latest_tid(),
   and break out of the loop if the current tuple is visible. There
   can't be a newer visible version anyway, and as long as the input tid
   for heap_get_latest_tid() points to something the calling transaction
   could see (even if in an earlier snapshot), there has to be a visible
   version somewhere.

   That'd not fix the tid.c callers, but they're essentially unused and
   weird anyway.

   I'm not sure if WHERE CURRENT OF with a WITH HOLD cursor is possible
   / would be a problem. In that case we might need to add a
   XactLockTableWait too.

3) Declare these problems as esoteric, and don't care.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Emacs vs pg_indent's weird indentation for function declarations
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: pgsql: Compute XID horizon for page level index vacuum on primary.