Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly
Дата
Msg-id CANP8+j+vcrgv6-KaY_1bdgF7nE2G0r12c398g=7QyazVXem3xg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly  (Andres Freund <andres@anarazel.de>)
Ответы Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtranslinks incorrectly
Список pgsql-hackers
On 24 April 2017 at 00:25, Andres Freund <andres@anarazel.de> wrote:

>> >> It's not clear to me how much potential this has to create user data
>> >> corruption, but it doesn't look good at first glance.  Discuss.
>> >
>> > Hm. I think it can cause wrong tqual.c results in some edge cases.
>> > During HS, lastOverflowedXid will be set in some cases, and then
>> > XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
>> > turn cause lookups snapshot->subxip (where all HS xids reside)
>> > potentially return wrong results.
>> >
>> > I was about to say that I don't see how it could result in persistent
>> > corruption however - the subtrans lookups are only done for
>> > (snapshot->xmin, snapshot->xmax] and subtrans is truncated
>> > regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
>> > anymore, so that might be delayed.  Hm.
>>
>> I've not found any reason, yet, to believe we return wrong answers in
>> any case, even though the transient data structure pg_subtrans is
>> corrupted by the switched call Tom discovers.
>
> I think I pointed out a danger above. Consider what happens if query on
> a standby has a suboverflowed snapshot:
> Snapshot
> GetSnapshotData(Snapshot snapshot)
> {
> ...
>                 if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
>                         suboverflowed = true;
>         }
> ..
>         snapshot->suboverflowed = suboverflowed;
> }
>
> In that case we rely on pg_subtrans for visibility determinations:
> bool
> HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
>                                            Buffer buffer)
> {
> ...
>         if (!HeapTupleHeaderXminCommitted(tuple))
>         {
> ...
>                 else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
>                         return false;
>
> and
> static bool
> XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
> {
> ...
>         if (!snapshot->takenDuringRecovery)
>         {
> ...
>         else
>         {
> ...
>                 if (snapshot->suboverflowed)
>                 {
>                         /*
>                          * Snapshot overflowed, so convert xid to top-level.  This is safe
>                          * because we eliminated too-old XIDs above.
>                          */
>                         xid = SubTransGetTopmostTransaction(xid);
>
> if the subxid->xid mapping doesn't actually exist - as it's the case
> with this bug afaics - we'll not get the correct toplevel
> transaction.

The nature of the corruption is that in some cases
* a subxid will point to nothing (even though in most cases it was
already set correctly)
* the parent will point to a subxid

So both wrong.

> Which'll mean the following block:
>                 /*
>                  * We now have either a top-level xid higher than xmin or an
>                  * indeterminate xid. We don't know whether it's top level or subxact
>                  * but it doesn't matter. If it's present, the xid is visible.
>                  */
>                 for (j = 0; j < snapshot->subxcnt; j++)
>                 {
>                         if (TransactionIdEquals(xid, snapshot->subxip[j]))
>                                 return true;
>                 }
> won't work correctly if suboverflowed.

Your example of snapshots taken during recovery is not correct.

Note that SubTransGetTopmostTransaction() returns a valid, running
xid, even though it is the wrong one.

Snapshots work differently on standbys - we store all known running
xids, so the test still passes correctly, even when overflowed.

I'd call that just plain luck. We behave correctly, but for the wrong
reasons, at least in this case.


> I don't see anything prevent wrong results here?

I've had an even better look around now and I think I've found
something but need to turn it into a repeatable test case so I can
double-check this before reporting in full, so I don't cry wolf.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Bapat
Дата:
Сообщение: Re: [HACKERS] Adding support for Default partition in partitioning
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: [HACKERS] OK, so culicidae is *still* broken