Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction
Дата
Msg-id 18a116d2-7927-6a62-aa60-ab16c53183f9@gmail.com
обсуждение исходный текст
Ответ на BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction  (PG Bug reporting form <noreply@postgresql.org>)
Ответы Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-bugs
15.01.2022 11:00, PG Bug reporting form wrote:
The following bug has been logged on the website:

Bug reference:      17368
This bug is separated from the #17116, because it's an unrelated issue. But it was discussed in the thread #17116 and there was proposed the working v4-0002-Fix-race-in-SERIALIZABLE-READ-ONLY.patch [1].
I've attached the modification for the read-only-anomaly-3 test to reproduce the bug (the result is shown in the bug report).
For ease of reading, all relevant info quoted here:
On Wed, Jul 28, 2021 at 5:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Wed, Jul 28, 2021 at 9:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
(Except for read-only-anomaly-3, that causes another
assertion fail but I believe that it's not related to the initial issue
and deserves another bug report.)
Huh.  I tried and failed to find that one with concurrent runs.  I'll
wait for your next report before I do anything, just in case there's a
connection.
This is indeed an unrelated issue, and much older.  After many
kilowatt hours, I reproduced it here and worked out that it comes from
GetSafeSnapshot() not expecting possibleUnsafeConflicts to be empty
when WritableSxactCount == 0.  More soon.

On Fri, Jul 30, 2021 at 9:41 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The problem is that if we don't take the fast exit here:
    if (XactReadOnly && PredXact->WritableSxactCount == 0)    {        ReleasePredXact(sxact);        LWLockRelease(SerializableXactHashLock);        return snapshot;    }

... then we search for writable SSI transactions here:
        for (othersxact = FirstPredXact();             othersxact != NULL;             othersxact = NextPredXact(othersxact))        {            if (!SxactIsCommitted(othersxact)                && !SxactIsDoomed(othersxact)                && !SxactIsReadOnly(othersxact))            {                SetPossibleUnsafeConflict(sxact, othersxact);            }        }

... but that can find nothing if all writers happen to be doomed.
WritableSxactCount doesn't count read-only or committed transactions
so we don't have to worry about those confusing us, but it does count
doomed transactions.  Having an empty list here is mostly fine, except
that nobody will ever set our RO_SAFE flag, and in the case of
DEFERRABLE, the assertion that someone has set it will fail.  This is
the case since:

===
commit bdaabb9b22caa71021754d3967b4032b194d9880
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Fri Jul 8 00:36:30 2011 +0300
    There's a small window wherein a transaction is committed but not yet    on the finished list, and we shouldn't flag it as a potential conflict    if so. We can also skip adding a doomed transaction to the list of    possible conflicts because we know it won't commit.
    Dan Ports and Kevin Grittner.
===

I see three fixes:

1.  Check if the list is empty, and if so, set our own
SXACT_FLAG_RO_SAFE.  Then the assertion in GetSafeSnapshot() will
pass, and also non-DEFERRABLE callers will eventually see the flag and
opt out.

2.  Check if the list is empty, and if so, opt out immediately (as we
do when WriteableSxactCount == 0).  This would be strictly better than
#1, because it happens sooner while we still have the lock.  On the
other hand, we'd have to undo more effects, and that involves writing
more fragile and very rarely run code.

3.  Just delete the assertion in GetSafeSnapshot().  This is
attractively simple but it means that READ ONLY non-DEFERRABLE
transactions arbitrarily miss out on the RO_SAFE optimisation in this
(admittedly rare) case.

The attached 0002 has idea #1, which I prefer for back-patching.  I
think #2 might be a good idea if we take the filtering logic further
so that it becomes more likely that you get an empty list (for
example, why don't we skip transactions that are running in a
different database?), but then that'd be new code, not something we
back-patch.  Thoughts?
[1] https://www.postgresql.org/message-id/CA%2BhUKGJd-%3DmdUdS%2BGh-FN4rZgAg4M-V%3D%3DG7FMCU-KbUffyPJDw%40mail.gmail.com

Best regards,
Alexander
Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction
Следующее
От: Alexander Lakhin
Дата:
Сообщение: Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction