Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction
От | Thomas Munro |
---|---|
Тема | Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction |
Дата | |
Msg-id | CA+hUKG+4m+fvnM_Czyp3Z_h4SaqmwR7sq9XE=k0GcaL_afEfUA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction (Alexander Lakhin <exclusion@gmail.com>) |
Ответы |
Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction
(Thomas Munro <thomas.munro@gmail.com>)
|
Список | pgsql-bugs |
On Sun, Jan 16, 2022 at 5:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: >> On Fri, Jul 30, 2021 at 9:41 AM Thomas Munro <thomas.munro@gmail.com> wrote: > The problem is that if we don't take the fast exit here: > > if (XactReadOnly && PredXact->WritableSxactCount == 0) > { > ReleasePredXact(sxact); > LWLockRelease(SerializableXactHashLock); > return snapshot; > } > > ... then we search for writable SSI transactions here: > > for (othersxact = FirstPredXact(); > othersxact != NULL; > othersxact = NextPredXact(othersxact)) > { > if (!SxactIsCommitted(othersxact) > && !SxactIsDoomed(othersxact) > && !SxactIsReadOnly(othersxact)) > { > SetPossibleUnsafeConflict(sxact, othersxact); > } > } > > ... but that can find nothing if all writers happen to be doomed. > WritableSxactCount doesn't count read-only or committed transactions > so we don't have to worry about those confusing us, but it does count > doomed transactions. Having an empty list here is mostly fine, except > that nobody will ever set our RO_SAFE flag, and in the case of > DEFERRABLE, the assertion that someone has set it will fail. This is > the case since: > > === > commit bdaabb9b22caa71021754d3967b4032b194d9880 > Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Fri Jul 8 00:36:30 2011 +0300 > > There's a small window wherein a transaction is committed but not yet > on the finished list, and we shouldn't flag it as a potential conflict > if so. We can also skip adding a doomed transaction to the list of > possible conflicts because we know it won't commit. > > Dan Ports and Kevin Grittner. > === > > I see three fixes: > > 1. Check if the list is empty, and if so, set our own > SXACT_FLAG_RO_SAFE. Then the assertion in GetSafeSnapshot() will > pass, and also non-DEFERRABLE callers will eventually see the flag and > opt out. > > 2. Check if the list is empty, and if so, opt out immediately (as we > do when WriteableSxactCount == 0). This would be strictly better than > #1, because it happens sooner while we still have the lock. On the > other hand, we'd have to undo more effects, and that involves writing > more fragile and very rarely run code. > > 3. Just delete the assertion in GetSafeSnapshot(). This is > attractively simple but it means that READ ONLY non-DEFERRABLE > transactions arbitrarily miss out on the RO_SAFE optimisation in this > (admittedly rare) case. > > The attached 0002 has idea #1, which I prefer for back-patching. I > think #2 might be a good idea if we take the filtering logic further > so that it becomes more likely that you get an empty list (for > example, why don't we skip transactions that are running in a > different database?), but then that'd be new code, not something we > back-patch. Thoughts? While swapping unrelated bug #17116 back into my brain, I remembered this one. Coincidentally, I also had a note on my to-do list to investigate a problem that has started showing up on CI, in the new "running" tests (which seem to be finding fun new timing bugs in committed code): http://cfbot.cputube.org/highlights/assertion-30.html Duh, it's this same ancient bug that we already figured out, which dates from 2011. I plan to do some more testing and then proceed with the idea mentioned above[1]. [1] https://www.postgresql.org/message-id/attachment/125053/v4-0002-Fix-race-in-SERIALIZABLE-READ-ONLY.patch
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: found a possible bug, modulus of an integer on a partition table appears to be wrong
Следующее
От: Thomas MunroДата:
Сообщение: Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction