Обсуждение: Race condition in TransactionIdIsInProgress

Поиск

Список

Период

Сортировка

Race condition in TransactionIdIsInProgress

От

Konstantin Knizhnik

Дата:

10 февраля 2022 г., 19:46:43

Hi hackers,

Postgres first records state transaction in CLOG, then removes 
transaction from procarray and finally releases locks.
But it can happen that transaction is marked as committed in CLOG, 
XMAX_COMMITTED bit is set in modified tuple but
TransactionIdIsInProgress still returns true. As a result "t_xmin %u is 
uncommitted in tuple (%u,%u) to be updated in table"
error is reported.

TransactionIdIsInProgress first checks for cached XID status:

     /*
      * We may have just checked the status of this transaction, so if it is
      * already known to be completed, we can fall out without any access to
      * shared memory.
      */
     if (TransactionIdIsKnownCompleted(xid))
     {
         xc_by_known_xact_inc();
         return false;
     }

This status cache is updated by TransactionLogFetch.
So once transaction status is fetched from CLOG, subsequent invocation 
of TransactionIdIsKnownCompleted will return true
even if transaction is not removed from procarray.

The following stack illustrates how it can happen:

#6  0x000055ab7758bbaf in TransactionLogFetch (transactionId=4482)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:88
#7  TransactionLogFetch (transactionId=4482)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:52
#8  0x000055ab7758bc7d in TransactionIdDidAbort 
(transactionId=transactionId@entry=4482)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:186
#9  0x000055ab77812a43 in TransactionIdIsInProgress (xid=4482)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/storage/ipc/procarray.c:1587
#10 0x000055ab7753de79 in HeapTupleSatisfiesVacuumHorizon 
(htup=htup@entry=0x7ffe562cec50, buffer=buffer@entry=1817,
     dead_after=dead_after@entry=0x7ffe562ceb14)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1281
#11 0x000055ab7753e2c5 in HeapTupleSatisfiesNonVacuumable (buffer=1817, 
snapshot=0x7ffe562cec70, htup=0x7ffe562cec50)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1449
#12 HeapTupleSatisfiesVisibility (tup=tup@entry=0x7ffe562cec50, 
snapshot=snapshot@entry=0x7ffe562cec70, buffer=buffer@entry=1817)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1804
#13 0x000055ab7752f3b5 in heap_hot_search_buffer 
(tid=tid@entry=0x7ffe562cec2a, relation=relation@entry=0x7fa57fbf17c8, 
buffer=buffer@entry=1817,
     snapshot=snapshot@entry=0x7ffe562cec70, 
heapTuple=heapTuple@entry=0x7ffe562cec50, all_dead=all_dead@entry=0x0, 
first_call=true)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:1802
#14 0x000055ab775370d8 in heap_index_delete_tuples (rel=0x7fa57fbf17c8, 
delstate=<optimized out>)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:7480
#15 0x000055ab7755767a in table_index_delete_tuples 
(delstate=0x7ffe562d0160, rel=0x2)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/include/access/tableam.h:1327
#16 _bt_delitems_delete_check (rel=rel@entry=0x7fa57fbf4518, 
buf=buf@entry=1916, heapRel=heapRel@entry=0x7fa57fbf17c8,
     delstate=delstate@entry=0x7ffe562d0160)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtpage.c:1541
#17 0x000055ab7754e475 in _bt_bottomupdel_pass 
(rel=rel@entry=0x7fa57fbf4518, buf=buf@entry=1916, 
heapRel=heapRel@entry=0x7fa57fbf17c8, newitemsz=20)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtdedup.c:406
#18 0x000055ab7755010a in _bt_delete_or_dedup_one_page 
(rel=0x7fa57fbf4518, heapRel=0x7fa57fbf17c8, insertstate=0x7ffe562d0640,
--Type <RET> for more, q to quit, c to continue without paging--
     simpleonly=<optimized out>, checkingunique=<optimized out>, 
uniquedup=<optimized out>, indexUnchanged=true)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:2763
#19 0x000055ab775548c3 in _bt_findinsertloc (heapRel=0x7fa57fbf17c8, 
stack=0x55ab7a1cde08, indexUnchanged=true, checkingunique=true,
     insertstate=0x7ffe562d0640, rel=0x7fa57fbf4518)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:904
#20 _bt_doinsert (rel=rel@entry=0x7fa57fbf4518, 
itup=itup@entry=0x55ab7a1cde30, 
checkUnique=checkUnique@entry=UNIQUE_CHECK_YES,
     indexUnchanged=indexUnchanged@entry=true, 
heapRel=heapRel@entry=0x7fa57fbf17c8)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:255
#21 0x000055ab7755a931 in btinsert (rel=0x7fa57fbf4518, 
values=<optimized out>, isnull=<optimized out>, ht_ctid=0x55ab7a1cd068, 
heapRel=0x7fa57fbf17c8,
     checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true, 
indexInfo=0x55ab7a1cdbd0)
     at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtree.c:199
#22 0x000055ab7769ae59 in ExecInsertIndexTuples 
(resultRelInfo=resultRelInfo@entry=0x55ab7a1c88b8, 
slot=slot@entry=0x55ab7a1cd038,
     estate=estate@entry=0x55ab7a1c8430, update=update@entry=true, 
noDupErr=noDupErr@entry=false, specConflict=specConflict@entry=0x0,
     arbiterIndexes=0x0) at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/executor/execIndexing.c:415


heap_lock_tuple will not wait transaction completion if 
HEAP_XMAX_COMMITTED is set in tuple and returns TM_Updated:

         /*
          * Time to sleep on the other transaction/multixact, if necessary.
          *
          * If the other transaction is an update/delete that's already
          * committed, then sleeping cannot possibly do any good: if we're
          * required to sleep, get out to raise an error instead.
...


The problem is not so easy to reproduce. I got it when performe commits 
of transaction with large number of subtransactions with several 
synchronous replicas.
But it is possible to artificially introduce delay between 
RecordTransactionCommit() and ProcArrayEndTransaction():

diff --git a/src/backend/access/transam/xact.c 
b/src/backend/access/transam/xact.c
index ca6f6d57d3..d095fd13b0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2187,6 +2187,7 @@ CommitTransaction(void)
                  * durably commit.
                  */
                 latestXid = RecordTransactionCommit();
+               pg_usleep(random() % 100000);
         }
         else
         {




Then the problem can be reproduced with the following pgbebch script:

update.sql:
\set aid random(1, 1000)
select do_update(:aid,100)


update function:
create or replace function do_update(id integer, level integer) returns 
void as $$
begin
     begin
         if level > 0 then
             perform do_update(id, level-1);
         else
             update pgbench_accounts SET abalance = abalance + 1 WHERE 
aid = id;
         end if;
     exception WHEN OTHERS THEN
         raise notice '% %', SQLERRM, SQLSTATE;
     end;
end; $$ language plpgsql;


pgbench -i postgres
pgbench  -f update.sql -c 100 -T 100 -P 1 postgres

Re: Race condition in TransactionIdIsInProgress

От

Andrey Borodin

Дата:

10 февраля 2022 г., 20:33:56

> On 10 Feb 2022, at 21:46, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table"
> error is reported.

Wow, cool, that seem to be a solution to one more mysterious corruption thread - “reporting TID/table with corruption
error"[0]. Thank you! 

Now it’s obvious that this is not a real data corruption. Maybe let’s remove corruption error code from the error? I
hadbeen literally woken at night by this code few times in January. 

And do you have a plan how to fix the actual issue?

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/202108191637.oqyzrdtnheir%40alvherre.pgsql

Re: Race condition in TransactionIdIsInProgress

От

Alvaro Herrera

Дата:

10 февраля 2022 г., 22:33:15

On 2022-Feb-10, Andrey Borodin wrote:

> > On 10 Feb 2022, at 21:46, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> > As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table"
> > error is reported.
> 
> Wow, cool, that seem to be a solution to one more mysterious
> corruption thread - “reporting TID/table with corruption error" [0].
> Thank you!

Ooh, great find.  I'll have a look at this soon.  Thanks for CCing me.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)

Re: Race condition in TransactionIdIsInProgress

От

Robert Haas

Дата:

10 февраля 2022 г., 23:21:27

On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> Postgres first records state transaction in CLOG, then removes
> transaction from procarray and finally releases locks.
> But it can happen that transaction is marked as committed in CLOG,
> XMAX_COMMITTED bit is set in modified tuple but
> TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
> uncommitted in tuple (%u,%u) to be updated in table"
> error is reported.

This is exactly why the HeapTupleSatisfiesFOO() functions all test
TransactionIdIsInProgress() first and TransactionIdDidCommit() only if
it returns false. I don't think it's really legal to test
TransactionIdDidCommit() without checking TransactionIdIsInProgress()
first. And I bet that's exactly what this code is doing...

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Race condition in TransactionIdIsInProgress

От

Andres Freund

Дата:

11 февраля 2022 г., 08:56:09

Hi,

On 2022-02-10 15:21:27 -0500, Robert Haas wrote:
> On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> > Postgres first records state transaction in CLOG, then removes
> > transaction from procarray and finally releases locks.
> > But it can happen that transaction is marked as committed in CLOG,
> > XMAX_COMMITTED bit is set in modified tuple but
> > TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
> > uncommitted in tuple (%u,%u) to be updated in table"
> > error is reported.
>
> This is exactly why the HeapTupleSatisfiesFOO() functions all test
> TransactionIdIsInProgress() first and TransactionIdDidCommit() only if
> it returns false. I don't think it's really legal to test
> TransactionIdDidCommit() without checking TransactionIdIsInProgress()
> first. And I bet that's exactly what this code is doing...

The problem is that the TransactionIdDidAbort() call is coming from within
TransactionIdIsInProgress().

If I understand correctly, the problem is that the xid cache stuff doesn't
correctly work for subtransactions and breaks TransactionIdIsInProgress().

For the problem to occur you need an xid that is for a subtransaction that
suboverflowed and that is 'clog committed' but not 'procarray committed'.

If that xid is tested twice with TransactionIdIsInProgress(), you will get a
bogus result on the second call within the same session.

The first TransactionIdIsInProgress() will go to
    /*
     * Step 4: have to check pg_subtrans.
     *
     * At this point, we know it's either a subtransaction of one of the Xids
     * in xids[], or it's not running.  If it's an already-failed
     * subtransaction, we want to say "not running" even though its parent may
     * still be running.  So first, check pg_xact to see if it's been aborted.
     */
    xc_slow_answer_inc();

    if (TransactionIdDidAbort(xid))
        return false;

and fetch that xid. The TransactionLogFetch() will set cachedFetchXid, because
the xid's status is TRANSACTION_STATUS_COMMITTED. Because the transaction
didn't abort TransactionIdIsInProgress() will go onto the the following loop,
see the toplevel xid is in progress in the procarray and return true.

The second call to TransactionIdIsInProgress() will see that the fetched
transaction matches the cached transactionid and return false.

    /*
     * We may have just checked the status of this transaction, so if it is
     * already known to be completed, we can fall out without any access to
     * shared memory.
     */
    if (TransactionIdIsKnownCompleted(xid))
    {
        xc_by_known_xact_inc();
        return false;
    }

Without any improper uses of TransactionIdDidAbort()/TransactionIdDidCommit()
before TransactionIdIsInProgress(). And other sessions will still return
true. And even *this* session can return true again, if another transaction is
checked that ends up caching another transaction status in cachedFetchXid.

It looks to me like the TransactionIdIsKnownCompleted() path in
TransactionIdIsInProgress() is just entirely broken. An prior
TransactionLogFetch() can't be allowed to make subsequent
TransactionIdIsInProgress() calls return wrong values.

If I understand the problem correctly, a SELECT
pg_xact_status(clog-committed-pgproc-in-progres) can make
TransactionIdIsInProgress() return wrong values too. It's not just
TransactionIdIsInProgress() itself.

Looks like this problem goes back all the way back to

commit 611b4393f22f2bb43135501cd6b7591299b6b453
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2008-03-11 20:20:35 +0000

    Make TransactionIdIsInProgress check transam.c's single-item XID status cache
    before it goes groveling through the ProcArray.  In situations where the same
    recently-committed transaction ID is checked repeatedly by tqual.c, this saves
    a lot of shared-memory searches.  And it's cheap enough that it shouldn't
    hurt noticeably when it doesn't help.
    Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.

I think this may actually mean that the hot corruption problem fixed in

commit 18b87b201f7
Author: Andres Freund <andres@anarazel.de>
Date:   2021-12-10 20:12:26 -0800

    Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning.

for 14/master is present in older branches too :(. Need to trace through the
HTSV and pruning logic with a bit more braincells than currently available to
be sure.

Greetings,

Andres Freund

Re: Race condition in TransactionIdIsInProgress

От

Andres Freund

Дата:

11 февраля 2022 г., 09:11:38

Hi,

On 2022-02-10 21:56:09 -0800, Andres Freund wrote:
> I think this may actually mean that the hot corruption problem fixed in
> 
> commit 18b87b201f7
> Author: Andres Freund <andres@anarazel.de>
> Date:   2021-12-10 20:12:26 -0800
> 
>     Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning.
> 
> for 14/master is present in older branches too :(. Need to trace through the
> HTSV and pruning logic with a bit more braincells than currently available to
> be sure.

On second thought, there's probably sufficiently more direct corruption this
can cause than corruption via hot pruning. Not that that's not a problem, but
... Inconsistent TransactionIdIsInProgress() result can wreak havoc quite
broadly.

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
least it might make it easier to write tests exercising this scenario...

Greetings,

Andres Freund

Re: Race condition in TransactionIdIsInProgress

От

Simon Riggs

Дата:

11 февраля 2022 г., 11:48:43

On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:

> Looks lik syncrep will make this a lot worse, because it can drastically
> increase the window between the TransactionIdCommitTree() and
> ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> least it might make it easier to write tests exercising this scenario...

Agreed

TransactionIdIsKnownCompleted(xid) is only broken because the single
item cache is set too early in some cases. The single item cache is
important for performance, so we just need to be more careful about
setting the cache.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Race condition in TransactionIdIsInProgress

От

Simon Riggs

Дата:

11 февраля 2022 г., 16:48:59

On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:
>
> > Looks lik syncrep will make this a lot worse, because it can drastically
> > increase the window between the TransactionIdCommitTree() and
> > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> > least it might make it easier to write tests exercising this scenario...
>
> Agreed
>
> TransactionIdIsKnownCompleted(xid) is only broken because the single
> item cache is set too early in some cases. The single item cache is
> important for performance, so we just need to be more careful about
> setting the cache.

Something like this... fix_cachedFetchXid.v1.patch prevents the cache
being set, but this fails! Not worked out why, yet.

just_remove_TransactionIdIsKnownCompleted_call.v1.patch
just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Вложения

Re: Race condition in TransactionIdIsInProgress

От

Andres Freund

Дата:

11 февраля 2022 г., 22:08:48

Hi,

On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:
> >
> > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > increase the window between the TransactionIdCommitTree() and
> > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> > > least it might make it easier to write tests exercising this scenario...
> >
> > Agreed
> >
> > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > item cache is set too early in some cases. The single item cache is
> > important for performance, so we just need to be more careful about
> > setting the cache.
> 
> Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> being set, but this fails! Not worked out why, yet.

I don't think it's safe to check !TransactionIdKnownNotInProgress() in
TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
do TransactionLogFetch() internally.

> just_remove_TransactionIdIsKnownCompleted_call.v1.patch

I think this might contain a different diff than what the name suggests?

> just removes the known offending call, passes make check, but IMHO
> leaves the same error just as likely by other callers.

Which other callers are you referring to?

To me it seems that the "real" reason behind this specific issue being
nontrivial to fix and behind the frequent error of calling
TransactionIdDidCommit() before TransactionIdIsInProgress() is
that we've really made a mess out of the transaction status determination code
/ API. IMO the original sin is requiring the complicated

if (TransactionIdIsCurrentTransactionId(xid)
...
else if (TransactionIdIsInProgress(xid)
...
else if (TransactionIdDidCommit(xid)
...

dance at pretty much any place checking transaction status. The multiple calls
for the same xid is, I think, what pushed the caching down to the
TransactionLogFetch level.

ISTM that we should introduce something like TransactionIdStatus(xid) that
returns
XACT_STATUS_CURRENT
XACT_STATUS_IN_PROGRESS
XACT_STATUS_COMMITTED
XACT_STATUS_ABORTED
accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
XidInMVCCSnapshot() instead of TransactionIdIsInProgress().

I think that'd also make visibility checks faster - we spend a good chunk of
cycles doing unnecessary function calls and repeatedly gathering information.

It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
optimized - TransactionIdIsInProgress() knows it doesn't need to check
anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
smaller than GetTopTransactionIdIfAny() before bsearching through subxids.

Of course anything along these lines would never be backpatchable...

Greetings,

Andres Freund

Re: Race condition in TransactionIdIsInProgress

От

Andres Freund

Дата:

12 февраля 2022 г., 03:41:24

Hi,

On 2022-02-10 22:11:38 -0800, Andres Freund wrote:
> Looks lik syncrep will make this a lot worse, because it can drastically
> increase the window between the TransactionIdCommitTree() and
> ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> least it might make it easier to write tests exercising this scenario...

FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
r/w workload that's been modified to start 70 savepoints at the start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3853739 is uncommitted in tuple
(2,159)to be updated in table "pgbench_branches"
 
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3954305 is uncommitted in tuple (2,58)
tobe updated in table "pgbench_branches"
 
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  t_xmin 4017908 is uncommitted in tuple (3,44)
tobe updated in table "pgbench_branches"
 

after a few minutes of running with a local, not slowed down, syncrep. Without
any other artifical slowdowns or such.

Greetings,

Andres Freund

Re: Race condition in TransactionIdIsInProgress

От

Andres Freund

Дата:

12 февраля 2022 г., 06:42:31

Hi,

On 2022-02-11 16:41:24 -0800, Andres Freund wrote:
> FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
> r/w workload that's been modified to start 70 savepoints at the start shows
> 
> pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3853739 is uncommitted in tuple
(2,159)to be updated in table "pgbench_branches"

> pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3954305 is uncommitted in tuple
(2,58)to be updated in table "pgbench_branches"

> pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  t_xmin 4017908 is uncommitted in tuple
(3,44)to be updated in table "pgbench_branches"

> 
> after a few minutes of running with a local, not slowed down, syncrep. Without
> any other artifical slowdowns or such.

And this can easily be triggered even without subtransactions, in a completely
reliable way.

The only reason I'm so far not succeeding in turning it into an
isolationtester spec is that a transaction waiting for SyncRep doesn't count
as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep>
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
cache, which then causes the TransactionIdIsInProgress() to take the early
return path, despite the transaction still being in progress. Which then
allows the update to proceed, despite the S1 not having "properly committed"
yet.

Greetings,

Andres Freund

Re: Race condition in TransactionIdIsInProgress

От

Simon Riggs

Дата:

12 февраля 2022 г., 16:25:58

On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres@anarazel.de> wrote:

> On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:
> > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > >
> > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > > increase the window between the TransactionIdCommitTree() and
> > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> > > > least it might make it easier to write tests exercising this scenario...
> > >
> > > Agreed
> > >
> > > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > > item cache is set too early in some cases. The single item cache is
> > > important for performance, so we just need to be more careful about
> > > setting the cache.
> >
> > Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> > being set, but this fails! Not worked out why, yet.
>
> I don't think it's safe to check !TransactionIdKnownNotInProgress() in
> TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
> do TransactionLogFetch() internally.

That's not correct because you're confusing
TransactionIdKnownNotInProgress(), which is a new function introduced
by the patch, with the existing function
TransactionIdIsKnownCompleted().


> > just_remove_TransactionIdIsKnownCompleted_call.v1.patch
>
> I think this might contain a different diff than what the name suggests?

Not at all, please don't be confused by the name. The patch removes
the call to TransactionIdIsKnownCompleted() from
TransactionIdIsInProgress().

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.


> > just removes the known offending call, passes make check, but IMHO
> > leaves the same error just as likely by other callers.
>
> Which other callers are you referring to?

I don't know of any, but we can't just remove a function in a
backbranch, can we?


> To me it seems that the "real" reason behind this specific issue being
> nontrivial to fix and behind the frequent error of calling
> TransactionIdDidCommit() before TransactionIdIsInProgress() is
> that we've really made a mess out of the transaction status determination code
> / API. IMO the original sin is requiring the complicated
>
> if (TransactionIdIsCurrentTransactionId(xid)
> ...
> else if (TransactionIdIsInProgress(xid)
> ...
> else if (TransactionIdDidCommit(xid)
> ...
>
> dance at pretty much any place checking transaction status.

Agreed, it's pretty weird to have to call the functions in the right
order or you get the wrong answer. Especially since we have no
cross-check to ensure the correct sequence was followed.


> The multiple calls
> for the same xid is, I think, what pushed the caching down to the
> TransactionLogFetch level.
>
> ISTM that we should introduce something like TransactionIdStatus(xid) that
> returns
> XACT_STATUS_CURRENT
> XACT_STATUS_IN_PROGRESS
> XACT_STATUS_COMMITTED
> XACT_STATUS_ABORTED
> accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
> XidInMVCCSnapshot() instead of TransactionIdIsInProgress().
>
> I think that'd also make visibility checks faster - we spend a good chunk of
> cycles doing unnecessary function calls and repeatedly gathering information.
>
>
> It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
> optimized - TransactionIdIsInProgress() knows it doesn't need to check
> anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
> doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
> smaller than GetTopTransactionIdIfAny() before bsearching through subxids.
>
> Of course anything along these lines would never be backpatchable...

Agreed

I've presented a simple patch intended for backpatching. You may not
like the name, but please have another look at
"just_remove_TransactionIdIsKnownCompleted_call.v1.patch".
I believe it is backpatchable with minimal impact and without loss of
performance.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Race condition in TransactionIdIsInProgress

От

Andres Freund

Дата:

13 февраля 2022 г., 00:50:35

Hi,

On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:
> > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > > >
> > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > > > increase the window between the TransactionIdCommitTree() and
> > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> > > > > least it might make it easier to write tests exercising this scenario...
> > > >
> > > > Agreed
> > > >
> > > > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > > > item cache is set too early in some cases. The single item cache is
> > > > important for performance, so we just need to be more careful about
> > > > setting the cache.
> > >
> > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> > > being set, but this fails! Not worked out why, yet.
> >
> > I don't think it's safe to check !TransactionIdKnownNotInProgress() in
> > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
> > do TransactionLogFetch() internally.
> 
> That's not correct because you're confusing
> TransactionIdKnownNotInProgress(), which is a new function introduced
> by the patch, with the existing function
> TransactionIdIsKnownCompleted().

I don't think so. This call of the new TransactionIdKnownNotInProgress()

> --- a/src/backend/access/transam/transam.c
> +++ b/src/backend/access/transam/transam.c
> @@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId)
>          return TRANSACTION_STATUS_ABORTED;
>      }
>  
> +    /*
> +     * Safeguard that we have called TransactionIsIdInProgress() before
> +     * checking commit log manager, to ensure that we do not cache the
> +     * result until the xid is no longer in the procarray at eoxact.
> +     */
> +    if (!TransactionIdKnownNotInProgress(transactionId))
> +        return TRANSACTION_STATUS_IN_PROGRESS;
> +
>      /*
>       * Get the transaction status.
>       */

isn't right. Consider what happens to TransactionIdIsInProgress(x)'s call to
TransactionIdDidAbort(x) at "step 4". TransactionIdDidAbort(x) will call
TransactionLogFetch(x). cachedXidIsNotInProgress isn't yet set to x, so
TransactionLogFetch() returns TRANSACTION_STATUS_IN_PROGRESS. Even if the
(sub)transaction aborted.

Which explains why your first patch doesn't work.


> > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> >
> > I think this might contain a different diff than what the name suggests?
> 
> Not at all, please don't be confused by the name. The patch removes
> the call to TransactionIdIsKnownCompleted() from
> TransactionIdIsInProgress().

I guess for me "just remove" doesn't include adding a new cache...


> I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> in backbranches.

I think it'd be fine if we needed to. Or we could just make it always return
false or such.


> > > just removes the known offending call, passes make check, but IMHO
> > > leaves the same error just as likely by other callers.
> >
> > Which other callers are you referring to?
> 
> I don't know of any, but we can't just remove a function in a
> backbranch, can we?

We have in the past, if it's a "sufficiently internal" function.


Greetings,

Andres Freund

Re: Race condition in TransactionIdIsInProgress

От

Heikki Linnakangas

Дата:

23 июня 2022 г., 22:03:27

On 12/02/2022 05:42, Andres Freund wrote:
> On 2022-02-11 16:41:24 -0800, Andres Freund wrote:
>> FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
>> r/w workload that's been modified to start 70 savepoints at the start shows
>>
>> pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3853739 is uncommitted in tuple
(2,159)to be updated in table "pgbench_branches"

>> pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3954305 is uncommitted in tuple
(2,58)to be updated in table "pgbench_branches"

>> pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  t_xmin 4017908 is uncommitted in tuple
(3,44)to be updated in table "pgbench_branches"

>>
>> after a few minutes of running with a local, not slowed down, syncrep. Without
>> any other artifical slowdowns or such.
> 
> And this can easily be triggered even without subtransactions, in a completely
> reliable way.
> 
> The only reason I'm so far not succeeding in turning it into an
> isolationtester spec is that a transaction waiting for SyncRep doesn't count
> as waiting for isolationtester.
> 
> Basically
> 
> S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep>
> S2: SELECT pg_xact_status($xid);
> S2: UPDATE;
> 
> suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
> cache, which then causes the TransactionIdIsInProgress() to take the early
> return path, despite the transaction still being in progress. Which then
> allows the update to proceed, despite the S1 not having "properly committed"
> yet.

I started to improve isolationtester, to allow the spec file to specify 
a custom query to check for whether the backend is blocked. But then I 
realized that it's not quite enough: there is currently no way write the 
   "$xid = txid_current()" step either. Isolationtester doesn't have 
variables like that. Also, you need to set synchronous_standby_names to 
make it block, which seems a bit weird in an isolation test.

I wish we had an explicit way to inject delays or failures. We discussed 
it before 
(https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com), 
but I don't want to pick up that task right now.

Anyway, I wrote a TAP test for this instead. See attached. I'm not sure 
if this is worth committing, the pg_xact_status() trick is quite special 
for this particular bug.

Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks 
good to me. Replying to the discussion on that:

On 12/02/2022 23:50, Andres Freund wrote:
> On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:
>> I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
>> in backbranches.
> 
> I think it'd be fine if we needed to. Or we could just make it always return
> false or such.
> 
> 
>>>> just removes the known offending call, passes make check, but IMHO
>>>> leaves the same error just as likely by other callers.
>>>
>>> Which other callers are you referring to?
>>
>> I don't know of any, but we can't just remove a function in a
>> backbranch, can we?
> 
> We have in the past, if it's a "sufficiently internal" function.

I think we should remove it in v15, and leave it as it is in 
backbranches. Just add a comment to point out that the name is a bit 
misleading, because it checks the clog rather than the proc array.  It's 
not inherently dangerous, and someone might have a legit use case for 
it. True, someone might also be doing this incorrect thing with it, but 
still seems like the right balance to me.

I think we also need to change pg_xact_status(), to also call 
TransactionIdIsInProgress() before TransactionIdDidCommit/Abort(). 
Currently, if it returns "committed", and you start a new transaction, 
the new transaction might not yet see the effects of that "committed" 
transaction. With that, you cannot reproduce the original bug with 
pg_xact_status() though, so there's no point in committing that test then.

In summary, I think we should:
- commit and backpatch Simon's 
just_remove_TransactionIdIsKnownCompleted_call.v1.patch
- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone 
beats me to it.

- Heikki

Вложения

add-test-for-xid-cache-bug-1.patch

Re: Race condition in TransactionIdIsInProgress

От

Andres Freund

Дата:

24 июня 2022 г., 04:43:07

Hi,

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:
> On 12/02/2022 05:42, Andres Freund wrote:
> > The only reason I'm so far not succeeding in turning it into an
> > isolationtester spec is that a transaction waiting for SyncRep doesn't count
> > as waiting for isolationtester.
> >
> > Basically
> >
> > S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep>
> > S2: SELECT pg_xact_status($xid);
> > S2: UPDATE;
> >
> > suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
> > cache, which then causes the TransactionIdIsInProgress() to take the early
> > return path, despite the transaction still being in progress. Which then
> > allows the update to proceed, despite the S1 not having "properly committed"
> > yet.
>
> I started to improve isolationtester, to allow the spec file to specify a
> custom query to check for whether the backend is blocked. But then I
> realized that it's not quite enough: there is currently no way write the
> "$xid = txid_current()" step either. Isolationtester doesn't have variables
> like that. Also, you need to set synchronous_standby_names to make it block,
> which seems a bit weird in an isolation test.

I don't think we strictly need $xid - it likely can be implemented via
something like
  SELECT pg_xact_status(backend_xid) FROM pg_stat_activity WHERE application_name = 'testname/s1';


> I wish we had an explicit way to inject delays or failures. We discussed it
> before
(https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com),
> but I don't want to pick up that task right now.

Understandable :(


> Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if
> this is worth committing, the pg_xact_status() trick is quite special for
> this particular bug.

Yea, not sure either.


> Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good
> to me. Replying to the discussion on that:
>
> On 12/02/2022 23:50, Andres Freund wrote:
> > On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:
> > > I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> > > in backbranches.
> >
> > I think it'd be fine if we needed to. Or we could just make it always return
> > false or such.
> >
> >
> > > > > just removes the known offending call, passes make check, but IMHO
> > > > > leaves the same error just as likely by other callers.
> > > >
> > > > Which other callers are you referring to?
> > >
> > > I don't know of any, but we can't just remove a function in a
> > > backbranch, can we?
> >
> > We have in the past, if it's a "sufficiently internal" function.
>
> I think we should remove it in v15, and leave it as it is in backbranches.
> Just add a comment to point out that the name is a bit misleading, because
> it checks the clog rather than the proc array.  It's not inherently
> dangerous, and someone might have a legit use case for it. True, someone
> might also be doing this incorrect thing with it, but still seems like the
> right balance to me.
>
> I think we also need to change pg_xact_status(), to also call
> TransactionIdIsInProgress() before TransactionIdDidCommit/Abort().
> Currently, if it returns "committed", and you start a new transaction, the
> new transaction might not yet see the effects of that "committed"
> transaction. With that, you cannot reproduce the original bug with
> pg_xact_status() though, so there's no point in committing that test then.
>
> In summary, I think we should:
> - commit and backpatch Simon's
> just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> - fix pg_xact_status() to check TransactionIdIsInProgress()
> - in v15, remove TransationIdIsKnownCompleted function altogether
>
> I'll try to get around to that in the next few days, unless someone beats me
> to it.

Makes sense.

Greetings,

Andres Freund

Re: Race condition in TransactionIdIsInProgress

От

Heikki Linnakangas

Дата:

25 июня 2022 г., 12:18:51

On 24/06/2022 04:43, Andres Freund wrote:
> On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:
>> In summary, I think we should:
>> - commit and backpatch Simon's
>> just_remove_TransactionIdIsKnownCompleted_call.v1.patch
>> - fix pg_xact_status() to check TransactionIdIsInProgress()
>> - in v15, remove TransationIdIsKnownCompleted function altogether
>>
>> I'll try to get around to that in the next few days, unless someone beats me
>> to it.
> 
> Makes sense.

This is what I came up with for master. One difference from Simon's 
original patch is that I decided to not expose the 
TransactionIdIsKnownNotInProgress(), as there are no other callers of it 
in core, and it doesn't seem useful to extensions. I inlined it into the 
caller instead.

BTW, should we worry about XID wraparound with the cache? Could you have 
a backend sit idle for 2^32 transactions, without making any 
TransactionIdIsKnownNotInProgress() calls? That's not new with this 
patch, though, it could happen with the single-item cache in transam.c, too.

- Heikki

Вложения

0001-Fix-visibility-check-when-XID-is-committed-in-CLOG-b.patch

Re: Race condition in TransactionIdIsInProgress

От

Simon Riggs

Дата:

25 июня 2022 г., 13:10:22

On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 24/06/2022 04:43, Andres Freund wrote:
> > On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:
> >> In summary, I think we should:
> >> - commit and backpatch Simon's
> >> just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> >> - fix pg_xact_status() to check TransactionIdIsInProgress()
> >> - in v15, remove TransationIdIsKnownCompleted function altogether
> >>
> >> I'll try to get around to that in the next few days, unless someone beats me
> >> to it.
> >
> > Makes sense.
>
> This is what I came up with for master. One difference from Simon's
> original patch is that I decided to not expose the
> TransactionIdIsKnownNotInProgress(), as there are no other callers of it
> in core, and it doesn't seem useful to extensions. I inlined it into the
> caller instead.

Looks good, thanks.

> BTW, should we worry about XID wraparound with the cache? Could you have
> a backend sit idle for 2^32 transactions, without making any
> TransactionIdIsKnownNotInProgress() calls? That's not new with this
> patch, though, it could happen with the single-item cache in transam.c, too.

Yes, as a separate patch only in PG15+, imho.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Race condition in TransactionIdIsInProgress

От

Heikki Linnakangas

Дата:

27 июня 2022 г., 08:26:22

On 25/06/2022 13:10, Simon Riggs wrote:
> On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> On 24/06/2022 04:43, Andres Freund wrote:
>>> On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:
>>>> In summary, I think we should:
>>>> - commit and backpatch Simon's
>>>> just_remove_TransactionIdIsKnownCompleted_call.v1.patch
>>>> - fix pg_xact_status() to check TransactionIdIsInProgress()
>>>> - in v15, remove TransationIdIsKnownCompleted function altogether
>>>>
>>>> I'll try to get around to that in the next few days, unless someone beats me
>>>> to it.
>>>
>>> Makes sense.
>>
>> This is what I came up with for master. One difference from Simon's
>> original patch is that I decided to not expose the
>> TransactionIdIsKnownNotInProgress(), as there are no other callers of it
>> in core, and it doesn't seem useful to extensions. I inlined it into the
>> caller instead.
> 
> Looks good, thanks.

Committed and backpatched. Thanks!

- Heikki

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Race condition in TransactionIdIsInProgress

Вложения

Вложения

Вложения