Обсуждение: Nested transactions RFC
Hi, if it is acceptable for subtransactions to use up transaction numbers, then here is a half baked RFC for a possible implementation. If not, forget the rest of this message. The proposed implementation works much like the current transaction handling. It needs an additional system table pg_subtrans (child XactId PRIMARY KEY, parent XactId). BEGIN; -- starts a new (top level) transaction, say 100 INSERT row1; -- row1.xmin = 100 DELETE row2; -- row2.xmax = 100 BEGIN; -- starts a subtransaction, let's call it 200, -- stores 100 on the parent transaction stack -- (a localmemory structure), -- inserts (200, 100) into pg_subtrans INSERT row3; -- row3.xmin = 200, row3.XMIN_IS_SUB = true DELETE row4; -- row4.xmax = 200, row4.XMAX_IS_SUB = true COMMIT; -- resets CurrentTransaction to 100 (pop from xact stack), -- does *NOT* mark T200 as committed BEGIN; -- starts a subtransaction, let's call it 300, -- pushes 100 on the parent transaction stack, -- inserts(300, 100) into pg_subtrans BEGIN; -- starts a 3rd level subtransaction (400), -- pushes 300 on the parent transaction stack, -- inserts(400, 300) into pg_subtrans ... COMMIT; -- resets CurrentTransaction to 300 (transaction stack), -- does NOT mark T400 as committed INSERT row5; -- row5.xmin = 300, row5.XMIN_IS_SUB = true DELETE row6; -- row6.xmax = 300, row6.XMAX_IS_SUB = true ROLLBACK; -- resets CurrentTransaction to 100 (transaction stack), -- optionally removes (300, 100) from pg_subtrans, -- marks T300 as aborted COMMIT; -- marks T100 as committed or ROLLBACK; -- marks T100 as aborted Visibility: ----------- The checks for xmin and xmax are very similar. We look at xmin here: Traditionally a tuple is visible, if xmin has committed before the current snapshot was taken, or if xmin == CurrentTransaction(). A subtransaction is considered aborted, if it is marked aborted. Else it is considered to be in the same state as its parent transaction (which again can be a subtransaction). The effects of tup.xmin are considered visible, if ... (This is not a formal specification. It shall only illustrate the difference to the existing implementation of HeapTupleSatisfiesXxx() in tqual.c) if (tup.XMIN_ABORTED) // flag set by prior visitor return false; if (tup.XMIN_COMMITTED) // flag set by prior visitor return true; // xmin neither known aborted nor known committed, // could be active // or finished and tup not yet visited for(xmin = tup.xmin; IsValid(xmin); xmin = GetParentXact(xmin)) { if (TransactionIdDidAbort(xmin)) { tup.XMIN_ABORTED= true; return false; }/*if*/ if (IsCurrentTransaction(xmin)) { // tup.xmin is one of my own subtransactions, // it is alreadycommitted. So tup can be // considered belonging to the current transaction. tup.xmin = xmin; tup.XMIN_IS_SUB = CurrentTransactionIsSub(); return true; // or rather check cmin ... }/*if*/ if (TransactionIdDidCommit(xmin)) { // xmin is a top level transaction tup.xmin =xmin; tup.XMIN_IS_SUB = false; tup.XMIN_COMMITTED = true; return true; }/*if*/ if (!tup.XMIN_IS_SUB) { // Don't try expensive GetParentXact() break; }/*if*/ }/*for*/ // tup.xmin still active return false; TransactionId GetParentXact(TransactionId xnum) uses pg_subtrans to find the parent transaction of xnum. It returns InvalidTransaction, if it doesn't find one. Performance: ------------ . Zero overhead, if nested transactions are not used. . BEGIN SUB has to insert a pair of TransactionIds into pg_subtrans. Apart from that it is not slower than BEGIN top level transaction. . COMMIT SUB is faster than COMMIT. . ROLLBACK SUB is much like ROLLBACK, plus (optionally) deletes one entry from pg_subtrans. . COMMIT and ROLLBACK of top level transactions don't care about subtransactions. . Access a tuple inserted/deleted by a subtransaction: Zero overhead, if the subtransaction has been rolled back, otherwise the parent transaction has to be looked up in pg_subtrans (possibly recursive). This price has to be paid only once per tuple (well, once for xmin and once for xmax). More accurate: "once after the inserting/deleting top level transaction has finished". Problems: --------- . pg_subtrans grows by 8 bytes per subtransaction. . Other pitfalls??? Administration: --------------- As soon as a top level transaction has finished, its subtransaction ids are replaced by the top level transaction id on the next access to each tuple. VACUUM (*not* VACUUM tablename) removes old entries from pg_subtrans. An entry is old, if the parent transaction has finished, before VACUUM started. Challenge: ---------- For heavy use of subtransactions there has to be a really fast implementation of pg_subtrans, maybe something like a b-tree. AFAICS small WAL changes: pg_subtrans inserts (and deletes?) have to be logged. Everything else is straightforward. Comments? ServusManfred
Manfred Koizar <mkoi-pg@aon.at> writes: > TransactionId GetParentXact(TransactionId xnum) uses pg_subtrans to > find the parent transaction of xnum. This is not only extremely expensive, but in practice would cause infinite recursion: any attempt to validate the commit state of a row in pg_subtrans would result in a recursive attempt to search pg_subtrans. I don't think we can do table access from inside the tqual.c routines. A practical implementation, which would cost little except tuple header space (and yes I know that won't make you happy) would require 3 fields instead of 2 for both the min and the max:transaction IDsubtransaction IDcommand ID First check the transaction ID: if aborted or (in-progress and not mine), tuple is not visible. Next, if the subtransaction ID is not zero, similarly check it. Finally, if xid and sub-xid are both mine, the command ID has to be checked. In this scenario, subtransactions commit or abort by marking their pg_clog entries, but no one else will care until the parent transaction commits. So there is no extra state anywhere except for the stack of active transaction numbers inside each backend. Possibly we could use techniques similar to what you already suggested for cmin/cmax to reduce the amount of physical storage needed for the six logical fields involved. regards, tom lane PS: unfortunately, tuple validity checking is only a small part of what has to be done to support subtransactions. The really nasty part is in fixing error recovery inside the backend so that (most) errors can be dealt with by aborting only the innermost subtransaction.
Tom, reading my message again and your response, I see, that some points were a bit unclear. On Fri, 10 May 2002 13:12:21 +0200, I wrote: |if it is acceptable for subtransactions to use up transaction numbers, Of course, "use up" is nonsense, as it sounds like "use all available"; this should have been "use" or "draw from the pool of". Should have listened better to my English teacher :-) On Sat, 11 May 2002 11:51:37 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Manfred Koizar <mkoi-pg@aon.at> writes: >> TransactionId GetParentXact(TransactionId xnum) uses pg_subtrans to >> find the parent transaction of xnum. > >This is not only extremely expensive, but in practice would cause >infinite recursion: any attempt to validate the commit state of a >row in pg_subtrans would result in a recursive attempt to search >pg_subtrans. I don't think we can do table access from inside the >tqual.c routines. I wrote: |It needs an additional system table |pg_subtrans (child XactId PRIMARY KEY, parent XactId). But no! "Table" is not the correct word for what I mean. I rather want something living outside transactions and not accessed via normal SQL statements. It is to be handled by highly specialized and optimized routines, because fast access is crucial for the whole proposal. That's why I called "a really fast implementation of pg_subtrans" a challenge. I had pg_clog in mind, but didn't find the right words. >A practical implementation, which would cost little except tuple header >space (and yes I know that won't make you happy) would require 3 fields :-) >instead of 2 for both the min and the max: > transaction ID > subtransaction ID > command ID This was my first attempt. I've dismissed it for several reasons. >First check the transaction ID: if aborted or (in-progress and not >mine), tuple is not visible. I agree up to here. >Next, if the subtransaction ID is not >zero, similarly check it. Now imagine BEGIN 1; BEGIN 2; BEGIN 3; INSERT tup3; COMMIT 3; ROLLBACK 2; COMMIT 1; Then in tup3 we would have xid==1 and subxid==3, both of which are committed, but nevertheless tup3 is invisible, because xact 2 aborted. >Finally, if xid and sub-xid are both mine, >the command ID has to be checked. > >In this scenario, subtransactions commit or abort by marking their >pg_clog entries, but no one else will care until the parent transaction >commits. So there is no extra state anywhere except for the stack >of active transaction numbers inside each backend. A *stack* of _active_ transaction numbers is not sufficient, we need the whole *tree* of _all_ transactions belonging to the current top level transaction. This is, want I wanted to model in my pg_subtrans "table". And pg_subtrans cannot be a private structure, because it has to be inspected by other transactions too (cf. example above). >PS: unfortunately, tuple validity checking is only a small part of what >has to be done to support subtransactions. The really nasty part is >in fixing error recovery inside the backend so that (most) errors can >be dealt with by aborting only the innermost subtransaction. Is this really related to subtransactions? The current behaviour is, that an error not only aborts the offending command, but the whole (top level) transaction. My proposal doesn't change anything regarding this. Though I agree it would be desirable to have finer grained error handling. You have quoted only small parts of my posting. Do you agree to the rest? Or didn't you bother to comment, because you considered the whole proposal refuted by your counter-arguments? I'll be fine either way, I just want to know. BTW, there's something missing from my visibility checks: | if (IsCurrentTransaction(xmin)) { here we have to add "or xmin is one of my (grand)*parents". And of course, it would be nice to have named savepoints: BEGIN; BEGIN foo; BEGIN bar; ... ROLLBACK foo; COMMIT; -- top level transaction ServusManfred
Manfred Koizar <mkoi-pg@aon.at> writes: > A *stack* of _active_ transaction numbers is not sufficient, we need > the whole *tree* of _all_ transactions belonging to the current top > level transaction. This is, want I wanted to model in my pg_subtrans > "table". And pg_subtrans cannot be a private structure, because it > has to be inspected by other transactions too (cf. example above). Hmm. This seems to me to be vastly overdesigning the feature. I've never yet seen a practical application for more than one level of subtransaction, so I question whether we should buy into a substantially more complex implementation to support the more general case. > Is this really related to subtransactions? The current behaviour is, > that an error not only aborts the offending command, but the whole > (top level) transaction. My proposal doesn't change anything > regarding this. Every single application that I've seen for subtransactions is all about error recovery. If we don't fix that then there's no point. > You have quoted only small parts of my posting. I don't believe in quoting whole postings, only enough to remind people what it was I'm responding to. regards, tom lane
On Sun, 12 May 2002, Tom Lane wrote: > Manfred Koizar <mkoi-pg@aon.at> writes: > > A *stack* of _active_ transaction numbers is not sufficient, we need > > the whole *tree* of _all_ transactions belonging to the current top > > level transaction. This is, want I wanted to model in my pg_subtrans > > "table". And pg_subtrans cannot be a private structure, because it > > has to be inspected by other transactions too (cf. example above). > > Hmm. This seems to me to be vastly overdesigning the feature. I've > never yet seen a practical application for more than one level of > subtransaction, so I question whether we should buy into a substantially > more complex implementation to support the more general case. I think it'd depend on how pervasive the feature is going to be. If we allow functions/rules/etc to start subtransactions I'm not sure it'd be safe to say that only one level is safe since you might not know that your subtransaction calls something that wants to start a subtransaction, but you'd probably expect that anything it does would be undone when you rollback your subtransaction, just like it would if the items weren't in a subtransaction.
Tom Lane wrote: > > Manfred Koizar <mkoi-pg@aon.at> writes: > > A *stack* of _active_ transaction numbers is not sufficient, we need > > the whole *tree* of _all_ transactions belonging to the current top > > level transaction. This is, want I wanted to model in my pg_subtrans > > "table". And pg_subtrans cannot be a private structure, because it > > has to be inspected by other transactions too (cf. example above). > > Hmm. This seems to me to be vastly overdesigning the feature. I've > never yet seen a practical application for more than one level of > subtransaction, so I question whether we should buy into a substantially > more complex implementation to support the more general case. I'm for Manfred with this point. I would never suppose that nested transactions supports only 1 level. > > > Is this really related to subtransactions? The current behaviour is, > > that an error not only aborts the offending command, but the whole > > (top level) transaction. My proposal doesn't change anything > > regarding this. > > Every single application that I've seen for subtransactions is all about > error recovery. If we don't fix that then there's no point. The problem exists with any implementation. Though tuple validity checking may be only a small part (I don't think so), I've never seen such proposal other than Manfred's one. If I remember correctly, savepoints functionality was planned for 7.0 but probably we wouldn't have it in 7.3. The TODO may be a TODO for ever unless the direction to solve the TODO is decided. 1) without UNDO for individual tuples. 2) with UNDO for individual tuples under no overwriting smgr. 3) UNDO under overwriting smgr. regards, Hiroshi Inouehttp://w2422.nsk.ne.jp/~inoue/