Re: [HACKERS] logical decoding of two-phase transactions
От | Tomas Vondra |
---|---|
Тема | Re: [HACKERS] logical decoding of two-phase transactions |
Дата | |
Msg-id | dd5a9cb7-bb10-cb5c-834e-08dc3ffc057f@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] logical decoding of two-phase transactions (Nikhil Sontakke <nikhils@2ndquadrant.com>) |
Ответы |
Re: [HACKERS] logical decoding of two-phase transactions
(Andres Freund <andres@anarazel.de>)
Re: [HACKERS] logical decoding of two-phase transactions (Nikhil Sontakke <nikhils@2ndquadrant.com>) |
Список | pgsql-hackers |
Hi, I've been reviewing the last patch version, focusing mostly on the decoding group part. Let me respond to several points first, then new review bits. On 03/28/2018 05:28 PM, Nikhil Sontakke wrote: > Hi Tomas, > >> Now, about the interlock implementation - I see you've reused the "lock >> group" concept from parallel query. That may make sense, unfortunately >> there's about no documentation explaining how it works, what is the >> "protocol" etc. There is fairly extensive documentation for "lock >> groups" in src/backend/storage/lmgr/README, but while the "decoding >> group" code is inspired by it, the code is actually very different. >> Compare for example BecomeLockGroupLeader and BecomeDecodeGroupLeader, >> and you'll see what I mean. >> >> So I think the first thing we need to do is add proper documentation >> (possibly into the same README), explaining how the decode groups work, >> how the decodeAbortPending works, etc. >> > > I have added details about this in src/backend/storage/lmgr/README as > suggested by you. > Thanks. I think the README is a good start, but I think we also need to improve the comments, which is usually more detailed than the README. For example, it's not quite acceptable that LogicalLockTransaction and LogicalUnlockTransaction have about no comments, especially when it's meant to be public API for decoding plugins. >> >> BTW, do we need to do any of this with (wal_level < logical)? I don't >> see any quick bail-out in any of the functions in this case, but it >> seems like a fairly obvious optimization. >> > > The calls to the LogicalLockTransaction/LogicalUnLockTransaction APIs > will be from inside plugins or the reorderbuffer code paths. Those > will get invoked only in the wal_level logical case, hence I did not > add further checks. > Oh, right. >> Similarly, can't the logical workers indicate that they need to decode >> 2PC transactions (or in-progress transactions in general) in some way? >> If we knew there are no such workers, that would also allow ignoring the >> interlock, no? >> > > These APIs check if the transaction is already committed and cache > that information for further calls, so for regular transactions this > becomes a no-op > I see. So when the output plugin never calls LogicalLockTransaction on an in-progress transaction (e.g. 2PC after PREPARE), it never actually initializes the decoding group. Works for me. >> >> 2) regression tests >> ------------------- >> >> I really dislike the use of \set to run the same query repeatedly. It >> makes analysis of regression failures even more tedious than it already >> is. I'd just copy the query to all the places. >> > > They are long-winded queries and IMO made the test file look too > cluttered and verbose.. > Well, I don't think that's a major problem, and it certainly makes it more difficult to investigate regression failures. >> >> 3) worker.c >> ----------- >> >> The comment in apply_handle_rollback_prepared_txn says this: >> >> /* >> * During logical decoding, on the apply side, it's possible that a >> * prepared transaction got aborted while decoding. In that case, we >> * stop the decoding and abort the transaction immediately. However >> * the ROLLBACK prepared processing still reaches the subscriber. In >> * that case it's ok to have a missing gid >> */ >> if (LookupGXact(commit_data->gid)) { ... } >> >> But is it safe to assume it never happens due to an error? In other >> words, is there a way to decide that the GID really aborted? Or, why >> should the provider sent the rollback at all - surely it could know if >> the transaction/GID was sent to subscriber or not, right? >> > > Since we decode in commit WAL order, when we reach the ROLLBACK > PREPARED wal record, we cannot be sure that we did infact abort the > decoding mid ways because of this concurrent rollback. It's possible > that this rollback comes much much later as well when all decoding > backends have successfully prepared it on the subscribers already. > Ah, OK. So when the transaction gets aborted (by ROLLBACK PREPARED) concurrently with the decoding, we abort the apply transaction and discard the ReorderBufferTXN. Which means that later, when we decode the abort, we don't know whether the decoding reached abort or prepare, and so we have to send the ROLLBACK PREPARED to the subscriber too. For a moment I was thinking we might simply remember TXN outcome in reorder buffer, but obviously that does not work - the decoding might restart in between, and as you say the distance (in terms of WAL) may be quite significant. >> >> 7) proto.c / worker.c >> --------------------- >> >> Until now, the 'action' (essentially the first byte of each message) >> clearly identified what the message does. So 'C' -> commit, 'I' -> >> insert, 'D' -> delete etc. This also means the "handle" methods were >> inherently simple, because each handled exactly one particular action >> and nothing else. >> >> You've expanded the protocol in a way that suddenly 'C' means either >> COMMIT or ROLLBACK, and 'P' means PREPARE, ROLLBACK PREPARED or COMMIT >> PREPARED. I don't think that's how the protocol should be extended - if >> anything, it's damn confusing and unlike the existing code. You should >> define new action, and keep the handlers in worker.c simple. >> > > I thought this grouped regular commit and 2PC transactions properly. > Can look at this again if this style is not favored. > Hmmm, it's not how I'd do it, but perhaps someone who originally designed the protocol should review this bit. Now, the new bits ... attached is a .diff with a couple of changes and comments on various places. 1) LogicalLockTransaction - This function is part of a public API, yet it has no comment. That needs fixing - it has to be clear how to use it. The .diff suggests a comment, but it may need improvements. - As I mentioned in the previous review, BecomeDecodeGroupLeader is a misleading name. It suggest the called becomes a leader, while in fact it looks up the PROC running the XID and makes it a leader. This is obviously due to copying the code from lock groups, where the caller actually becomes the leader. It's incorrect here. I suggest something like LookupDecodeGroupLeader() or something. - In the "if (MyProc->decodeGroupLeader == NULL)" block there are two blocks rechecking the transaction status: if (proc == NULL) { ... recheck ... } if (!BecomeDecodeGroupMember(proc, proc->pid, rbtxn_prepared(txn))) { ... recheck ...} I suggest to join them into a single block. - This Assert() is either bogus and there can indeed be cases with (MyProc->decodeGroupLeader==NULL), or the "if" is unnecessary: Assert(MyProc->decodeGroupLeader); if (MyProc->decodeGroupLeader) { ... } - I'm wondering why we're maintaining decodeAbortPending flags both for the leader and all the members. ISTM it'd be perfectly fine to only check the leader, particularly because RemoveDecodeGroupMemberLocked removes the members from the decoding group. So that seems unnecessary, and we can remove the if (MyProc->decodeAbortPending) { ... } - LogicalUnlockTransaction needs a comment(s) too. 2) BecomeDecodeGroupLeader - Wrong name (already mentioned above). - It can bail out when (!proc), which will simplify the code a bit. - Why does it check PID of the process at all? Seems unnecessary, considering we're already checking the XID. - Can a proc executing a XID have a different leader? I don't think so, so I'd make that an Assert(). Assert(!proc || (proc->decodeGroupLeader == proc)); And it'll allow simplification of some of the conditions. - We're only dealing with prepared transactions now, so I'd just drop the is_prepared flag - it'll make the code a bit simpler, we can add it later in patch adding decoding of regular in-progress transactions. We can't test the (!is_prepared) anyway. - Why are we making the leader also a member of the group? Seems rather unnecessary, and it complicates the abort handling, because we need to skip the leader when deciding to wait. 3) LogicalDecodeRemoveTransaction - It's not clear to me what happens when a decoding backend gets killed between LogicalLockTransaction/LogicalUnlockTransaction. Doesn't that mean LogicalDecodeRemoveTransaction will get stuck, because the proc is still in the decoding group? - The loop now tweaks decodeAbortPending of the members, but I don't think that's necessary either - the LogicalUnlockTransaction can check the leader flag just as easily. 4) a bunch of comment / docs improvements, ... I'm suggesting rewording a couple of comments. I've also added a couple of missing comments - e.g. to LogicalLockTransaction and the lock group methods in general. Also, a couple more questions and suggestions in XXX comments. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: Andres FreundДата:
Сообщение: Re: [HACKERS] logical decoding of two-phase transactions