Re: [HACKERS] logical decoding of two-phase transactions

From: Craig Ringer
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date:
Msg-id: CAMsr+YGX1z5t_MccrJxd4VJPgZKx17EsAQcAQ_1iJ4dKDp0LOg@mail.gmail.com
In reply to: Re: [HACKERS] logical decoding of two-phase transactions  (Nikhil Sontakke <nikhils@2ndquadrant.com>)
Responses: Re: [HACKERS] logical decoding of two-phase transactions  (Nikhil Sontakke <nikhils@2ndquadrant.com>)
List: pgsql-hackers
On 23 November 2017 at 20:27, Nikhil Sontakke wrote:

> Is there any other locking scenario that we need to consider?
> Otherwise, are we all ok on this point being a non-issue for 2PC
> logical decoding?

Yeah. I didn't find any sort of sensible situation where locking would
pose issues. Unless you're taking explicit LOCKs on catalog tables, you
should be fine.

There may be issues with CLUSTER or VACUUM FULL of non-relmapped
catalog relations, I guess. Personally I put that in the "don't do
that" box, but if we really have to guard against it we could slightly
expand the limits on which txns you can PREPARE to exclude any txn that
holds a strong lock on a catalog relation.

> The issue is with a concurrent rollback of the prepared transaction.
> We need a way to ensure that the 2PC does not abort when we are in
> the midst of a change record apply activity.

The *reason* we care about this is that tuples created by aborted txns
are not considered "recently dead" by vacuum. They can be marked
invalid and removed immediately due to hint-bit setting, HOT pruning,
vacuum runs, etc.

This could create an inconsistent view of the catalogs if our prepared
txn did any DDL. For example, we might've updated a pg_class row, so we
created a new row and set xmax on the old one. Vacuum may merrily
remove our new row, so there's no way we can find the correct data any
more; we'd find either the outdated row or no row at all. By my reading
of HeapTupleSatisfiesMVCC we'll see the old pg_class tuple.

Similar issues apply for pg_attribute etc. We might try to decode a
record according to the wrong table structure because relcache lookups
performed by the plugin will report outdated information.

The sanest option here seems to be to stop the txn from aborting while
we're decoding it, hence Nikhil's suggestions:

> * We introduce two new booleans in the TwoPhaseState
> GlobalTransactionData structure.
> bool beingdecoded;
> bool abortpending;

I think it's premature to rule out the simple option of doing a
LockGXact when we start decoding. Improve the error "prepared
transaction with identifier \"%s\" is busy" to report the locking pid
too. It means you cannot roll back or commit a prepared xact while it's
being decoded, but for the intended use of this patch I think that's
absolutely fine anyway.

But I like your suggestion much more than the idea of taking an LWLock
on TwoPhaseStateLock while decoding a record: lots of LWLock churn, and
LWLocks held over arbitrary user plugin code. Not viable.

With your way we just have to take a LWLock once on TwoPhaseState, when
we test abortpending and set beingdecoded. After that, during decoding,
we can do unlocked tests of abortpending, since a stale read will do
nothing worse than delay our response a little.

The existing 2PC ops already take the LWLock and can examine
beingdecoded then. I expect they'd need to WaitLatch in a loop until
beingdecoded was cleared, re-acquiring the LWLock and re-checking each
time they're woken. We should probably add a field there for a waiter
proc that wants its latch set, so 2PC ops don't usually have to poll
for decoding to finish. (Unless condition variables will help us here?)

However, let me make an entirely alternative suggestion. Should we add
a heavyweight lock class for 2PC xacts instead, and leverage the
existing infrastructure? We already use transaction locks widely, after
all. That way we'd just take some kind of share lock on the 2PC xact by
xid when we start logical decoding of it. ROLLBACK PREPARED and COMMIT
PREPARED would acquire the same heavyweight lock in an exclusive mode
before grabbing TwoPhaseStateLock and doing their work.

That way we get automatic cleanup when decoding procs exit, we get
wakeups for waiters, etc, all for "free". How practical is adding a
lock class?
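For illustration only, the flag-based handshake described above can be
sketched as a toy model. This is plain C11 with invented names (ToyGXact,
toy_lock as a stand-in for TwoPhaseStateLock), not actual backend code; in
the real patch the waiter would sleep in a WaitLatch loop rather than
returning, and the flags would live in shared memory:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a GlobalTransactionData entry -- NOT PostgreSQL code. */
typedef struct ToyGXact
{
    atomic_bool lock;          /* stand-in for TwoPhaseStateLock */
    bool        beingdecoded;  /* protected by "lock" */
    atomic_bool abortpending;  /* may be read unlocked by the decoder */
} ToyGXact;

static void toy_lock(ToyGXact *g)   { while (atomic_exchange(&g->lock, true)) ; }
static void toy_unlock(ToyGXact *g) { atomic_store(&g->lock, false); }

/* Decoder: claim the xact once under the lock, before decoding starts. */
static bool
toy_begin_decode(ToyGXact *g)
{
    toy_lock(g);
    if (atomic_load(&g->abortpending))
    {
        toy_unlock(g);
        return false;           /* rollback already requested; don't decode */
    }
    g->beingdecoded = true;
    toy_unlock(g);
    return true;
}

/* Per change record: unlocked read; staleness only delays the reaction. */
static bool
toy_decode_one_record(ToyGXact *g)
{
    return !atomic_load(&g->abortpending);
}

static void
toy_end_decode(ToyGXact *g)
{
    toy_lock(g);
    g->beingdecoded = false;    /* real code would set the waiter's latch */
    toy_unlock(g);
}

/* ROLLBACK PREPARED: flag intent, then learn whether it must wait. */
static bool
toy_request_abort(ToyGXact *g)
{
    atomic_store(&g->abortpending, true);
    toy_lock(g);
    bool must_wait = g->beingdecoded;  /* real code: WaitLatch loop here */
    toy_unlock(g);
    return must_wait;
}
```

The point of the sketch is the bound: the decoder takes the lock exactly
once, and a rollback issued mid-decode is noticed no later than the next
change record.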
(Frankly, I've often wished I could add new heavyweight lock classes
when working on complex extensions like BDR, too, and in an ideal world
we'd be able to register lock classes for use by extensions...)

> 3) Before starting decode of the next change record, we re-check if
> "abortpending" is set. If "abortpending" is set, we do not decode the
> next change record. Thus the abort is delay-bounded to a maximum of
> one change record decoding/apply cycle after we signal our intent to
> abort it. Then, we need to send ABORT (regular, not rollback
> prepared, since we have not sent "PREPARE" yet).

Just to be explicit, this means "tell the downstream the xact has
aborted". Currently logical decoding never starts decoding an xact
until it's committed, so it has never needed an abort callback on the
output plugin interface. But we'll need one when we start doing
speculative logical decoding of big txns before they commit, and we'll
need it for this. It's relatively trivial.

> This abort_prepared callback will abort the dummy PREPARED query from
> step (3) above. Instead of doing this, we could actually check if the
> 'GID' entry exists and then call ROLLBACK PREPARED on the subscriber.
> But in that case we can't be sure if the GID does not exist because of
> a rollback-during-decode issue on the provider or due to something
> else. If we are ok with not finding GIDs on the subscriber side, then
> I'm fine with removing the DUMMY prepare from step (3).

I prefer the latter approach personally: not doing the dummy 2PC xact,
and instead just ignoring a ROLLBACK PREPARED for a txn whose gid does
not exist on the downstream. I can easily see situations where we might
manually abort a txn and wouldn't want logical decoding to get
perpetually stuck waiting to abort a gid that doesn't exist, for
example. Ignoring COMMIT PREPARED for a missing xact would not be
great, but I think it's sensible enough to ignore missing GIDs for
ROLLBACK PREPARED.
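As a toy sketch of what "ignore a missing gid on ROLLBACK PREPARED" could
look like: the names (toy_lock_gxact, the missing_ok parameter, the sample
gid table) are all invented for illustration, and the real backend would
ereport(ERROR) rather than print and return:

```c
#include <stdio.h>
#include <string.h>

/* Invented model of a twophase gid table -- NOT the real twophase.c. */
typedef struct ToyGXactEntry
{
    const char *gid;
    int         valid;          /* still prepared? */
} ToyGXactEntry;

static ToyGXactEntry toy_gxacts[] = {
    { "txn-0001", 1 },
    { "txn-0002", 1 },
};

/*
 * With missing_ok, an unknown gid returns NULL instead of raising an
 * error, so the caller can treat it as "already rolled back upstream".
 */
static ToyGXactEntry *
toy_lock_gxact(const char *gid, int missing_ok)
{
    for (size_t i = 0; i < sizeof(toy_gxacts) / sizeof(toy_gxacts[0]); i++)
        if (toy_gxacts[i].valid && strcmp(toy_gxacts[i].gid, gid) == 0)
            return &toy_gxacts[i];
    if (missing_ok)
        return NULL;
    /* real code would ereport(ERROR) here */
    fprintf(stderr,
            "prepared transaction with identifier \"%s\" does not exist\n",
            gid);
    return NULL;
}

/* ROLLBACK PREPARED path: with missing_ok, an absent gid is a no-op. */
static int
toy_finish_prepared(const char *gid, int missing_ok)
{
    ToyGXactEntry *gx = toy_lock_gxact(gid, missing_ok);

    if (gx == NULL)
        return 0;               /* nothing to do: gid already gone */
    gx->valid = 0;              /* "roll back" the prepared xact */
    return 1;
}
```

The lookup and the state change happen under one (conceptual) lock, which
is what makes the no-op path race-free rather than a check-then-act gap.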
We'd need a race-free way to do that, though, so I think we'd have to
extend FinishPreparedTransaction and LockGXact with some kind of
missing_ok flag. I doubt that'd be controversial.

A couple of other considerations not covered in what you wrote:

- It's really important that the hook that decides whether to decode an
  xact at prepare time or at commit-prepared time reports the same
  answer each and every time, including if it's called after a prepared
  txn has committed. It probably can't look at anything more than the
  xact's origin replica identity, xid, and gid. This also means we need
  to know the gid of prepared txns when processing their commit record,
  so we can tell logical decoding whether we already sent the data to
  the client at prepare-transaction time, or if we should send it at
  commit-prepared time instead.

- You need to flush the syscache when you finish decoding a PREPARE
  TRANSACTION of an xact that made catalog changes, unless it's
  immediately followed by COMMIT PREPARED of the same xid. That's
  because xacts between the two cannot see changes the prepared xact
  made to the catalogs.

- For the same reason, we need to ensure that the historic snapshot
  used to decode a 2PC xact that made catalog changes isn't then used
  for subsequent xacts between the prepare and the commit. They'd see
  the uncommitted catalogs of the prepared xact.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
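[A toy illustration of the determinism requirement on the
prepare-vs-commit decision hook in the first point above. Everything here
is invented (the hook name, the "bdr:" gid prefix policy); the point is
only that the answer is a pure function of (origin, xid, gid), with no
catalog lookups or mutable state, so it comes out the same before and
after the txn commits:]

```c
#include <string.h>

/* Invented identity triple the hook is allowed to inspect. */
typedef struct ToyTxnIdent
{
    int         origin;         /* replica origin id */
    unsigned    xid;
    const char *gid;
} ToyTxnIdent;

/*
 * Example policy: decode at PREPARE time only locally-originated txns
 * whose gid carries a recognizable prefix (here, a hypothetical "bdr:"
 * naming convention). No clock, no catalogs, no global state, so the
 * same input always yields the same answer.
 */
static int
toy_decode_at_prepare(const ToyTxnIdent *txn)
{
    return txn->origin == 0 &&
           strncmp(txn->gid, "bdr:", 4) == 0;
}
```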
