Re: [HACKERS] logical decoding of two-phase transactions

From: Robert Haas
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Msg-id: CA+TgmoZGhw7WJ++w+GF13EbxHm58aCWUTc8an6HvfacQEY-94Q@mail.gmail.com
In reply to: Re: [HACKERS] logical decoding of two-phase transactions  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: [HACKERS] logical decoding of two-phase transactions  (Nikhil Sontakke <nikhils@2ndquadrant.com>)
List: pgsql-hackers
On Wed, Jul 18, 2018 at 11:27 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>> One idea is that maybe the running transaction could communicate with
>> the decoding process through shared memory.  For example, suppose that
>> before you begin decoding an ongoing transaction, you have to send
>> some kind of notification to the process saying "hey, I'm going to
>> start decoding you" and wait for that process to acknowledge receipt
>> of that message (say, at the next CFI).  Once it acknowledges receipt,
>> you can begin decoding.  Then, we're guaranteed that the foreground
>> process knows that it must be careful about catalog changes. If
>> it's going to make one, it sends a note to the decoding process and
>> says, hey, sorry, I'm about to do catalog changes, please pause
>> decoding.  Once it gets an acknowledgement that decoding has paused,
>> it continues its work.  Decoding resumes after commit (or maybe
>> earlier if it's provably safe).
> Let's assume the running transaction is holding an exclusive lock on
> something. We start decoding it and do this little dance with sending
> messages, confirmations etc. The decoding starts, and the plugin asks for
> the same lock (and starts waiting). Then the transaction decides to do some
> catalog changes, and sends a "pause" message to the decoding. Who's going
> to respond, considering the decoding is waiting for the lock (and it's not
> easy to jump out, because it might be deep inside the output plugin, i.e.
> deep in some extension)?

I think it's inevitable that any solution that is based on pausing
decoding might have to wait for a theoretically unbounded time for
decoding to get back to a point where it can safely pause.  That is
one of several reasons why I don't believe that any solution based on
holding off aborts has any chance of being acceptable -- mid-abort is
a terrible time to pause.  Now, if the time is not only theoretically
unbounded but also in practice likely to be very long (e.g. the
foreground transaction could easily have to wait minutes for the
decoding process to be able to process the pause request), then this
whole approach is probably not going to work.  If, on the other hand,
the time is theoretically unbounded but in practice likely to be no
more than a few seconds in almost every case, then we might have
something.  I don't know which is the case.  It probably depends on
where you put the code to handle pause requests, and I'm not sure what
options are viable.  For example, if there's a loop that eats WAL
records one at a time, and we can safely pause after any given
iteration of that loop, that sounds pretty good, unless a single
iteration of that loop might hang inside of a network I/O, in which
case it sounds ... less good, probably?  But there might be ways
around that, too, like ... could we pause at the next CFI?  I don't
understand the constraints well enough to comment intelligently here.
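For what it's worth, the happy path -- a per-record loop that never blocks
mid-iteration -- can also be sketched as a toy simulation. Again, Python
threads stand in for the two processes and all names are invented; the check
between iterations is merely analogous to a CFI:

```python
import threading

pause_requested = threading.Event()   # foreground -> decoding
paused = threading.Event()            # decoding's acknowledgement
resume = threading.Event()            # foreground -> decoding: continue

def decode_stream(records, out):
    # Eat WAL records one at a time; the top of each iteration is a safe
    # point to honor a pause request (roughly like checking at a CFI).
    for rec in records:
        if pause_requested.is_set():
            paused.set()              # acknowledge the pause
            resume.wait()             # sleep until told to continue
            pause_requested.clear()
            paused.clear()
        out.append(rec)

records = list(range(6))
out = []
t = threading.Thread(target=decode_stream, args=(records, out))
pause_requested.set()                 # foreground: "about to change catalogs"
t.start()
paused.wait()                         # foreground waits for the ack
# ... catalog changes happen here, while decoding is safely paused ...
resume.set()                          # changes done, let decoding continue
t.join()
```

If any single iteration can block (say, inside a network write in the output
plugin), the acknowledgement is delayed by exactly that long, which is the
crux of the question above.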

>> The newer approach could be considered an improvement in that you've
>> tried to get your hands around the problem at an earlier point, but
>> it's not early enough.  To take a very rough analogy, the original
>> approach was like trying to install a sprinkler system after the
>> building had already burned down, while the new approach is like
>> trying to install a sprinkler system when you notice that the building
>> is on fire.
>
> When an oil well is burning, they detonate a small bomb next to it to
> extinguish it. What would be the analogy to that, here? pg_resetwal? ;-)

Yep.  :-)

>> But we need to install the sprinkler system in advance.
>
> Damn causality!

I know, right?

>> Are you talking about HOT updates, or HOT pruning?  Disabling the
>> former wouldn't help, and disabling the latter would break VACUUM,
>> which assumes that any tuple not removed by HOT pruning is not a dead
>> tuple (cf. 1224383e85eee580a838ff1abf1fdb03ced973dc, which was caused
>> by a case where that wasn't true).
>
> I'm talking about the issue you described here:
>
> https://www.postgresql.org/message-id/CA+TgmoZP0SxEfKW1Pn=ackUj+KdWCxs7PumMAhSYJeZ+_61_GQ@mail.gmail.com

There are several issues there.  The second and third ones boil down
to this: As soon as the system thinks that your transaction is no
longer in process, it is going to start making decisions based on
whether that transaction committed or aborted.  If it thinks your
transaction aborted, it is going to feel entirely free to make
decisions that permanently lose information -- like removing tuples or
overwriting CTIDs or truncating CLOG or killing index entries.  I
doubt it makes any sense to try to fix each of those problems
individually -- if we're going to do something about this, it had
better be broad enough to nail all or nearly all of the problems in
this area in one fell swoop.

The first issue in that email is different.  That's really about the
possibility that the aborted transaction itself has created chaos,
whereas the other ones are about the chaos that the rest of the system
might impose based on the belief that the transaction is no longer
needed for anything after an abort has occurred.

> A dumb question - would this work with subtransaction-level aborts? I mean,
> a transaction that does some catalog changes in a subxact which then however
> aborts, but then still continues.

Well, I would caution you against relying on me to design this for
you.  The fact that I can identify the pitfalls of trying to install a
sprinkler system while the building is on fire does not mean that I
know what diameter of pipe should be used to provide for proper fire
containment.  It's really important that this gets designed by someone
who knows -- or learns -- enough to make it really good and safe.
Replacing obvious problems (the building has already burned down!)
with subtler problems (the water pressure is insufficient to reach the
upper stories!) might get the patch committed, but that's not the
right goal.

That having been said, I cannot immediately see any reason why the
idea that I sketched there couldn't be made to work just as well or
poorly for subtransactions as it would for toplevel transactions.  I
don't really know that it will work even for toplevel transactions --
that would require more thought and careful study than I've given it
(or, given that this is not my patch, feel that I should need to give
it).  However, if it does, and if there are no other problems that
I've missed in thinking casually about it, then I think it should be
possible to make it work for subtransactions, too.  Likely, as the
decoding process first encountered each new sub-XID, it would need to
magically acquire a duplicate lock and advertise the subxid just as it
did for the toplevel XID, so that at any given time the set of XIDs
advertised by the decoding process would be a subset (not necessarily
proper) of the set advertised by the foreground process.
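The subset invariant described above might be modeled like this (pure
illustration with made-up names; the foreground advertises its toplevel XID
and every sub-XID, while decoding duplicates each XID lazily as it first
encounters it in the WAL stream):

```python
# Hypothetical model of the advertised-XID sets.
foreground_xids = {100}          # toplevel XID, advertised at start
decoding_xids = set()

def foreground_starts_subxact(subxid):
    foreground_xids.add(subxid)

def decoding_encounters_xid(xid):
    # Decoding may only advertise XIDs the foreground already advertises,
    # so its set stays a (not necessarily proper) subset at all times.
    assert xid in foreground_xids
    decoding_xids.add(xid)

decoding_encounters_xid(100)     # toplevel XID seen first
foreground_starts_subxact(101)
foreground_starts_subxact(102)
decoding_encounters_xid(101)     # sub-XIDs picked up as decoding reaches them

assert decoding_xids <= foreground_xids
```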

To try to be a little clearer about my overall position, I am
suggesting that you (1) abandon the current approach and (2) make sure
that everything is done by making sufficient preparations in advance
of any abort rather than trying to cope after it's already started.  I
am also suggesting that, to get there, it might be helpful to (a)
contemplate communication and active cooperation between the running
process and the decoding process(es), but it might turn out not to be
needed and I don't know exactly what needs to be communicated, (b)
consider whether there's a reasonable way to make it look to other
parts of the system like the aborted transaction is still running, but
this also might turn out not to be the right approach, (c) consider
whether logical decoding already does or can be made to use historical
catalog snapshots that only see command IDs prior to the current one
so that incompletely-made changes by the last CID aren't seen if an
abort happens.  I think there is a good chance that a full solution
involves more than one of these things, and maybe some other things I
haven't thought about.  These are ideas, not a plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

