Re: Transactions involving multiple postgres foreign servers, take 2

From Masahiko Sawada
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id CA+fd4k70uUbvkHp8q4DJ52ZNWFKPQryFruyuniHvmecvnbXszw@mail.gmail.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in
> > On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in
> > > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> > > > > What about temporary network failures? I think there are users who
> > > > > don't want to give up resolving foreign transactions failed due to a
> > > > > temporary network failure. Or even they might want to wait for
> > > > > transaction completion until they send a cancel request. If we want to
> > > > > call the commit routine only once and therefore want FDW to retry
> > > > > connecting the foreign server within the call, it means we require all
> > > > > FDW implementors to write a retry loop code that is interruptible and
> > > > > ensures not to raise an error, which increases difficulty.
> > > > >
> > > > > Yes, but if we don’t retry to resolve foreign transactions at all on
> > > > > an unreliable network environment, the user might end up requiring
> > > > > every transaction to check the status of foreign transactions of the
> > > > > previous distributed transaction before starts. If we allow to do
> > > > > retry, I guess we ease that somewhat.
> > > >
> > > > OK.  As I said, I'm not against trying to cope with temporary
> > > > network failure.  I just don't think it's mandatory. If the network
> > > > failure is really temporary and thus recovers soon, then the
> > > > resolver will be able to commit the transaction soon, too.
> > >
> > > I must be missing something, though...
> > >
> > > I don't understand why we hate ERRORs from fdw-2pc-commit routine so
> > > much. I think remote-commits should be performed before local commit
> > > passes the point-of-no-return and the v26-0002 actually places
> > > AtEOXact_FdwXact() before the critical section.
> > >
> >
> > So you're thinking the following sequence?
> >
> > 1. Prepare all foreign transactions.
> > 2. Commit the all prepared foreign transactions.
> > 3. Commit the local transaction.
> >
> > Suppose we have the backend process call the commit routine, what if
> > one of FDW raises an ERROR during committing the foreign transaction
> > after committing other foreign transactions? The transaction will end
> > up with an abort but some foreign transactions are already committed.
>
> Ok, I understand what you are aiming.
>
> It is apparently out of the focus of the two-phase commit
> protocol. Each FDW server can try to keep the contract as far as its
> ability reaches, but in the end such kind of failure is
> inevitable. Even if we require FDW developers not to respond until a
> 2pc-commit succeeds, that just leads the whole FDW-cluster to freeze
> even not in an extremely bad case.
>
> We have no other choices than shutting the server down (then the
> succeeding server start removes the garbage commits) or continuing
> working leaving some information in a system storage (or reverting the
> garbage commits). What we can do in that case is to provide an
> automated way to resolve the inconsistency.
>
> > Also, what if the backend process failed to commit the local
> > transaction? Since it already committed all foreign transactions it
> > cannot ensure the global atomicity in this case too. Therefore, I
> > think we should commit the distributed transactions in the following
> > sequence:
>
> Ditto. It's out of the range of 2pc. Using 2pc for local transaction
> could reduce that kind of failure but I'm not sure. 3pc, 4pc ...npc
> could reduce the probability but can't eliminate failure cases.

IMO the problems I mentioned arise from the fact that the above
sequence doesn't really follow the 2pc protocol in the first place.

We can view committing the local transaction without preparation, while
preparing the foreign transactions, as using 2pc with the last resource
transaction optimization (or last agent optimization)[1]. That is, we
prepare all foreign transactions first and the local node is always the
last resource to process. At this point, the outcome of the distributed
transaction depends entirely on the fate of the last resource (i.e., the
local transaction). If it fails, the distributed transaction must be
aborted by rolling back the prepared foreign transactions. OTOH, if it
succeeds, all prepared foreign transactions must be committed. Therefore,
we don't need to prepare the last resource and can commit it directly.
So if we want to commit the local transaction without preparation, the
local transaction must be committed last. But since the above sequence
doesn't follow this protocol, we run into such problems. I think if we
follow 2pc properly, such basic failures don't happen.
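
To make the ordering concrete, here is a rough sketch of 2pc with the last
resource optimization as described above. It is only an illustration, not
code from the patch: the Participant class, commit_with_last_resource(), and
the commit_local callback are hypothetical names, and the sketch assumes that
committing an already prepared transaction does not fail.

```python
class Participant:
    """Hypothetical stand-in for a foreign server that supports 2pc."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.state = "active"

    def prepare(self):
        if self.fail_prepare:
            raise RuntimeError(f"{self.name}: prepare failed")
        self.state = "prepared"

    def commit_prepared(self):
        # Assumption for this sketch: never fails once prepared.
        self.state = "committed"

    def rollback(self):
        # Used for both active and prepared transactions.
        self.state = "aborted"


def commit_with_last_resource(foreign, commit_local):
    """Prepare all foreign transactions, then let the unprepared local
    commit (the last resource) decide the outcome for everyone."""
    try:
        for p in foreign:          # step 1: prepare all foreign transactions
            p.prepare()
        commit_local()             # step 2: commit the local transaction
    except Exception:
        for p in foreign:          # local side failed: roll everything back
            p.rollback()
        return "aborted"
    for p in foreign:              # step 3: commit the prepared transactions
        p.commit_prepared()
    return "committed"
```

The point of the ordering is that nothing foreign is committed until the
local commit has succeeded, so a failure before that point can still be
turned into a global abort.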

>
> > 1. Prepare all foreign transactions.
> > 2. Commit the local transaction.
> > 3. Commit the all prepared foreign transactions.
> >
> > But this is still not a perfect solution. If we have the backend
>
> 2pc is not a perfect solution in the first place. Attaching a similar
> phase to it cannot make it "perfect".
>
> > process call the commit routine and an error happens during executing
> > the commit routine of an FDW (i.e., at step 3) it's too late to report
> > an error to the client because we already committed the local
> > transaction. So the current solution is to have a background process
> > commit the foreign transactions so that the backend can just wait
> > without the possibility of errors.
>
> Whatever process tries to complete a transaction, the client must wait
> for the transaction to end and anyway that's just a freeze in the
> client's view, unless you intended to respond to local commit before
> all participant complete.

Yes, but the point of using a separate process is that even if FDW
code raises an error, the client waiting for transaction resolution
doesn't get it, and the wait is interruptible.
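
As a toy model of that idea (using a thread in place of a separate process,
and with hypothetical names resolve_in_background() and
wait_for_resolution() that are not from the patch), the errors raised by the
commit routine stay on the resolver side, while the waiter only polls a flag
and can stop on a cancel request:

```python
import threading


def resolve_in_background(commit_fns, max_attempts=5):
    """Run each commit function in a separate thread, retrying failures
    there. Returns (done_event, errors_seen_by_resolver)."""
    done = threading.Event()
    errors = []  # kept on the resolver side, never raised in the waiter

    def resolver():
        for commit in commit_fns:
            for _ in range(max_attempts):
                try:
                    commit()
                    break
                except Exception as exc:
                    errors.append(exc)
            else:
                return  # gave up: leave the transaction unresolved
        done.set()

    threading.Thread(target=resolver, daemon=True).start()
    return done, errors


def wait_for_resolution(done, cancel, poll_interval=0.01):
    """Wait until the resolver finishes or a cancel is requested.
    Returns True if resolved, False if the wait was cancelled."""
    while not done.is_set():
        if cancel.is_set():
            return False
        done.wait(poll_interval)
    return True
```

Here a temporary failure only costs the resolver a retry; the waiting side
never sees an exception and can give up waiting at any time.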

[1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


