Re: Transactions involving multiple postgres foreign servers, take 2

From Kyotaro Horiguchi
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Msg-id 20201013.100013.944300454073643592.horikyota.ntt@gmail.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in 
> On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> >
> > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in
> > > From: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
> > > > What about temporary network failures? I think there are users who
> > > > don't want to give up resolving foreign transactions failed due to a
> > > > temporary network failure. Or even they might want to wait for
> > > > transaction completion until they send a cancel request. If we want to
> > > > call the commit routine only once and therefore want FDW to retry
> > > > connecting the foreign server within the call, it means we require all
> > > > FDW implementors to write a retry loop code that is interruptible and
> > > > ensures not to raise an error, which increases difficulty.
> > > >
> > > > Yes, but if we don’t retry to resolve foreign transactions at all on
> > > > an unreliable network environment, the user might end up requiring
> > > > every transaction to check the status of foreign transactions of the
> > > > previous distributed transaction before it starts. If we allow
> > > > retries, I guess we ease that somewhat.
> > >
> > > OK.  As I said, I'm not against trying to cope with temporary network
> > > failure.  I just don't think it's mandatory. If the network failure is
> > > really temporary and thus recovers soon, then the resolver will be able
> > > to commit the transaction soon, too.
> >
> > I must be missing something, though...
> >
> > I don't understand why we hate ERRORs from fdw-2pc-commit routine so
> > much. I think remote-commits should be performed before local commit
> > passes the point-of-no-return and the v26-0002 actually places
> > AtEOXact_FdwXact() before the critical section.
> >
> 
> So you're thinking the following sequence?
> 
> 1. Prepare all foreign transactions.
> 2. Commit all the prepared foreign transactions.
> 3. Commit the local transaction.
> 
> Suppose we have the backend process call the commit routine, what if
> one of FDW raises an ERROR during committing the foreign transaction
> after committing other foreign transactions? The transaction will end
> up with an abort but some foreign transactions are already committed.
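
A minimal sketch of the failure case quoted above, assuming a toy in-memory model of the "prepare all, commit foreign, commit local" sequence. The names `ForeignServer`, `commit_prepared`, and `commit_foreign_first` are hypothetical illustrations and are not part of the patch or of any PostgreSQL API.

```python
class ForeignServer:
    """Toy stand-in for one foreign server taking part in a transaction."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: the server promises it can commit later.
        self.state = "prepared"

    def commit_prepared(self):
        # Phase 2: may fail, e.g. because of a network error.
        if self.fail_on_commit:
            raise ConnectionError(f"{self.name}: commit failed")
        self.state = "committed"


def commit_foreign_first(servers):
    """Prepare all, commit all foreign transactions, then commit locally."""
    for server in servers:
        server.prepare()
    try:
        for server in servers:
            server.commit_prepared()
    except ConnectionError:
        # The error reaches the backend before the local commit,
        # so the local transaction aborts.
        return "aborted"
    return "committed"


servers = [ForeignServer("fs1"), ForeignServer("fs2", fail_on_commit=True)]
local_state = commit_foreign_first(servers)
states = {server.name: server.state for server in servers}
print(local_state, states)
```

In this model the local transaction aborts while `fs1` has already committed and `fs2` is left prepared, which is the inconsistency the quoted paragraph describes.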

OK, I understand what you are aiming at.

That is apparently outside the scope of the two-phase commit
protocol. Each FDW server can try to honor the contract as far as it
is able, but in the end that kind of failure is inevitable. Even if
we require FDW developers not to respond until a 2pc-commit succeeds,
that just makes the whole FDW cluster freeze, even in cases that are
not extremely bad.

We have no choice other than shutting the server down (then the
subsequent server start removes the garbage commits) or continuing to
work while leaving some information in system storage (or reverting
the garbage commits). What we can do in that case is to provide an
automated way to resolve the inconsistency.

> Also, what if the backend process failed to commit the local
> transaction? Since it already committed all foreign transactions it
> cannot ensure the global atomicity in this case too. Therefore, I
> think we should commit the distributed transactions in the following
> sequence:

Ditto. It's outside the scope of 2pc. Using 2pc for the local
transaction could reduce that kind of failure, but I'm not sure. 3pc,
4pc ... npc could reduce the probability but can't eliminate failure
cases.

> 1. Prepare all foreign transactions.
> 2. Commit the local transaction.
> 3. Commit all the prepared foreign transactions.
> 
> But this is still not a perfect solution. If we have the backend

2pc is not a perfect solution in the first place. Attaching a similar
phase to it cannot make it "perfect".

> process call the commit routine and an error happens during executing
> the commit routine of an FDW (e.g., at step 3) it's too late to report
> an error to the client because we already committed the local
> transaction. So the current solution is to have a background process
> commit the foreign transactions so that the backend can just wait
> without the possibility of errors.
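
A minimal sketch of the second sequence, assuming a toy in-memory model in which a resolver keeps retrying the foreign commits after the local commit. The names `FlakyServer`, `resolve`, `commit_local_first`, and `max_rounds` are hypothetical and do not come from the patch; the retry bound exists only so the example terminates.

```python
class FlakyServer:
    """Toy foreign server whose commit fails a fixed number of times."""

    def __init__(self, name, failures_before_success):
        self.name = name
        self.remaining_failures = failures_before_success
        self.state = "active"

    def prepare(self):
        self.state = "prepared"

    def commit_prepared(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError(f"{self.name}: temporarily unreachable")
        self.state = "committed"


def resolve(servers, max_rounds):
    """Retry every still-prepared server; return rounds used, or None."""
    for round_no in range(1, max_rounds + 1):
        for server in servers:
            if server.state != "prepared":
                continue
            try:
                server.commit_prepared()
            except ConnectionError:
                pass  # stays prepared and is retried in the next round
        if all(server.state == "committed" for server in servers):
            return round_no
    return None  # still unresolved: the waiting client sees a freeze


def commit_local_first(servers, max_rounds):
    """Prepare all, commit locally, then let the resolver commit the rest."""
    for server in servers:
        server.prepare()
    local_state = "committed"  # past this point no error can be reported
    rounds = resolve(servers, max_rounds)
    return local_state, rounds


flaky = [FlakyServer("fs1", 0), FlakyServer("fs2", 2)]
local_state2, rounds = commit_local_first(flaky, max_rounds=5)
print(local_state2, rounds)
```

With a bound of two rounds instead of five, `resolve` returns `None` in this model: the local transaction is committed, `fs2` is still prepared, and whoever is waiting has to either keep waiting or give up.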

Whatever process tries to complete a transaction, the client must wait
for the transaction to end, and in any case that's just a freeze from
the client's point of view, unless you intend to respond to the local
commit before all participants complete.

I don't think most client applications would wait forever for a frozen
server.  We have the same issue when the client decides to give up on
the transaction, or when the leader session is killed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
