Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Msg-id: CA+fd4k5BcRP8=inj0jB2KGKjQd7iu8sv3XdGiiq-qG-2_5_+Vw@mail.gmail.com
In reply to: RE: Transactions involving multiple postgres foreign servers, take 2  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses: RE: Transactions involving multiple postgres foreign servers, take 2  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List: pgsql-hackers
On Thu, 8 Oct 2020 at 18:05, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> Sorry to be late to respond.  (My PC is behaving strangely after upgrading Win10 2004)
>
> From: Masahiko Sawada <sawada.mshk@gmail.com>
> > After more thoughts on Tsunakawa-san’s idea it seems to need the
> > following conditions:
> >
> > * At least postgres_fdw is viable to implement these APIs while
> > guaranteeing not to happen any error.
> > * A certain number of FDWs (or majority of FDWs) can do that in a
> > similar way to postgres_fdw by using the guideline and probably
> > postgres_fdw as a reference.
> >
> > These are necessary for FDW implementors to implement APIs while
> > following the guideline and for the core to trust them.
> >
> > As far as postgres_fdw goes, what we need to do when committing a
> > foreign transaction resolution is to get a connection from the
> > connection cache or create and connect if not found, construct a SQL
> > query (COMMIT/ROLLBACK PREPARED with identifier) using a fixed-size
> > buffer, send the query, and get the result. The possible place to
> > raise an error is limited. In case of failures such as connection
> > error FDW can return false to the core along with a flag indicating to
> > ask the core retry. Then the core will retry to resolve foreign
> > transactions after some sleep. OTOH if FDW sized up that there is no
> > hope of resolving the foreign transaction, it also could return false
> > to the core along with another flag indicating to remove the entry and
> > not to retry. Also, the transaction resolution by FDW needs to be
> > cancellable (interruptible) but cannot use CHECK_FOR_INTERRUPTS().
> >
> > Probably, as Tsunakawa-san also suggested, it’s not impossible to
> > implement these APIs in postgres_fdw while guaranteeing not to happen
> > any error, although not sure the code complexity. So I think the first
> > condition may be true but not sure about the second assumption,
> > particularly about the interruptible part.
>
> Yeah, I expect the commit of the second phase should not be difficult for the FDW developer.
>
> As for the cancellation during commit retry, I don't think we necessarily have to make the TM responsible for retrying the commits.  Many DBMSs have their own timeout functionality such as connection timeout, socket timeout, and statement timeout.
> Users can set those parameters in the foreign server options based on how long the end user can wait.  That is, TM calls FDW's commit routine just once.

What about temporary network failures? I think there are users who
don't want to give up on resolving foreign transactions that failed
due to a temporary network failure. They might even want to wait for
transaction completion until they send a cancel request. If we want to
call the commit routine only once and therefore want the FDW to retry
connecting to the foreign server within that call, we require all FDW
implementors to write retry-loop code that is interruptible and is
guaranteed not to raise an error, which increases the difficulty.
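
To make the alternative concrete, here is a minimal sketch (in Python
rather than C, only for brevity) of the retry loop living in the core
instead of in every FDW. All names here (Resolution, resolve_with_retry,
commit_once, cancel_requested) are hypothetical and are not part of any
patch; the three result values only mirror the "retry" and "remove the
entry" flags discussed earlier in this thread.

```python
import enum
from typing import Callable, Optional, Tuple


class Resolution(enum.Enum):
    """Hypothetical result of one FDW commit attempt (never an exception)."""
    DONE = "done"        # COMMIT/ROLLBACK PREPARED succeeded
    RETRY = "retry"      # transient failure, e.g. a connection error
    GIVE_UP = "give_up"  # no hope of resolving; drop the entry


def resolve_with_retry(
    commit_once: Callable[[], Resolution],
    cancel_requested: Callable[[], bool],
    sleep: Callable[[float], None],
    retry_interval: float = 1.0,
    max_attempts: Optional[int] = None,
) -> Tuple[str, int]:
    """Core-side loop: the FDW routine is called once per attempt.

    Returns (outcome, attempts). The cancel flag is checked between
    attempts, so the FDW itself does not need an interruptible loop.
    """
    attempts = 0
    while True:
        if cancel_requested():
            return ("cancelled", attempts)
        attempts += 1
        result = commit_once()
        if result is not Resolution.RETRY:
            return (result.value, attempts)
        if max_attempts is not None and attempts >= max_attempts:
            # Hand the entry over to a later attempt instead of waiting on.
            return ("retry_later", attempts)
        sleep(retry_interval)
```

With this shape the FDW only has to report what happened in a single
attempt; sleeping, counting attempts, and noticing the cancel request
stay in one place.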

Also, what if the user sets the statement timeout to 60 sec and wants
to cancel the wait after 5 sec by pressing Ctrl-C? You mentioned that
client libraries of other DBMSs don't have asynchronous execution
functionality. If the SQL execution function is not interruptible, the
user will end up waiting for 60 sec, which doesn't seem good.
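
The arithmetic behind that concern can be sketched as a small
simulation. This is only an illustration under assumed numbers: the
function seconds_until_return and its poll_interval parameter are
hypothetical, and the foreign server is assumed never to answer.

```python
from typing import Optional


def seconds_until_return(
    statement_timeout: float,
    cancel_at: float,
    poll_interval: Optional[float] = None,
) -> float:
    """Simulated time at which a wait on an unresponsive server returns.

    poll_interval=None models a blocking, non-interruptible call: only
    the statement timeout ends the wait. Otherwise the wait is split
    into slices and the cancel request is checked between slices.
    """
    if poll_interval is None:
        return statement_timeout
    now = 0.0
    while now < statement_timeout:
        if now >= cancel_at:
            return now  # cancel noticed at a slice boundary
        now = min(now + poll_interval, statement_timeout)
    return statement_timeout


# Blocking call: Ctrl-C at 5 sec has no effect until the 60 sec timeout.
blocking = seconds_until_return(60, 5)
# Interruptible wait checked every second: returns when Ctrl-C arrives.
polled = seconds_until_return(60, 5, poll_interval=1.0)
```

In the blocking case the cancel request is honoured only once the
statement timeout fires; in the polled case the delay is bounded by the
slice length.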

> If the TM makes efforts to retry commits, the duration would be from a few seconds to 30 seconds.  Then, we can hold back the cancellation during that period.
>
>
> > I thought we could support both ideas to get their pros; supporting
> > Tsunakawa-san's idea and then my idea if necessary, and FDW can choose
> > whether to ask the resolver process to perform 2nd phase of 2PC or
> > not. But it's not a good idea in terms of complexity.
>
> I don't feel the need for leaving the commit to the resolver during normal operation.

I meant it's for FDWs that cannot guarantee that no error occurs
during resolution.

>  seems like if failed to resolve, the backend would return an
> > acknowledgment of COMMIT to the client and the resolver process
> > resolves foreign prepared transactions in the background. So we can
> > ensure that the distributed transaction is completed at the time when
> > the client got an acknowledgment of COMMIT if 2nd phase of 2PC is
> > successfully completed in the first attempts. OTOH, if it failed for
> > whatever reason, there is no such guarantee. From an optimistic
> > perspective, i.e., the failures are unlikely to happen, it will work
> > well but IMO it’s not uncommon to fail to resolve foreign transactions
> > due to network issue, especially in an unreliable network environment
> > for example geo-distributed database. So I think it will end up
> > requiring the client to check if preceding distributed transactions
> > are completed or not in order to see the results of these
> > transactions.
>
> That issue exists with any method, doesn't it?

Yes, but if we don't retry resolving foreign transactions at all in
an unreliable network environment, the user might end up having to
check, before every transaction starts, the status of the foreign
transactions of the previous distributed transaction. If we allow
retries, I guess we ease that somewhat.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


