Re: Transactions involving multiple postgres foreign servers

From Jim Nasby
Subject Re: Transactions involving multiple postgres foreign servers
Date
Msg-id 54B06C18.8060001@BlueTreble.com
In reply to Re: Transactions involving multiple postgres foreign servers  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Transactions involving multiple postgres foreign servers  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 1/8/15, 12:00 PM, Kevin Grittner wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jan 8, 2015 at 10:19 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>> Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Andres is talking in my other ear suggesting that we ought to
>>>> reuse the 2PC infrastructure to do all this.
>>>
>>> If you mean that the primary transaction and all FDWs in the
>>> transaction must use 2PC, that is what I was saying, although
>>> apparently not clearly enough.  All nodes *including the local one*
>>> must be prepared and committed with data about the nodes saved
>>> safely off somewhere that it can be read in the event of a failure
>>> of any of the nodes *including the local one*.  Without that, I see
>>> this whole approach as a train wreck just waiting to happen.
>>
>> Clearly, all the nodes other than the local one need to use 2PC.  I am
>> unconvinced that the local node must write a 2PC state file only to
>> turn around and remove it again almost immediately thereafter.
>
> The key point is that the distributed transaction data must be
> flagged as needing to commit rather than roll back between the
> prepare phase and the final commit.  If you try to avoid the
> PREPARE, flagging, COMMIT PREPARED sequence by building the
> flagging of the distributed transaction metadata into the COMMIT
> process, you still have the problem of what to do on crash
> recovery.  You really need to use 2PC to keep that clean, I think.

If we had an independent transaction coordinator, then I agree with you, Kevin. But I think Robert is proposing that if we are controlling one of the nodes that's participating, as well as coordinating the overall transaction, we can take some shortcuts. AIUI, a PREPARE means you are completely ready to commit; in essence you're just waiting to write and fsync the commit record. That is in fact the state a coordinating PG node would be in by the time everyone else has done their prepare. So from that standpoint we're OK.

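For reference, the bare single-node sequence behind that statement looks like this (a minimal sketch; the gid and table name are made up, and the node needs max_prepared_transactions > 0):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    -- After this, the transaction is durable but undecided: it survives
    -- a crash and can only be resolved by an explicit second step.
    PREPARE TRANSACTION 'fdw_xact_1';
    -- All that remains is to write and fsync the commit record:
    COMMIT PREPARED 'fdw_xact_1';
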
Now, as soon as ANY of the nodes commits, our coordinating node MUST be able to commit as well! That would require it to have created a real prepared transaction of its own. However, as long as there is zero chance of any other prepared transaction committing before our local transaction, that step isn't actually needed: our local transaction will either commit or abort, and that determines what needs to happen on all the other nodes.

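Concretely, the ordering I'm describing could be hand-driven today with dblink, something like this (purely an illustration of the proposed commit sequence, not the FDW machinery itself; the connection string, table, and gid are made up):

    -- One remote participant; more nodes would just repeat the pattern.
    SELECT dblink_connect('node_a', 'host=node-a dbname=postgres');

    BEGIN;
    -- ... local work ...
    SELECT dblink_exec('node_a', 'BEGIN');
    SELECT dblink_exec('node_a',
        'UPDATE accounts SET balance = balance + 100 WHERE id = 2');
    SELECT dblink_exec('node_a', 'PREPARE TRANSACTION ''fdw_xact_node_a''');
    COMMIT;  -- the decision point: the local outcome decides everyone's fate

    -- Only reached if the local COMMIT succeeded:
    SELECT dblink_exec('node_a', 'COMMIT PREPARED ''fdw_xact_node_a''');
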
I'm ignoring the question of how the local node needs to store info about the other nodes in case of a crash, but AFAICT you could reliably recover manually from what I just described.

I think the question is: are we OK with "going under the skirt" in this fashion? Presumably it would provide better performance, whereas forcing ourselves to eat our own 2PC dogfood would presumably make it easier for someone to plug in an external coordinator instead of using our own. I think there's also a lot to be said for getting a partial implementation of this available today (requiring manual recovery), so long as it's not in core.

BTW, I found https://www.cs.rutgers.edu/~pxk/417/notes/content/transactions.html a useful read, specifically the 2PC
portion.

>>> I'm not really clear on the mechanism that is being proposed for
>>> doing this, but one way would be to have the PREPARE of the local
>>> transaction be requested explicitly and to have that cause all FDWs
>>> participating in the transaction to also be prepared.  (That might
>>> be what Andres meant; I don't know.)
>>
>> We want this to be client-transparent, so that the client just says
>> COMMIT and everything Just Works.
>
> What about the case where one or more nodes doesn't support 2PC.
> Do we silently make the choice, without the client really knowing?

We abort. (Unless we want to have a running_with_scissors GUC...)

>>> That doesn't strike me as the
>>> only possible mechanism to drive this, but it might well be the
>>> simplest and cleanest.  The trickiest bit might be to find a good
>>> way to persist the distributed transaction information in a way
>>> that survives the failure of the main transaction -- or even the
>>> abrupt loss of the machine it's running on.
>>
>> I'd be willing to punt on surviving a loss of the entire machine.  But
>> I'd like to be able to survive an abrupt reboot.
>
> As long as people are aware that there is an urgent need to find
> and fix all data stores to which clusters on the failed machine
> were connected via FDW when there is a hard machine failure, I
> guess it is OK.  In essence we just document it and declare it to
> be somebody else's problem.  In general I would expect a
> distributed transaction manager to behave well in the face of any
> single-machine failure, but if there is one aspect of a
> full-featured distributed transaction manager we could give up, I
> guess that would be it.

ISTM that one option here would be to "simply" write and sync WAL record(s) of all externally prepared transactions. That would be enough for a hot standby to find all the other servers and tell them to either commit or abort, based on whether our local transaction committed or aborted. If you wanted, you could even have the standby be responsible for telling all the other participants to commit...
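To make the manual-recovery side concrete: given the gid recorded in WAL, resolution on each surviving participant would look something like this (a sketch; the gid prefix is illustrative):

    -- On each remote participant, after losing the coordinator:
    SELECT gid, prepared, database FROM pg_prepared_xacts
     WHERE gid LIKE 'fdw_xact_%';

    -- If the coordinator's local transaction is known to have committed:
    COMMIT PREPARED 'fdw_xact_node_a';
    -- ...otherwise:
    ROLLBACK PREPARED 'fdw_xact_node_a';
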
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


