Re: Transactions involving multiple postgres foreign servers

From: Ashutosh Bapat
Subject: Re: Transactions involving multiple postgres foreign servers
Date:
Msg-id: CAFjFpRf3=ByJ-k5jSveWa5auwo3SK2Z8FamWKUfvTCoSNjEM-w@mail.gmail.com
In reply to: Re: Transactions involving multiple postgres foreign servers  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers


On Thu, Jan 8, 2015 at 8:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 7, 2015 at 11:20 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> If we are going to include a distributed transaction manager with
> PostgreSQL, it *must* persist enough information about the
> transaction ID and where it is used in a way that will survive a
> subsequent crash before beginning the PREPARE on any of the
> systems.  After all nodes are PREPAREd it must flag that persisted
> data to indicate that it is now at a point where ROLLBACK is no
> longer an option.  Only then can it start committing the prepared
> transactions.  After the last node is committed it can clear this
> information.  On start-up the distributed transaction manager must
> check for any distributed transactions left "in progress" and
> commit or rollback based on the preceding; doing retries
> indefinitely until it succeeds or is told to stop.

I think one key question here is whether all of this should be handled
in PostgreSQL core or whether some of it should be handled in other
ways.  Is the goal to make postgres_fdw (and FDWs for other databases
that support 2PC) persist enough information that someone *could*
write a transaction manager for PostgreSQL, or is the goal to actually
write that transaction manager?

Just figuring out how to persist the necessary information is a
non-trivial problem by itself.  You might think that you could just
insert a row into a local table saying, hey, I'm about to prepare a
transaction remotely, but of course that doesn't work: if you then go
on to PREPARE before writing and flushing the local commit record,
then a crash before that's done leaves a dangling prepared transaction
on the remote node.  You might think to write the record, then, after
writing and flushing the local commit record, do the PREPARE.  But you
can't do that either, because now if the PREPARE fails you've already
committed locally.

I guess what you need to do is something like:

1. Write and flush a WAL record indicating an intent to prepare, with
a list of foreign server OIDs and GUIDs.
2. Prepare the remote transaction on each node.  If any of those
operations fail, roll back any prepared nodes and error out.
3. Commit locally (i.e. RecordTransactionCommit, writing and flushing WAL).
4. Try to commit the remote transactions.
5. Write a WAL record indicating that you committed the remote transactions OK.
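The five steps above could be sketched as a self-contained toy model like the following. Everything here (the `RemoteNode` class, the in-memory `wal` list, the record shapes) is an illustrative assumption, not a PostgreSQL API:

```python
class RemoteError(Exception):
    pass

class RemoteNode:
    """Toy stand-in for a foreign server that supports two-phase commit."""
    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.state = "active"  # active -> prepared -> committed / aborted

    def prepare(self, guid):
        if self.fail_on_prepare:
            raise RemoteError("PREPARE failed on " + self.name)
        self.state = "prepared"

    def commit_prepared(self, guid):
        self.state = "committed"

    def rollback_prepared(self, guid):
        self.state = "aborted"

wal = []  # toy stand-in for the local WAL

def commit_distributed(local_xid, nodes):
    guids = {n.name: "px_%s_%s" % (local_xid, n.name) for n in nodes}

    # 1. Write and flush a WAL record recording the intent to prepare,
    #    with the foreign server identifiers and GUIDs.
    wal.append(("intent", local_xid, list(guids.values())))

    # 2. Prepare the remote transaction on each node; if any fails,
    #    roll back the already-prepared nodes and error out.
    prepared = []
    try:
        for n in nodes:
            n.prepare(guids[n.name])
            prepared.append(n)
    except RemoteError:
        for n in prepared:
            n.rollback_prepared(guids[n.name])
        raise

    # 3. Commit locally (RecordTransactionCommit: write and flush WAL).
    wal.append(("commit", local_xid))

    # 4. Try to commit the remote prepared transactions.
    for n in nodes:
        n.commit_prepared(guids[n.name])

    # 5. Record that the remote transactions all committed.
    wal.append(("resolved", local_xid))
```

Note that in the failure path no local commit record is ever written, which is the whole point of ordering step 3 after step 2.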

If you fail after step 1, you can straighten things out by looking at
the status of the transaction: if the transaction committed, any
transactions we intended-to-prepare need to be checked.  If they are
still prepared, we need to commit them or roll them back according to
what happened to our XID.
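That recovery rule might look roughly like this sketch, which takes the WAL records, the set of locally committed XIDs, and the GUIDs still prepared on the remotes as assumed inputs (all names and record shapes hypothetical):

```python
def recover(wal, locally_committed, remote_prepared):
    """Decide how to resolve dangling prepared transactions after a crash.

    wal: list of ("intent", xid, guids) and ("resolved", xid) records
    locally_committed: set of XIDs whose local commit record reached disk
    remote_prepared: set of GUIDs still prepared on the foreign servers
    Returns a dict mapping each dangling GUID to the action to take.
    """
    resolved = {r[1] for r in wal if r[0] == "resolved"}
    actions = {}
    for r in wal:
        if r[0] != "intent" or r[1] in resolved:
            continue  # never started, or already known to be fully resolved
        xid, guids = r[1], r[2]
        for guid in guids:
            if guid in remote_prepared:
                # Commit or roll back according to what happened to our XID.
                actions[guid] = ("COMMIT PREPARED"
                                 if xid in locally_committed
                                 else "ROLLBACK PREPARED")
    return actions
```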

When you want to straighten things out and commit, the foreign server may not be available. As Kevin pointed out above, we need to keep retrying to resolve the PREPAREd transactions on the foreign server (commit or roll back, based on the status of the local transaction) until they are resolved. So we would have to persist the information somewhere other than the WAL, OR keep retaining WAL even after the corresponding local transaction has committed or aborted. I don't think the latter is a good idea, since it would impact replication and especially VACUUM, because it affects the oldest transaction in the WAL.
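The retry-until-resolved behaviour Kevin describes could be as simple as the loop below; the `execute` callback, the stop condition, and the statement string are all assumptions for illustration:

```python
import time

def resolve_with_retry(execute, statement, should_stop, delay=1.0):
    """Keep issuing a COMMIT PREPARED / ROLLBACK PREPARED statement via
    the execute callback until it succeeds or we are told to stop.
    The foreign server may be unreachable for an arbitrarily long time."""
    while not should_stop():
        try:
            execute(statement)
            return True  # resolved
        except Exception:
            time.sleep(delay)  # server unavailable; back off and retry
    return False  # operator asked us to stop before resolution
```

A real resolver would of course run in a background worker and distinguish "server unreachable" from "statement rejected", but the control flow is the same.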
 
That's where Andres's suggestion might help.

(Andres is talking in my other ear suggesting that we ought to reuse
the 2PC infrastructure to do all this.  I'm not convinced that's a
good idea, but I'll let him present his own ideas here if he wants to
rather than trying to explain them myself.)


We can persist the information about a distributed transaction (specifically, one that requires 2PC) in a manner similar to the 2PC infrastructure's pg_twophase directory. I am still investigating whether we can reuse the existing 2PC infrastructure or not. My initial reaction is no, since 2PC persists information about the local transaction, including locked objects and WAL (?), in the pg_twophase directory, which is not required for a distributed transaction. But the rest of the mechanism, like the manner of processing the records during normal operation and recovery, looks very useful.
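In the spirit of pg_twophase, such a file might only need the local XID plus the list of (server, GUID) participants, written durably before any remote PREPARE and re-read at startup. A toy sketch, with an entirely hypothetical file layout and naming scheme:

```python
import json
import os

def persist_fdw_xact(state_dir, xid, participants):
    """Durably record a distributed transaction, one file per XID.
    participants: list of [server_name, guid] pairs. Layout hypothetical."""
    path = os.path.join(state_dir, "%d.fdwxact" % xid)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"xid": xid, "participants": participants}, f)
        f.flush()
        os.fsync(f.fileno())  # force to disk before the file becomes visible
    os.rename(tmp, path)      # atomic rename: readers never see a torn file

def load_fdw_xacts(state_dir):
    """At startup, read back every unresolved distributed transaction."""
    xacts = []
    for name in sorted(os.listdir(state_dir)):
        if name.endswith(".fdwxact"):
            with open(os.path.join(state_dir, name)) as f:
                xacts.append(json.load(f))
    return xacts
```

Removing the file once all participants are resolved would play the role of clearing the in-progress flag Kevin describes.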

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
