Re: The plan for FDW-based sharding

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: The plan for FDW-based sharding
Дата
Msg-id CACjxUsOM+HyYB4WCyGngRds2dR4P1Tc2qOh8YToFSH+j9fjGYw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: The plan for FDW-based sharding  (Simon Riggs <simon@2ndQuadrant.com>)
Ответы Re: The plan for FDW-based sharding  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
On Fri, Feb 26, 2016 at 5:37 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 26 February 2016 at 22:48, Kevin Grittner <kgrittn@gmail.com> wrote:

>> if we want logical
>> replication to be free of serialization anomalies for those using
>> serializable transactions, we need to support applying transactions
>> in an order which may not be the same as commit order -- CSN (as
>> such) would be the wrong thing.  If serializable transaction 1 (T1)
>> modifies a row and concurrent serializable transaction 2 (T2) reads
>> the old version of the row, and modifies something based on that,
>> T2 must be applied to a logical replica first even if T1 commits
>> before it; otherwise the logical replica could see a state not
>> consistent with business rules and which could not have been seen
>> (due to SSI) on the source database.
>
> How would SSI allow that commit order?
>
> Surely there is a read-write dependency that would cause T2 to be
> aborted?

*A* read-write dependency does not cause an abort under SSI, it
takes a *pattern* of read-write dependencies which has been proven
to appear in any set of concurrent transactions which can cause a
serialization anomaly.  A read-only transaction can be part of that
pattern.  On a single database SSI can see whether a read has
caused such a problem.  If you replicate the transactions to
somewhere else and read them SSI cannot tell whether there is an
anomaly (at least, not without exchanging a lot of information that
isn't currently happening), so some other mechanism would probably
need to be used.  One possibility is to pass along information
about when things are in a state on the source that is known to be
free of anomalies if read; another would be to reorder the
application of transactions to match the apparent order of
execution.  The latter would not work for "physical" replication,
but should be fine for logical replication.  An implementation
might create a list in commit order, but not release the front of
the list for processing if it is a SERIALIZABLE transaction which
has written data until all overlapping SERIALIZABLE transactions
complete, so it can move any subsequently-committed SERIALIZABLE
transaction which read the "old" version of the data ahead of it.

>> Any DTM API which does not
>> support some mechanism to rearrange the order of transactions from
>> commit order to some other order (based on, for example, read-write
>> dependencies) is not complete.  If it does support that, it gives
>> us a way forward for presenting consistent data on logical
>> replicas.
>
> You appear to be saying that SSI allows transactions to commit in a
> non-serializable order.

Absolutely not.  If you want to understand this better, this paper
might be helpful:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

> Do you have a test case?

There are a couple in this section of the Wiki page of examples:

https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

Just picture the read-only transaction executing on a replica.

Thinking of commit sequence number as the right order to apply
transactions during replication seems to me to be a holdover from
the techniques initially developed for transaction in the 1960s --
specifically, strict two-phase locking (S2PL) is very easy to get
one's head around and when using it the apparent order of execution
always *does* match commit order.  Unfortunately S2PL performs so
poorly that it was ripped out of PostgreSQL years ago.  In general,
I think it is time we gave up on thinking that is based on it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: 大山真実
Дата:
Сообщение: Re: [PROPOSAL] VACUUM Progress Checker.
Следующее
От: John Gorman
Дата:
Сообщение: Re: Re: [COMMITTERS] pgsql: Respect TEMP_CONFIG when running contrib regression tests.