Обсуждение: How to reproduce serialization failure for a read only transaction.

Поиск
Список
Период
Сортировка

How to reproduce serialization failure for a read only transaction.

От
AK
Дата:
If two transactions both read and write, I can easily reproduce the
following: "could not serialize access due to read/write dependencies among
transactions". However, the 9.3 documentation says that "When relying on
Serializable transactions to prevent anomalies, it is important that any
data read from a permanent user table not be considered valid until the
transaction which read it has successfully committed. This is true even for
read-only transactions".

I cannot have a read-only transaction fail because of serialization
anomalies. Can someone show me a working example please?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-reproduce-serialization-failure-for-a-read-only-transaction-tp5785569.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: How to reproduce serialization failure for a read only transaction.

От
Florian Pflug
Дата:
On Jan6, 2014, at 20:41 , AK <alkuzo@gmail.com> wrote:
> If two transactions both read and write, I can easily reproduce the
> following: "could not serialize access due to read/write dependencies among
> transactions". However, the 9.3 documentation says that "When relying on
> Serializable transactions to prevent anomalies, it is important that any
> data read from a permanent user table not be considered valid until the
> transaction which read it has successfully committed. This is true even for
> read-only transactions".
>
> I cannot have a read-only transaction fail because of serialization
> anomalies. Can someone show me a working example please?

A read-only transaction will abort due to a serialization failure if
observes a state of the database which doesn't exist in any serial transaction
schedule. Here's an example (default isolation level is assumed to be
serializable, of course)

W1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W1: UPDATE t SET count=count+1 WHERE id=1; -- (*2)
W1: SELECT data FROM t WHERE id=2;         -- (*1)
W2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W2: UPDATE t SET count=count+1 WHERE id=2; -- (*1, *2)
W2: COMMIT;
R : START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
R : SELECT data FROM t WHERE id IN (1,2);  -- (*2)
W1: COMMIT; -- R will now report a serialization error!

Due to (*1), W1 must execute before W2 in any serial schedule, since W1
reads record 2 which is later modified by W2.

Due to (*2), R must execute after W2 but before W1 since it reads record
2 previously modified by W2 and record 1 later modified by W1. (Note that
W1 hasn't committed at time R acquires its snapshot)

The dependencies induced by (*1) or (*2) alone are satisfyable by a serial
schedule, but both together aren't - if W1 must execute before W2 as required
by (*1), then surely every transaction that runs after W2 in such a schedule
also runs after W1, thus contradicting (*2).

Now since (*1) alone isn't contradictory, committing W1 succeeds. That leaves
only the last line, the COMMIT of R, to fail, which it does.

The gist of this example is that whether the state observed by R exists in
any serial transaction schedule or not is only certain after all concurrent
read-write transactions (W1 and W2) have committed. You can avoid the error
above by specifying DEFERRABLE in R's START TRANSACTION command. The session
will then acquire a snapshot and wait for all possibly interfering read-write
transactions to commit. If the snapshot turns out to be observable in some
serial schedule, the session will continue, otherwise the database will
acquire a new snapshot and wait again. Thus, once the START TRANSACTION
with the DEFERRABLE flag has committed, you can be sure that the transaction
won't later be aborted due to a serialization error.

BTW, since this is a question about how to use postgres rather than
how to extend it, it actually belongs on pgsql-general, not on the hackers list.

best regards,
Florian Pflug




Re: How to reproduce serialization failure for a read only transaction.

От
AK
Дата:
Hi Florian,

can you explain why do you state that "default isolation level is assumed to
be
serializable, of course", when you explicitly specify isolation level for
every session - why should he default matter at all?

When I am trying to reproduce the scenario which you have posted, I am
observing different results. Here is my full scenario:

Session 1. Setting up:

CREATE TABLE cars( license_plate VARCHAR NOT NULL, reserved_by VARCHAR NULL
);
INSERT INTO cars(license_plate)
VALUES ('SUPRUSR'),('MIDLYPH');

Session 2: W1

BEGIN ISOLATION LEVEL SERIALIZABLE;

UPDATE cars SET reserved_by = 'Julia' WHERE license_plate = 'SUPRUSR' AND reserved_by IS NULL;

SELECT * FROM Cars
WHERE license_plate IN('SUPRUSR','MIDLYPH');

Session 3: W2

BEGIN ISOLATION LEVEL SERIALIZABLE;

UPDATE cars SET reserved_by = 'Ryan' WHERE license_plate = 'MIDLYPH' AND reserved_by IS NULL;

COMMIT;

Session 4: R

BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY;

SELECT * FROM Cars 
WHERE license_plate IN('SUPRUSR','MIDLYPH');

Session 2: W1

COMMIT;

ERROR:  could not serialize access due to read/write dependencies among
transactions

What am I doing wrong?

Thank you for your help!



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-reproduce-serialization-failure-for-a-read-only-transaction-tp5785569p5785597.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Re: How to reproduce serialization failure for a read only transaction.

От
Florian Pflug
Дата:
On Jan6, 2014, at 23:28 , AK <alkuzo@gmail.com> wrote:
> can you explain why do you state that "default isolation level is assumed to
> be
> serializable, of course", when you explicitly specify isolation level for
> every session - why should he default matter at all?

Sorry, that was a leftover - I initially wrote just START TRANSACTION with
specifying an isolation level.

>
> When I am trying to reproduce the scenario which you have posted, I am
> observing different results.

Hm, yeah, I missed two things.

First, dependency tracking can produce false positives, i.e. assume that
dependencies exist between transactions which are actually independent.
In my example, postgres fails to realize that W2 can be executed after W1,
unless it uses an index scan for the UPDATE in W2. You can avoid that either
by creating an index on the id column, and forcing W2 to use that by setting
enable_seqscan to off, or by creating two tables t1 and t2 instead of one
table t with two records (You'll have to modify the SELECT to scan both tables
too).

Second, since R executes it's SELECT before W1 commits, postgres is already
aware that R poses a problem when W1 commits, and it chooses to cancel W1
instead of R. To avoid that, R needs to do the SELECT after W1 committed.
Yet still force R to acquire a snapshot *before* that commit (without that,
there's no serialization failure since R than simply executes after W1 and
W2), you'll need to do e.g. SELECT 1 after R's START TRANSACTION command.

I think the following should work (or, rather, fail)

CREATE TABLE t (id INT PRIMARY KEY, count INT);
INSERT INTO t (id, count) SELECT i, 0 FROM generate_series(1,2);

W1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W1: UPDATE t SET count=count+1 WHERE id=1;
W1: SELECT count FROM t WHERE id=2;
W2: SET enable_seqscan=off;
W2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W2: UPDATE t SET count=count+1 WHERE id=2;
W2: COMMIT;
R : START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
R : SELECT 1;
W1: COMMIT;
R : SELECT data FROM t WHERE id IN (1,2);  -- Should fail

best regards,
Florian Pflug




Re: Re: How to reproduce serialization failure for a read only transaction.

От
Jim Nasby
Дата:
On 1/6/14, 5:27 PM, Florian Pflug wrote:> On Jan6, 2014, at 23:28 , AK <alkuzo@gmail.com> wrote:> First, dependency
trackingcan produce false positives, i.e. assume that> dependencies exist between transactions which are actually
independent.>In my example, postgres fails to realize that W2 can be executed after W1,> unless it uses an index scan
forthe UPDATE in W2. You can avoid that either> by creating an index on the id column, and forcing W2 to use that by
setting>enable_seqscan to off, or by creating two tables t1 and t2 instead of one> table t with two records (You'll
haveto modify the SELECT to scan both tables> too).>> Second, since R executes it's SELECT before W1 commits, postgres
isalready> aware that R poses a problem when W1 commits, and it chooses to cancel W1> instead of R. To avoid that, R
needsto do the SELECT after W1 committed.> Yet still force R to acquire a snapshot *before* that commit (without that,>
there'sno serialization failure since R than simply executes after W1 and> W2), you'll need to do e.g. SELECT 1 after
R'sSTART TRANSACTION command.>> I think the following should work (or, rather, fail)
 

This email and the previous one are an awesome bit of information, can we add it to the docs somehow? Even if it's just
dumpingthe emails into a wiki page and referencing it?
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: How to reproduce serialization failure for a read only transaction.

От
AK
Дата:
This worked for me - thank you so much! The SELECT did fail.

Also I cannot reproduce a scenario when "applications must not depend on
results read during a transaction that later aborted;". In this example the
SELECT itself has failed.
Can you show an example where a SELECT completes, but the COMMIT blows up?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-reproduce-serialization-failure-for-a-read-only-transaction-tp5785569p5785618.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Re: How to reproduce serialization failure for a read only transaction.

От
Dan Ports
Дата:
On Mon, Jan 06, 2014 at 05:14:12PM -0800, AK wrote:
> Also I cannot reproduce a scenario when "applications must not depend on
> results read during a transaction that later aborted;". In this example the
> SELECT itself has failed.
> Can you show an example where a SELECT completes, but the COMMIT blows up?

Actually, no, not for a read-only transaction. It happens that the
final serialization failure check executed on COMMIT only affects
read/write transactions, not read-only ones. That's a pretty specific
implementation detail, though, so I wouldn't necessarily rely on it...

Here's an example of why applications must not depend on results read
during a transaction that later aborted:
  W2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE  W2: UPDATE t SET count=1 WHERE id=1;  W1: BEGIN TRANSACTION
ISOLATIONLEVEL SERIALIZABLE  W1: SELECT * FROM t WHERE id=1;  W2: COMMIT;  R : BEGIN TRANSACTION ISOLATION LEVEL
SERIALIZABLEREAD ONLY  R : SELECT * FROM t;  R : COMMIT;! W1: UPDATE t SET count=1 WHERE id=2;  W1: COMMIT;
 

If you try this, it'll cause a serialization failure on the line marked
with a '!'. W1 saw (1,0) in the table, so W1 appears to have executed
before W2. But R saw both (1,1) and (2,0) in the table, and that has to
be a consistent snapshot of the database state, meaning W2 appears to
have executed before W1. That's an inconsistency, so something has to
be rolled back. This particular anomaly requires all three of the
transactions, and so it can't be detected until W1 does its UPDATE.
Postgres detects the conflict at that point and rolls back W1.

So what does this have to do with relying on the results of read-only
transactions that abort? Well, what if you had instead had R ROLLBACK
instead of COMMIT -- maybe because you expected ROLLBACK and COMMIT to
be equivalent for transactions that don't modify the database, or maybe
because something else caused the transaction to abort? When W1 does
its update, it will be checked for serialization failures, but aborted
transactions are (intentionally) not included in those checks. W1 is
therefore allowed to commit; the apparent serial order of execution is
W1 followed by W2, and the results of the aborted transaction R aren't
consistent with that.

Dan

-- 
Dan R. K. Ports                UW CSE                http://drkp.net/



Re: How to reproduce serialization failure for a read only transaction.

От
Kevin Grittner
Дата:
AK <alkuzo@gmail.com> wrote:

> I cannot have a read-only transaction fail because of
> serialization anomalies. Can someone show me a working example
> please?

A common case is a read-only transaction reading a closed batch
without seeing all of its entries.

http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How to reproduce serialization failure for a read only transaction.

От
Florian Pflug
Дата:
On Jan7, 2014, at 00:38 , Jim Nasby <jim@nasby.net> wrote:
> This email and the previous one are an awesome bit of information,
> can we add it to the docs somehow? Even if it's just dumping the
> emails into a wiki page and referencing it?

Most of what I wrote there can be found in README-SSE, I think,
under "Apparent Serial Order of Execution", "Heap locking" and
"Index AM implementations".

I guess it'd be nice if we explained these things in the docs
somewhere, though I'm not sure what level of detail would be
appropriate. Maybe a good compromise would be to explain dependency
graphs, but skip over the different kinds of dependencies (ww, rw, wr).
Instead we could say that whenever a transaction *does see* another
transaction's modifications it must appear after that transaction in any
serial schedule, and whenever a transaction *might see* another
transaction's modifications but doesn't due to begin/commit ordering
it must appear before that transaction.

best regards,
Florian Pflug




Re: Re: How to reproduce serialization failure for a read only transaction.

От
Kevin Grittner
Дата:
Dan Ports <drkp@csail.mit.edu> wrote:
> On Mon, Jan 06, 2014 at 05:14:12PM -0800, AK wrote:

> If you try this, it'll cause a serialization failure on the line
> marked with a '!'. W1 saw (1,0) in the table, so W1 appears to
> have executed before W2. But R saw both (1,1) and (2,0) in the
> table, and that has to be a consistent snapshot of the database
> state, meaning W2 appears to have executed before W1. That's an
> inconsistency, so something has to be rolled back. This
> particular anomaly requires all three of the transactions, and so
> it can't be detected until W1 does its UPDATE. Postgres detects
> the conflict at that point and rolls back W1.

Yeah, neither of the provided examples rolled back the read only
transaction itself; the read only transaction caused a situation
where something needed to be rolled back, but since we try to roll
back a transaction which has a good chance of succeeding on retry,
the read only transaction is not usually a good candidate.  I
created a new example on the Wiki page where the read only
transaction itself must be rolled back because both of the other
transactions involved have already committed:

https://wiki.postgresql.org/wiki/SSI#Rollover

Regarding other questions on the thread:

I have no objections to moving the Wiki examples into the docs, but
it seemed like a lot to include, and I'm not sure where it belongs.
Ideas?

Regarding the different results AK got, I set
default_transaction_isolation = 'serializable' on my connections
before running these for two reasons.
(1)  It keeps the examples more concise.
(2)  I think most people using serializable transactions in
PostgreSQL set the default and don't set the transaction isolation
level on each transaction, since (unlike strategies which rely on
blocking, like S2PL) all transactions must be participating in the
stricter isolation level for it to be reliable.  In fact, given the
performance benefits of declaring transactions READ ONLY when
possible, I have seen shops that make *that* a default, too, and
override it for transactions which need to write.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: How to reproduce serialization failure for a read only transaction.

От
Kevin Grittner
Дата:
AK <alkuzo@gmail.com> wrote:

> Session 1. Setting up:
>
> CREATE TABLE cars(
>   license_plate VARCHAR NOT NULL,
>   reserved_by VARCHAR NULL
> );
> INSERT INTO cars(license_plate)
> VALUES ('SUPRUSR'),('MIDLYPH');
>
> Session 2: W1
>
> BEGIN ISOLATION LEVEL SERIALIZABLE;
>
> UPDATE cars SET reserved_by = 'Julia'
>   WHERE license_plate = 'SUPRUSR'
>   AND reserved_by IS NULL;
>
> SELECT * FROM Cars
> WHERE license_plate IN('SUPRUSR','MIDLYPH');
>
> Session 3: W2
>
> BEGIN ISOLATION LEVEL SERIALIZABLE;
>
> UPDATE cars SET reserved_by = 'Ryan'
>   WHERE license_plate = 'MIDLYPH'
>   AND reserved_by IS NULL;
>
> COMMIT;
>
> Session 4: R
>
> BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY;
>
> SELECT * FROM Cars
> WHERE license_plate IN('SUPRUSR','MIDLYPH');
>
> Session 2: W1
>
> COMMIT;
>
> ERROR:  could not serialize access due to read/write dependencies
>         among transactions
>
> What am I doing wrong?

Even without the read only transaction the W1 and W2 transactions
are a classic case of write skew.  It looks like it might actually
be benign, since neither transaction is updating license_plate, but
serializable logic works at the row level, not the column level.
After both transactions update the table there is write skew which
must be resolved by cancelling one of the transactions.  The first
to commit "wins" and the other one will be cancelled when it
attempts to run its next statement, which may or may not be a
COMMIT.

If, for purposes of demonstration, you add a unique index on
license_plate and set enable_seqscan = off, you eliminate the
simple write skew and get into more complex ways of breaking
things.  With that tweak you can run all of those transactions if
W1 skips the SELECT.  You can let W1 do the SELECT as long as you
don't run R.  The problem is that the SELECT in W1 sees the work of
W1 but not W2 and the SELECT in R sees the work of W2 but not W1.
We can't allow that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How to reproduce serialization failure for a read only transaction.

От
AK
Дата:
Regarding this: >> So what does this have to do with relying on the results
of read-only 
transactions that abort? Well, what if you had instead had R ROLLBACK 
instead of COMMIT -- maybe because you expected ROLLBACK and COMMIT to 
be equivalent for transactions that don't modify the database, or maybe 
because something else caused the transaction to abort? When W1 does 
its update, it will be checked for serialization failures, but aborted 
transactions are (intentionally) not included in those checks. W1 is 
therefore allowed to commit; the apparent serial order of execution is 
W1 followed by W2, and the results of the aborted transaction R aren't 
consistent with that. <<

So if I am reading the data and then commit, I should be always fine,
correct?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-reproduce-serialization-failure-for-a-read-only-transaction-tp5785569p5785757.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Re: How to reproduce serialization failure for a read only transaction.

От
Kevin Grittner
Дата:
AK <alkuzo@gmail.com> wrote:

> So if I am reading the data and then commit, I should be always
> fine, correct?

If a serializable transaction successfully commits, that means that
all data read within that transaction can be trusted.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: How to reproduce serialization failure for a read only transaction.

От
Florian Pflug
Дата:
On Jan7, 2014, at 20:11 , Kevin Grittner <kgrittn@ymail.com> wrote:
> Yeah, neither of the provided examples rolled back the read only
> transaction itself;

Actually, the fixed version [1] of my example does.

[1] http://www.postgresql.org/message-id/8721AAD3-7A3A-4576-B10E-F2CBD1E5337A@phlo.org

best regards,
Florian Pflug




Re: How to reproduce serialization failure for a read only transaction.

От
Kevin Grittner
Дата:
Florian Pflug <fgp@phlo.org> wrote:
> On Jan7, 2014, at 20:11 , Kevin Grittner <kgrittn@ymail.com> wrote:

>> Yeah, neither of the provided examples rolled back the read only
>> transaction itself;
>
> Actually, the fixed version [1] of my example does.
>
> [1] http://www.postgresql.org/message-id/8721AAD3-7A3A-4576-B10E-F2CBD1E5337A@phlo.org

Due to my lame email provider, that post didn't show for me until I
had already replied.  :-(  You had already showed an example almost
exactly like what I described in my post.  I tweaked it a bit more
for the Wiki page to show more clearly why SSI has to care about
what the writing transaction reads.  For all the database engine
knows, what was read contributed to whether the application allowed
it to successfully commit.  By using the value from the SELECT in
the UPDATE it is easier to see why it matters, although it needs to
be considered either way.

In other words, we seem to be in full agreement, just using
different language to describe it.  :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company