Обсуждение: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

Поиск
Список
Период
Сортировка

Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
Ivan Sergio Borgonovo
Дата:
My target is to take a snapshot of a slice of some tables and put the
result in a log, regardless of writes made after I started to take
the snapshot.

something like:

create or replace function snapshot(out stats int) as
$$
begin
-- prepare stuff from t1, t2, t3
-- save it in other tables
  return;
end;
$$ language plpgsql;


suppose I do something like:

update t1 set col1=7 where col2=5;
select stats from snapshot();
update t2 set col4=2 where col1=3;

from different connections but starting in that temporal order and
that snapshot is still running when the second update is fired.

I'd like snapshot() seeing what the first update did but NOT see what
the second update is doing.

I'd expect this to be "auto-magic" according to
http://searchwarp.com/swa9860.htm

Read Committed Isolation Level
Read Committed is the default isolation level in PostgreSQL. When a
transaction runs on this isolation level, a SELECT query sees only
data committed before the query began it never sees either
uncommitted data or changes committed during query execution by
concurrent.

but:

create table t1(a int);
insert into t1 values(1);

create or replace function ft(out a1 int, out a2 int) as
$$
begin
  select into a1 a from t1 limit 1;
  for i in 1..700000000 loop
  end loop;
  select into a2 a from t1 limit 1;
  return;
end;
$$ language plpgsql;

select * from ft();

update t1 set a=5;

I'd expect this function to return (1,1) or (5,5) and never (1,5).

Then I read:
http://www.postgresql.org/docs/8.1/static/transaction-iso.html#XACT-READ-COMMITTED

Notice that two successive SELECT commands can see different data,
even though they are within a single transaction, if other
transactions commit changes during execution of the first SELECT.

Is
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
what I'm looking for?


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
Andrew Sullivan
Дата:
On Fri, May 16, 2008 at 09:06:11AM +0200, Ivan Sergio Borgonovo wrote:
> Is
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
> what I'm looking for?

Yes.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
Ivan Sergio Borgonovo
Дата:
On Fri, 16 May 2008 09:55:56 -0400
Andrew Sullivan <ajs@commandprompt.com> wrote:

> On Fri, May 16, 2008 at 09:06:11AM +0200, Ivan Sergio Borgonovo
> wrote:
> > Is
> > BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
> > what I'm looking for?

> Yes.

Perfect, thanks.

What is the effect of having nested functions all declared
SERIALIZABLE?

What if just the outermost is declared SERIALIZABLE?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
"Douglas McNaught"
Дата:
On Sun, May 18, 2008 at 8:04 AM, Ivan Sergio Borgonovo
<mail@webthatworks.it> wrote:

> What is the effect of having nested functions all declared
> SERIALIZABLE?
>
> What if just the outermost is declared SERIALIZABLE?

SERIALIZABLE applies to the entire transaction, not to individual
function calls.

-Doug

Re: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
Andrew Sullivan
Дата:
On Sun, May 18, 2008 at 02:04:56PM +0200, Ivan Sergio Borgonovo wrote:

> What is the effect of having nested functions all declared
> SERIALIZABLE?

You can't do that.  You have to declare SERIALIZABLE right at the
beginning of the transaction, or it won't work (and, IIRC, it throws
an error).  That means _before_ you call the first function (since
calling the function is then the first statement in the transaction,
before serializable).

A


--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
Ivan Sergio Borgonovo
Дата:
On Fri, 16 May 2008 09:55:56 -0400
Andrew Sullivan <ajs@commandprompt.com> wrote:

> On Fri, May 16, 2008 at 09:06:11AM +0200, Ivan Sergio Borgonovo
> wrote:
> > Is
> > BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
> > what I'm looking for?
>
> Yes.

OK...

What if I want to avoid the rollback problem and avoid to deal with
the

ERROR:  could not serialize access due to concurrent update

I don't have to update/insert rows in the same tables I'm examining.
I just would like to have a coherent snapshot of some tables.

It should be something like:

1) check is some conditions are met with a bunch of selects and
computation on returned rows
2) if everything is OK copy a "slice" of the snapshot in *other*
tables.

It would be something like

select into _a, _b c, d from t1 join t2...
select into _c, _d c, d from t3 join t4...

if(...) ...
if(...) ...
if(everything is fine)

insert into t10 (a,b,h,i,l,m) select _a, _b, h, i, l, m
  from t1
  join t2...

I'm not interested into the fact that _a, _b, _c, _d, t1 or t2 may be
different when other transaction commit... I'm just interested in
"saving" in t10 a coherent state.

Since I'm going to save all the collected data into another table
other than the one that may be updated by another transaction am I
going to incur in the above ERROR: could not serialize...?

Can I reach my target of getting a coherent snapshot with a set of
"for share"?

Is there a set of examples that could highlight the differences of
"for share/update" and serializable with their pitfalls?

thanks


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
Andrew Sullivan
Дата:
On Tue, May 20, 2008 at 08:56:41PM +0200, Ivan Sergio Borgonovo wrote:

> I just would like to have a coherent snapshot of some tables.

If you have a multi-statement transaction, then if you are in READ
COMMITTED you can see changes, and if you are in SERIALIZABLE you
can't.  You can't of course see changes "in the same statement" as it
were -- that's not meaningful (because either they committed before
you saw the row or after).

So,

> 1) check is some conditions are met with a bunch of selects and
> computation on returned rows
> 2) if everything is OK copy a "slice" of the snapshot in *other*
> tables.

This is more than one statement.  So you will be able to see changes
in between those statements.  If you don't care about that, then your
approach will work.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
Simon Riggs
Дата:
On Tue, 2008-05-20 at 17:04 -0400, Andrew Sullivan wrote:
> On Tue, May 20, 2008 at 08:56:41PM +0200, Ivan Sergio Borgonovo wrote:
>
> > I just would like to have a coherent snapshot of some tables.
>
> If you have a multi-statement transaction, then if you are in READ
> COMMITTED you can see changes, and if you are in SERIALIZABLE you
> can't.

Unless the changes were made by yourself in the top-level transaction or
an un-aborted subtransaction, in which case they are visible.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Postgresql MVCC, Read Committed Isolation Level and taking "snapshot"

От
Ivan Sergio Borgonovo
Дата:
On Tue, 20 May 2008 17:04:25 -0400
Andrew Sullivan <ajs@commandprompt.com> wrote:

> On Tue, May 20, 2008 at 08:56:41PM +0200, Ivan Sergio Borgonovo
> wrote:
>
> > I just would like to have a coherent snapshot of some tables.
>
> If you have a multi-statement transaction, then if you are in READ
> COMMITTED you can see changes, and if you are in SERIALIZABLE you
> can't.  You can't of course see changes "in the same statement" as
> it were -- that's not meaningful (because either they committed
> before you saw the row or after).

"read committed" is surely not enough and I'd like to avoid the side
effect of serializable that will force me to check if the
serializable transaction succeeded.

My target is writing something like:
- is the status I see "now" coherent with a set of rules?
 * yes, save a "summary" into another table
 * no, report an error to the client

Now if statements see changes I can't evaluate if the status is
coherent and/or I may save a summary that is different to the one I
checked.

It is not clear to me what does it mean:

http://www.postgresql.org/docs/8.1/interactive/transaction-iso.html

"Note that only updating transactions may need to be retried;
read-only transactions will never have serialization conflicts."

Then:
"because a serializable transaction cannot modify or lock rows
changed by other transactions after the serializable transaction
began."

Now, I'm not writing to the "source" tables, the one that have to be
checked for coherency. But I'm writing elsewhere.
I'm not concerned if what I save in the "summary" is different to
what I'll find in the DB once the "logging" function reach an end. I
just want to save a coherent status.

The logging/summary function will end in some
insert into logtable1 select ... from source1 join source2
insert into logtable2 select ... from source3 join source4

the above select will still have to see the same snapshot as when the
transaction started. Other updates to the source tables may go
further I'm not interested in locking them.

To my understanding if source and destination tables are different,
and target row of destination will have different pk/fk
serializable should provide what I need without being worried of the
ERROR:  could not serialize access due to concurrent update


Since I'm not writing to the source tables I thought I could try the
"for share" path avoiding the "serializable".
In my situation "select for share" shouldn't miss rows if another
transaction update the source tables.
The selects are going to return the same set of rows... just with
different data inside.
If another transaction commit and they have to be re-evaluated this
should just mean they should still take a coherent snapshot... (?)

But then... that should mean that the overall function is going to be
re-evaluated.
eg.

select into _a a from source1 where condition for share.
if(_a is null) then

What's going to happen to this code if another transaction update the
rows involved?

Will the if condition be re-evaluated?

What is going to happen if I've something like

insert into target select from source for share?

If I've to make any difference between using "for share" and
serializable in my case I'd say:
- "for share" inside a transaction may miss "added" rows
- if there is another transaction updating the row interested into
"for share" the "logging" function will save an AFTER commit view
while serializable will save a BEFORE commit view and if something
get into the way it will abort.
- if writing into a different target with warranty of not overlapping
write I shouldn't incur in aborted transaction and make the
implementation easier and faster, since changed row won't force a
re-evaluation of the selects.

Explicit locking doesn't seem the way to go since I'm not writing to
the source table... so other transactions shouldn't be blocked from
writing to the source tables provided I can have a snapshot of the DB
at a certain time.

How does it sound?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it