Обсуждение: Bit by "commands ignored until end of transaction block" again

Поиск

Список

Период

Сортировка

Bit by "commands ignored until end of transaction block" again

От

Glenn Maynard

Дата:

23 июля 2009 г., 05:06:18

Postgres kills a transaction when an error happens.  This is a pain;
it assumes that all SQL errors are unexpected and fatal to the
transaction.

There's a very simple case where it's not: UNIQUE checks.  I'm
generating a cache, with a simple flow:- Search for the cache key; if it exists, return its value.- If it didn't exist,
createthe data based on the key, insert it
 
into the table, and return it.

This has an obvious race: another thread looks up the same key and
creates it between the search and the insert.  Both threads will
create the cached data, thread A will insert it into the table, and
thread B will get an integrity error when it tries to insert it, since
it duplicates the unique key.

Here, by far the simplest fix is simply to ignore the integrity error.Both threads generated the same data; the failed
insertis expected
 
and harmless.  Postgres is turning this into a fatal error.

There's so much that could make this trivially easy:
- SQLite has the handy ON CONFLICT IGNORE, but Postgres has nothing
like that.  (ON CONFLICT REPLACE is great, too.)- Let me use SAVEPOINT outside of a transaction, with the effect of
starting a transaction with the savepoint and ending it when it's
committed.  Then, I could use savepoints without needing to know
whether I'm already in a transaction or not; one would simply be
started and committed for me if necessary.  (That's by far my biggest
issue with savepoints: they force me to either specify "a transaction
must be open when this function is called", or need to be able to
query whether one is running to decide whether to start a transaction
or a savepoint.  My function's use of transactions should be invisible
to the caller.)- Let me disable this error.  I don't want it.  (We're grownups; we
can decide for ourselves which errors are fatal.)

The first two are cleaner, since ignoring the error means I might
ignore some other integrity error from the same statement, but I can
live with that.

Lacking anything better, I'll probably end up dropping out of the ORM
and using some uglier SQL to work around this, but this is so trivial
that it's silly to have to do that.  I can't do it within the ORM; it
doesn't have the vocabulary.

Any tricks I'm missing?  It feels like Postgres is fighting me at
every turn with this one, and this isn't the first time I've had this
problem.

-- 
Glenn Maynard

Re: Bit by "commands ignored until end of transaction block" again

От

Richard Huxton

Дата:

23 июля 2009 г., 05:31:47

Glenn Maynard wrote:
> Postgres kills a transaction when an error happens.  This is a pain;
> it assumes that all SQL errors are unexpected and fatal to the
> transaction.
> 
> There's a very simple case where it's not: UNIQUE checks. 

Ah, it's usually "syntax errors" wrt interactive sessions.
> I'm
> generating a cache, with a simple flow:
>  - Search for the cache key; if it exists, return its value.
>  - If it didn't exist, create the data based on the key, insert it
> into the table, and return it.
> 
> This has an obvious race: another thread looks up the same key and
> creates it between the search and the insert.  Both threads will
> create the cached data, thread A will insert it into the table, and
> thread B will get an integrity error when it tries to insert it, since
> it duplicates the unique key.

Yep.

> Here, by far the simplest fix is simply to ignore the integrity error.
>  Both threads generated the same data; the failed insert is expected
> and harmless.  Postgres is turning this into a fatal error.

Well, all errors are considered fatal. But same difference.

> There's so much that could make this trivially easy:
> 
>  - SQLite has the handy ON CONFLICT IGNORE, but Postgres has nothing
> like that.  (ON CONFLICT REPLACE is great, too.)

True. Been proposed. Fiddly to implement for all use-cases if I remember 
correctly.

>  - Let me use SAVEPOINT outside of a transaction,

You are never outside a transaction. All queries are executed within a 
transaction.
> with the effect of
> starting a transaction with the savepoint and ending it when it's
> committed.  Then, I could use savepoints without needing to know
> whether I'm already in a transaction or not; one would simply be
> started and committed for me if necessary.  (That's by far my biggest
> issue with savepoints: they force me to either specify "a transaction
> must be open when this function is called", or need to be able to
> query whether one is running to decide whether to start a transaction
> or a savepoint.  My function's use of transactions should be invisible
> to the caller.)

I think this is the root of your problem - all queries are within a 
transaction so either:
1. You have a transaction that wraps a single statement. If you get an 
error then only that statement was affected.
2. You have an explicit BEGIN...COMMIT transaction which could use a 
savepoint.

Nothing to stop you setting savepoints in #1 (although they're not much 
use).

Typically, if you're in a plpgsql function you would just catch "unique" 
exception codes from your insert. Or, update, see if any rows were 
affected, if not try an insert and if that gives a duplicate go back and 
try the update. You might want the second approach if 99% of the time 
the cache is already populated.

>  - Let me disable this error.  I don't want it.  (We're grownups; we
> can decide for ourselves which errors are fatal.)

You could always try submitting a patch. However, that's really what 
savepoints do - let you decide whether an error can be worked around.

> The first two are cleaner, since ignoring the error means I might
> ignore some other integrity error from the same statement, but I can
> live with that.
> 
> Lacking anything better, I'll probably end up dropping out of the ORM
> and using some uglier SQL to work around this, but this is so trivial
> that it's silly to have to do that.  I can't do it within the ORM; it
> doesn't have the vocabulary.

The ORM can't control transactions, can't call functions or can't set 
savepoints?

--   Richard Huxton  Archonet Ltd

Re: Bit by "commands ignored until end of transaction block" again

От

Glenn Maynard

Дата:

23 июля 2009 г., 06:11:53

On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote:
>>  - Let me use SAVEPOINT outside of a transaction,
>
> You are never outside a transaction. All queries are executed within a
> transaction.

"Transaction block", then, if you insist.

> I think this is the root of your problem - all queries are within a
> transaction so either:
> 1. You have a transaction that wraps a single statement. If you get an error
> then only that statement was affected.
> 2. You have an explicit BEGIN...COMMIT transaction which could use a
> savepoint.

Savepoints can only be used inside transaction blocks.  My function
has no idea whether it's being called inside a transaction block.

From inside a transaction block, my function would need to call
SAVEPOINT/RELEASE SAVEPOINT.

If it's not in a transaction block, it needs to call BEGIN/COMMIT
instead.  SAVEPOINT will fail with "SAVEPOINT can only be used in
transaction blocks".

This would be very simple and clean if the SAVEPOINT command
transparently issued BEGIN if executed outside of a transaction block,
marking the savepoint so it knows that when the savepoint is released
or rolled back, the associated transaction block needs to be committed
or rolled back, too.  At that point, you could stop using
BEGIN/COMMIT/ROLLBACK entirely, and just let savepoints do it, if you
wanted--with this, the transaction commands are essentially redundant.

I can't count the number of times I've wished for this.

> Typically, if you're in a plpgsql function you would just catch "unique"
> exception codes from your insert. Or, update, see if any rows were affected,
> if not try an insert and if that gives a duplicate go back and try the
> update. You might want the second approach if 99% of the time the cache is
> already populated.

It's just a simple INSERT, generated from a Model.objects.create() in Django.

>> Lacking anything better, I'll probably end up dropping out of the ORM
>> and using some uglier SQL to work around this, but this is so trivial
>> that it's silly to have to do that.  I can't do it within the ORM; it
>> doesn't have the vocabulary.
>
> The ORM can't control transactions, can't call functions or can't set
> savepoints?

It can't write the necessary SQL to say "insert this unless it already
exists", namely:
INSERT INTO cache (key, data) (    SELECT i.key, "data",    FROM        (VALUES ("key")) AS i(key)        LEFT JOIN
cacheprior_entry ON (prior_entry.key = "key")    WHERE        prior_entry.key IS NULL) 

It--Django--also doesn't have a mature transaction/savepoint system;
in fact, its transaction handling is an absolute mess.  I've written
helpers for my main codebase that simply says "wrap this in a
transaction block if one isn't already started, otherwise wrap it in a
savepoint".  I don't want to use that code here, because it's nitty
code: it needs to poke at Django internals to figure out whether it's
in a transaction block or not, and dealing with other API
compatibility issues.

--
Glenn Maynard

Re: Bit by "commands ignored until end of transaction block" again

От

Richard Huxton

Дата:

23 июля 2009 г., 06:41:23

Glenn Maynard wrote:
> On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote:
>>>  - Let me use SAVEPOINT outside of a transaction,
>> You are never outside a transaction. All queries are executed within a
>> transaction.
> 
> "Transaction block", then, if you insist.
> 
>> I think this is the root of your problem - all queries are within a
>> transaction so either:
>> 1. You have a transaction that wraps a single statement. If you get an error
>> then only that statement was affected.
>> 2. You have an explicit BEGIN...COMMIT transaction which could use a
>> savepoint.
> 
> Savepoints can only be used inside transaction blocks.

Says who? Wouldn't work in plpgsql as exception handling if that was the 
case.
>  My function
> has no idea whether it's being called inside a transaction block.
> 
> From inside a transaction block, my function would need to call
> SAVEPOINT/RELEASE SAVEPOINT.
> 
> If it's not in a transaction block, it needs to call BEGIN/COMMIT
> instead.  SAVEPOINT will fail with "SAVEPOINT can only be used in
> transaction blocks".

Ah [cue light-bulb effect], I think I understand. Your function isn't in 
the database is it? Surely your application knows if it's issuing 
BEGIN..COMMIT?

> This would be very simple and clean if the SAVEPOINT command
> transparently issued BEGIN if executed outside of a transaction block,
> marking the savepoint so it knows that when the savepoint is released
> or rolled back, the associated transaction block needs to be committed
> or rolled back, too.  At that point, you could stop using
> BEGIN/COMMIT/ROLLBACK entirely, and just let savepoints do it, if you
> wanted--with this, the transaction commands are essentially redundant.
> 
> I can't count the number of times I've wished for this.

You'll have people with torches and pitchforks after you if you change 
RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork.

>> Typically, if you're in a plpgsql function you would just catch "unique"
>> exception codes from your insert. Or, update, see if any rows were affected,
>> if not try an insert and if that gives a duplicate go back and try the
>> update. You might want the second approach if 99% of the time the cache is
>> already populated.
> 
> It's just a simple INSERT, generated from a Model.objects.create() in Django.

OK, it sounds like create() isn't the method to call then, since you 
don't want to generate a simple INSERT.

>>> Lacking anything better, I'll probably end up dropping out of the ORM
>>> and using some uglier SQL to work around this, but this is so trivial
>>> that it's silly to have to do that.  I can't do it within the ORM; it
>>> doesn't have the vocabulary.
>> The ORM can't control transactions, can't call functions or can't set
>> savepoints?
> 
> It can't write the necessary SQL to say "insert this unless it already
> exists", namely:
> 
>     INSERT INTO cache (key, data) (
>         SELECT i.key, "data",
>         FROM
>             (VALUES ("key")) AS i(key)
>             LEFT JOIN cache prior_entry ON (prior_entry.key = "key")
>         WHERE
>             prior_entry.key IS NULL
>     )

Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten 
lines of plpgsql will neatly encapsulate the problem. That plpgsql can 
be automatically generated easily enough too.

I know nothing of Django, but perhaps it's possible to subclass Model 
and add an "ensure" method that will call your plpgsql function?

> It--Django--also doesn't have a mature transaction/savepoint system;
> in fact, its transaction handling is an absolute mess.  I've written
> helpers for my main codebase that simply says "wrap this in a
> transaction block if one isn't already started, otherwise wrap it in a
> savepoint".  I don't want to use that code here, because it's nitty
> code: it needs to poke at Django internals to figure out whether it's
> in a transaction block or not, and dealing with other API
> compatibility issues.

Ah, the joys of badly designed ORMs. The nice thing is that there seem 
to be plenty of bad ones to choose from too. If your ORM doesn't handle 
transactions well, the more you use it the more difficult your life will 
become. I'd be tempted to tidy up your existing fixes and wrap Django's 
ORM as cleanly as you can. That's assuming they're not interested in 
patches.

--   Richard Huxton  Archonet Ltd

Re: Bit by "commands ignored until end of transaction block" again

От

Joshua Tolley

Дата:

23 июля 2009 г., 06:47:28

On Thu, Jul 23, 2009 at 02:04:53AM -0400, Glenn Maynard wrote:
> On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote:
> >>  - Let me use SAVEPOINT outside of a transaction,
> >
> > You are never outside a transaction. All queries are executed within a
> > transaction.
>
> "Transaction block", then, if you insist.
>
> > I think this is the root of your problem - all queries are within a
> > transaction so either:
> > 1. You have a transaction that wraps a single statement. If you get an error
> > then only that statement was affected.
> > 2. You have an explicit BEGIN...COMMIT transaction which could use a
> > savepoint.
>
> Savepoints can only be used inside transaction blocks.  My function
> has no idea whether it's being called inside a transaction block.
>
> From inside a transaction block, my function would need to call
> SAVEPOINT/RELEASE SAVEPOINT.
>
> If it's not in a transaction block, it needs to call BEGIN/COMMIT
> instead.  SAVEPOINT will fail with "SAVEPOINT can only be used in
> transaction blocks".

Have you tried this? I expect if you give it a shot, you'll find you don't
actually have this problem. Really, everything is always in a transaction. If
you haven't explicitly opened one, PostgreSQL opens one for you before each
statement, and issues a COMMIT afterwards (or a ROLLBACK, if your statement
ran into an error). Statements within functions are always executed within the
same transaction, so you can issue SAVEPOINT commands anywhere in PL/pgSQL
functions without problems, because you're always in a transaction.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Re: Bit by "commands ignored until end of transaction block" again

От

Chris

Дата:

23 июля 2009 г., 07:07:02

Joshua Tolley wrote:
> On Thu, Jul 23, 2009 at 02:04:53AM -0400, Glenn Maynard wrote:
>> On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote:
>>>>  - Let me use SAVEPOINT outside of a transaction,
>>> You are never outside a transaction. All queries are executed within a
>>> transaction.
>> "Transaction block", then, if you insist.
>>
>>> I think this is the root of your problem - all queries are within a
>>> transaction so either:
>>> 1. You have a transaction that wraps a single statement. If you get an error
>>> then only that statement was affected.
>>> 2. You have an explicit BEGIN...COMMIT transaction which could use a
>>> savepoint.
>> Savepoints can only be used inside transaction blocks.  My function
>> has no idea whether it's being called inside a transaction block.
>>
>> From inside a transaction block, my function would need to call
>> SAVEPOINT/RELEASE SAVEPOINT.
>>
>> If it's not in a transaction block, it needs to call BEGIN/COMMIT
>> instead.  SAVEPOINT will fail with "SAVEPOINT can only be used in
>> transaction blocks".
> 
> Have you tried this? I expect if you give it a shot, you'll find you don't
> actually have this problem. Really, everything is always in a transaction.

Each statement is in it's own transaction, but the problem (as I 
understand it) is that you're in this sort of situation:

psql -d dbname
..
# select now();              now
------------------------------- 2009-07-23 17:04:21.406424+10
(1 row)

Time: 2.434 ms
(csmith@[local]:5432) 17:04:21 [test]
# savepoint xyz;
ERROR:  SAVEPOINT can only be used in transaction blocks
(csmith@[local]:5432) 17:04:25 [test]


You haven't explicitly started a transaction, therefore savepoints won't 
work.

Django (it seems) just issues queries with no knowledge of (and no way 
to support) them.

-- 
Postgresql & php tutorials
http://www.designmagick.com/

Re: Bit by "commands ignored until end of transaction block" again

От

Thomas Kellerer

Дата:

23 июля 2009 г., 07:32:39

Chris, 23.07.2009 09:06:
> psql -d dbname
> ..
> # select now();
>               now
> -------------------------------
>  2009-07-23 17:04:21.406424+10
> (1 row)
> 
> Time: 2.434 ms
> (csmith@[local]:5432) 17:04:21 [test]
> # savepoint xyz;
> ERROR:  SAVEPOINT can only be used in transaction blocks
> (csmith@[local]:5432) 17:04:25 [test]
> 
> You haven't explicitly started a transaction, therefore savepoints won't 
> work.
> 
> Django (it seems) just issues queries with no knowledge of (and no way 
> to support) them.

The above situation only arises if you run in autocommit mode which is the default for psql (which I have *never*
understood).
 

If you do a "\set AUTOCOMMIT off", then you can set a savepoint without using BEGIN. I have this in my psqlrc.conf and
yourexample looks like this on my computer:
 

c:\Temp>psql training thomas
psql (8.4.0)
Type "help" for help.

training=> select now();           now
----------------------------2009-07-23 09:30:55.791+02
(1 row)


training=> savepoint abc;
SAVEPOINT
training=> release abc;
RELEASE
training=>


I don't believe any serious ORM would run in autocommit mode, so that shouldn't be a problem. 

Thomas

Re: Bit by "commands ignored until end of transaction block" again

От

Glenn Maynard

Дата:

23 июля 2009 г., 07:39:36

On Thu, Jul 23, 2009 at 2:41 AM, Richard Huxton<dev@archonet.com> wrote:
> Ah [cue light-bulb effect], I think I understand. Your function isn't in the
> database is it? Surely your application knows if it's issuing BEGIN..COMMIT?

I'm writing a Python library call.  It has no idea whether the caller
happens to be inside a transaction already, and I don't want to
specify something like "always run this inside a transaction".
(Callers are equally likely to want to do either, and it's bad API to
force them to start a transaction--the fact that I'm using the
database at al should be transparent.)

> You'll have people with torches and pitchforks after you if you change
> RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork.

RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint
that it's releasing started it.  Every currently-valid case requires
that a transaction is already started, so no existing code would be
affected by this.

SAVEPOINT a; -- implicitly issues BEGIN because one wasn't started
RELEASE SAVEPOINT a; -- implicitly issues COMMIT because savepoint "a"
issued the BEGIN, not the user

BEGIN;
SAVEPOINT a;
RELEASE SAVEPOINT a; -- will not commit, because savepoint "a" didn't
start the transaction

Of course, there are other details--it probably shouldn't allow
ROLLBACK or COMMIT on an implicit transaction block, for example.

> Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines
> of plpgsql will neatly encapsulate the problem. That plpgsql can be
> automatically generated easily enough too.

I don't think so, at least not without digging into internals.  Django
is built around knowing all data types, so it'd need to be givne types
explicitly--for example, to know whether a timestamp should be
formatted as a timestamp, date or time.  (I do have a couple other
columns here--timestamps for cache expiration, etc.)  I'll have to ask
Django-side if there's a public API to do this, but I don't think
there is.

> Ah, the joys of badly designed ORMs. The nice thing is that there seem to be
> plenty of bad ones to choose from too. If your ORM doesn't handle
> transactions well, the more you use it the more difficult your life will
> become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM
> as cleanly as you can. That's assuming they're not interested in patches.

The ORM on a whole is decent, but there are isolated areas where it's
very braindamaged--this is one of them.  They have a stable-release
API-compatibility policy, which I think just gets them stuck with some
really bad decisions for a long time.

-- 
Glenn Maynard

Re: Bit by "commands ignored until end of transaction block" again

От

Richard Huxton

Дата:

23 июля 2009 г., 08:07:58

Glenn Maynard wrote:
> On Thu, Jul 23, 2009 at 2:41 AM, Richard Huxton<dev@archonet.com> wrote:
>> Ah [cue light-bulb effect], I think I understand. Your function isn't in the
>> database is it? Surely your application knows if it's issuing BEGIN..COMMIT?
> 
> I'm writing a Python library call.  It has no idea whether the caller
> happens to be inside a transaction already, and I don't want to
> specify something like "always run this inside a transaction".
> (Callers are equally likely to want to do either, and it's bad API to
> force them to start a transaction--the fact that I'm using the
> database at al should be transparent.)

That last bit is never going to work. There always needs to be some 
basic level of understanding between systems and transactions really 
have to be part of that for talking to a RDBMS. There will have to be a 
piece of code responsible for managing transactions somewhere in the 
middleware/application layers.

>> You'll have people with torches and pitchforks after you if you change
>> RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork.
> 
> RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint
> that it's releasing started it.  Every currently-valid case requires
> that a transaction is already started, so no existing code would be
> affected by this.
> 
> SAVEPOINT a; -- implicitly issues BEGIN because one wasn't started
> RELEASE SAVEPOINT a; -- implicitly issues COMMIT because savepoint "a"
> issued the BEGIN, not the user
[snip]
> Of course, there are other details--it probably shouldn't allow
> ROLLBACK or COMMIT on an implicit transaction block, for example.

All you're doing here is moving the point of confusion around, surely? 
At some point you still need to know whether you can issue 
BEGIN/ROLLBACK/COMMIT etc.

>> Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines
>> of plpgsql will neatly encapsulate the problem. That plpgsql can be
>> automatically generated easily enough too.
> 
> I don't think so, at least not without digging into internals.  Django
> is built around knowing all data types, so it'd need to be givne types
> explicitly--for example, to know whether a timestamp should be
> formatted as a timestamp, date or time.  (I do have a couple other
> columns here--timestamps for cache expiration, etc.)  I'll have to ask
> Django-side if there's a public API to do this, but I don't think
> there is.

Well, the types would be exactly the same as for your existing insert. 
All it's really doing is changing the template those values get 
substituted into. It presumably does mean patching the ORM (or 
subclassing from it anyway).

>> Ah, the joys of badly designed ORMs. The nice thing is that there seem to be
>> plenty of bad ones to choose from too. If your ORM doesn't handle
>> transactions well, the more you use it the more difficult your life will
>> become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM
>> as cleanly as you can. That's assuming they're not interested in patches.
> 
> The ORM on a whole is decent, but there are isolated areas where it's
> very braindamaged--this is one of them.  They have a stable-release
> API-compatibility policy, which I think just gets them stuck with some
> really bad decisions for a long time.

Presumably they targetted MySQL first, where there's a lot less use in 
multi-statement transactions with their different behaviour of their 
various storage-engines.

--   Richard Huxton  Archonet Ltd

Re: Bit by "commands ignored until end of transaction block" again

От

Stephen Frost

Дата:

23 июля 2009 г., 10:56:47

* Glenn Maynard (glenn@zewt.org) wrote:
> > The ORM can't control transactions, can't call functions or can't set
> > savepoints?
>
> It can't write the necessary SQL to say "insert this unless it already
> exists", namely:

If it can't cleanly handle failure cases like this one, then I think
your issue is with your ORM and not with PG.  An INSERT failing on a
uniqueness violation is actually a rather big deal in a relational
database and not erroring on it goes quite against data integrity
considerations.

If your ORM could call a function instead, you could handle the insert
and error-check in the function, to make up for the lack of intelligence
in the ORM.  Another option would be to have a 'fake' table, which has
no rows in it and just has an 'ON INSERT' trigger that calls a function
to handle this.  That could also be a view with a do-instead rule, if
the ORM has to query the same table.

I would think the time would better be spent fixing the ORM though.
Thanks,
    Stephen

Re: Bit by "commands ignored until end of transaction block" again

От

Adrian Klaver

Дата:

23 июля 2009 г., 14:20:28

On Thursday 23 July 2009 12:39:23 am Glenn Maynard wrote:

>
> The ORM on a whole is decent, but there are isolated areas where it's
> very braindamaged--this is one of them.  They have a stable-release
> API-compatibility policy, which I think just gets them stuck with some
> really bad decisions for a long time.
>
> --
> Glenn Maynard

None of the options listed in the URL below work?:
http://docs.djangoproject.com/en/dev/topics/db/transactions/#topics-db-transactions

This is the development version of the docs so may contain some new options. In 
particular look at Savepoint rollback and  Database-level autocommit.

-- 
Adrian Klaver
aklaver@comcast.net

Re: Bit by "commands ignored until end of transaction block" again

От

Craig Ringer

Дата:

24 июля 2009 г., 09:43:59

On Thu, 2009-07-23 at 17:06 +1000, Chris wrote:
> Joshua Tolley wrote:
> > On Thu, Jul 23, 2009 at 02:04:53AM -0400, Glenn Maynard wrote:
> >> On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote:
> >>>>  - Let me use SAVEPOINT outside of a transaction,
> >>> You are never outside a transaction. All queries are executed within a
> >>> transaction.
> >> "Transaction block", then, if you insist.
> >>
> >>> I think this is the root of your problem - all queries are within a
> >>> transaction so either:
> >>> 1. You have a transaction that wraps a single statement. If you get an error
> >>> then only that statement was affected.
> >>> 2. You have an explicit BEGIN...COMMIT transaction which could use a
> >>> savepoint.
> >> Savepoints can only be used inside transaction blocks.  My function
> >> has no idea whether it's being called inside a transaction block.
> >>
> >> From inside a transaction block, my function would need to call
> >> SAVEPOINT/RELEASE SAVEPOINT.
> >>
> >> If it's not in a transaction block, it needs to call BEGIN/COMMIT
> >> instead.  SAVEPOINT will fail with "SAVEPOINT can only be used in
> >> transaction blocks".
> > 
> > Have you tried this? I expect if you give it a shot, you'll find you don't
> > actually have this problem. Really, everything is always in a transaction.

[snip]

> You haven't explicitly started a transaction, therefore savepoints won't 
> work.

True. However, he's talking about code within a PL/PgSQL function. To a
PL/PgSQL function there is NO difference between:


begin;
select my_function();
commit;

and a standalone:

select my_function();

in both cases the statement executes in a transaction, and in both cases
individual statements within the function are within the same
transaction. That's why any function can EXCEPTION blocks, etc, which
rely on savepoints.

-- 
Craig Ringer

Re: Bit by "commands ignored until end of transaction block" again

От

Craig Ringer

Дата:

24 июля 2009 г., 09:52:41

On Thu, 2009-07-23 at 03:39 -0400, Glenn Maynard wrote:

> I'm writing a Python library call.  It has no idea whether the caller
> happens to be inside a transaction already, and I don't want to
> specify something like "always run this inside a transaction".
> (Callers are equally likely to want to do either, and it's bad API to
> force them to start a transaction--the fact that I'm using the
> database at al should be transparent.)

Personally, I'd think about moving the function into the database, using
PL/PgSQL or even PL/PythonU if you have to.

Why should DB use be transparent when you're modifying the DB? In one
case you immediately make a change. In another case, you schedule a
change to be applied if/when the current transaction commits, so the
change may or may not occur at some point in the future. That is, IMO, a
big difference.

Most applications with this sort of thing will have app-level
transaction APIs that contain and manage the DB-level ones anyway.

> RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint
> that it's releasing started it.

So, what you're really asking for boils down to nestable transactions?

-- 
Craig Ringer

Re: Bit by "commands ignored until end of transaction block" again

От

Alvaro Herrera

Дата:

24 июля 2009 г., 16:56:41

Joshua Tolley escribió:

> Have you tried this? I expect if you give it a shot, you'll find you don't
> actually have this problem. Really, everything is always in a transaction. If
> you haven't explicitly opened one, PostgreSQL opens one for you before each
> statement, and issues a COMMIT afterwards (or a ROLLBACK, if your statement
> ran into an error). Statements within functions are always executed within the
> same transaction, so you can issue SAVEPOINT commands anywhere in PL/pgSQL
> functions without problems, because you're always in a transaction.

No, actually you can't call SAVEPOINT inside a PL/pgSQL function (or any
SPI user for that matter -- have you tried savepoints in LOLCODE?)
Inside PL/pgSQL the only way to use savepoints is with EXCEPTION blocks.

You are correct that you're always in a transaction, but in this context
not all transactions are equal :-(

(The problem, as we found out, is that the function must always have
control at the same level of transaction nestedness in SPI; you can't
just let the user define and release savepoints arbitrarily.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Bit by "commands ignored until end of transaction block" again

От

Joshua Tolley

Дата:

24 июля 2009 г., 18:10:45

On Fri, Jul 24, 2009 at 12:54:31PM -0400, Alvaro Herrera wrote:
> Joshua Tolley escribió:
>
> > Have you tried this? I expect if you give it a shot, you'll find you don't
> > actually have this problem. Really, everything is always in a transaction. If
> > you haven't explicitly opened one, PostgreSQL opens one for you before each
> > statement, and issues a COMMIT afterwards (or a ROLLBACK, if your statement
> > ran into an error). Statements within functions are always executed within the
> > same transaction, so you can issue SAVEPOINT commands anywhere in PL/pgSQL
> > functions without problems, because you're always in a transaction.
>
> No, actually you can't call SAVEPOINT inside a PL/pgSQL function (or any
> SPI user for that matter -- have you tried savepoints in LOLCODE?)
> Inside PL/pgSQL the only way to use savepoints is with EXCEPTION blocks.
>
> You are correct that you're always in a transaction, but in this context
> not all transactions are equal :-(
>
> (The problem, as we found out, is that the function must always have
> control at the same level of transaction nestedness in SPI; you can't
> just let the user define and release savepoints arbitrarily.)

That makes sense -- and although I did try this before sending the email,
apparently I didn't try it well enough. :)

- Josh / eggyknap

Re: Bit by "commands ignored until end of transaction block" again

От

Glenn Maynard

Дата:

26 июля 2009 г., 21:28:53

> The above situation only arises if you run in autocommit mode which is the default for psql (which I have *never*
understood).

This is the short answer, in practice--assume that either a
transaction is started or will be started by the SAVEPOINT command,
and that if a COMMIT is needed (as a result of the SAVEPOINT or which
was already needed), that the caller will do it.

(I hate non-autocommit.  It defies basic code design instincts, which
tell me that whoever starts a transaction should finish it.  I
shouldn't be issuing a non-autocommit SAVEPOINT/RELEASE, and then
assuming the caller will COMMIT the transaction that was started
automatically.  I'm stuck with it in Django.  Yuck, but oh well;
battling the framework's idioms isn't going to help anything.)

On Thu, Jul 23, 2009 at 4:06 AM, Richard Huxton<dev@archonet.com> wrote:
>> I'm writing a Python library call.  It has no idea whether the caller
>> happens to be inside a transaction already, and I don't want to
>> specify something like "always run this inside a transaction".
>> (Callers are equally likely to want to do either, and it's bad API to
>> force them to start a transaction--the fact that I'm using the
>> database at al should be transparent.)
>
> That last bit is never going to work. There always needs to be some basic
> level of understanding between systems and transactions really have to be
> part of that for talking to a RDBMS. There will have to be a piece of code
> responsible for managing transactions somewhere in the
> middleware/application layers.

It's never 100% transparent--the case of making calls during a
transaction and then rolling the whole thing back still needs to be
documented.  The point, though, is that this isn't a database-centric
operation, so it shouldn't have usage restrictions like "must always"
or "must never be inside a transaction".

> All you're doing here is moving the point of confusion around, surely? At
> some point you still need to know whether you can issue
> BEGIN/ROLLBACK/COMMIT etc.

Not at all--I don't need to use any of these commands.  I just do this:

SAVEPOINT s;
INSERT INTO table ...;
RELEASE SAVEPOINT s;

to guarantee that my code's effect on the database is atomic.

someone else wrote:
> So, what you're really asking for boils down to nestable transactions?

That's how I've thought of savepoints from day one.  When I use them
in Python code, I use a with_transaction wrapper, which transparently
uses a transaction or a savepoint.

--
Glenn Maynard

Re: Bit by "commands ignored until end of transaction block" again

От

Science

Дата:

26 июля 2009 г., 23:15:26

> Date: Thu, 23 Jul 2009 09:06:50 +0100
> From: Richard Huxton <dev@archonet.com>
> To: Glenn Maynard <glenn@zewt.org>
> Cc: pgsql-sql@postgresql.org
> Subject: Re: Bit by "commands ignored until end of transaction block"
>      again
> Message-ID: <4A681A1A.1090902@archonet.com>
> 
> Glenn Maynard wrote:
>> On Thu, Jul 23, 2009 at 2:41 AM, Richard Huxton<dev@archonet.com> wrote:
>>> Ah [cue light-bulb effect], I think I understand. Your function isn't in the
>>> database is it? Surely your application knows if it's issuing BEGIN..COMMIT?
>> I'm writing a Python library call.  It has no idea whether the caller
>> happens to be inside a transaction already, and I don't want to
>> specify something like "always run this inside a transaction".
>> (Callers are equally likely to want to do either, and it's bad API to
>> force them to start a transaction--the fact that I'm using the
>> database at al should be transparent.)
> 
> That last bit is never going to work. There always needs to be some 
> basic level of understanding between systems and transactions really 
> have to be part of that for talking to a RDBMS. There will have to be a 
> piece of code responsible for managing transactions somewhere in the 
> middleware/application layers.
> 
>>> You'll have people with torches and pitchforks after you if you change
>>> RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork.
>> RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint
>> that it's releasing started it.  Every currently-valid case requires
>> that a transaction is already started, so no existing code would be
>> affected by this.
>>
>> SAVEPOINT a; -- implicitly issues BEGIN because one wasn't started
>> RELEASE SAVEPOINT a; -- implicitly issues COMMIT because savepoint "a"
>> issued the BEGIN, not the user
> [snip]
>> Of course, there are other details--it probably shouldn't allow
>> ROLLBACK or COMMIT on an implicit transaction block, for example.
> 
> All you're doing here is moving the point of confusion around, surely? 
> At some point you still need to know whether you can issue 
> BEGIN/ROLLBACK/COMMIT etc.
> 
>>> Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines
>>> of plpgsql will neatly encapsulate the problem. That plpgsql can be
>>> automatically generated easily enough too.
>> I don't think so, at least not without digging into internals.  Django
>> is built around knowing all data types, so it'd need to be givne types
>> explicitly--for example, to know whether a timestamp should be
>> formatted as a timestamp, date or time.  (I do have a couple other
>> columns here--timestamps for cache expiration, etc.)  I'll have to ask
>> Django-side if there's a public API to do this, but I don't think
>> there is.
> 
> Well, the types would be exactly the same as for your existing insert. 
> All it's really doing is changing the template those values get 
> substituted into. It presumably does mean patching the ORM (or 
> subclassing from it anyway).
> 
>>> Ah, the joys of badly designed ORMs. The nice thing is that there seem to be
>>> plenty of bad ones to choose from too. If your ORM doesn't handle
>>> transactions well, the more you use it the more difficult your life will
>>> become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM
>>> as cleanly as you can. That's assuming they're not interested in patches.
>> The ORM on a whole is decent, but there are isolated areas where it's
>> very braindamaged--this is one of them.  They have a stable-release
>> API-compatibility policy, which I think just gets them stuck with some
>> really bad decisions for a long time.
> 
> Presumably they targetted MySQL first, where there's a lot less use in 
> multi-statement transactions with their different behaviour of their 
> various storage-engines.
> 

FWIW, the way the Rails ORM ActiveRecord (another fairly damaged ORM) 
handles this is by allowing you to open any number of transaction 
blocks, but only the outer transaction block commits (in Pg):

Property.transaction { # SQL => 'BEGIN'  User.transaction {    Foo.transaction {      Foo.connection.execute('--some
sqlcode') # SQL => '--some sql code'    }  }
 
} # SQL => 'COMMIT'

This is pretty kludgy but lets me start any arbitrary transaction 
without worrying if there's already another one running "on top" of me 
(which I gather is your worry in this case). Dunno if the approach could 
work using a class wrapper and the Django ORM, but I would have thought 
that the implementation shouldn't be too hard..

And apologies to the list for going a little off-track from Pg.

Steve

Re: Bit by "commands ignored until end of transaction block" again

От

Craig Ringer

Дата:

27 июля 2009 г., 00:01:06

On Sun, 2009-07-26 at 19:15 -0400, Science wrote:

> FWIW, the way the Rails ORM ActiveRecord (another fairly damaged ORM) 
> handles this is by allowing you to open any number of transaction 
> blocks, but only the outer transaction block commits (in Pg):
> 
> Property.transaction { # SQL => 'BEGIN'
>    User.transaction {
>      Foo.transaction {
>        Foo.connection.execute('--some sql code') # SQL => '--some sql code'
>      }
>    }
> } # SQL => 'COMMIT'

What happens if, Foo.transaction does something that causes an error,
though, or issues a rollback? It's not creating savepoints, so if
Foo.transaction rolls back it throws out the work of User.transaction
and Property.transaction too.

Ugh.

That design would be quite good _IF_ it used savepoints:

Property.transaction { # SQL => 'BEGIN'  User.transaction {  # SQL => SAVEPOINT User    Foo.transaction { # SQL =>
SAVEPOINTFoo      Foo.connection.execute('--some sql code') # SQL => '--some sql code'    }                 # SQL =>
RELEASESAVEPOINT Foo  }                   # SQL => RELEASE SAVEPOINT User

}                      # SQL => 'COMMIT'

... so that inner transactions could ROLLBACK TO SAVEPOINT on error ,
and so that asking for a rollback would give you a ROLLBACK TO SAVEPOINT
if the transaction is a subtransaction.

-- 
Craig Ringer

Re: Bit by "commands ignored until end of transaction block" again

От

Science

Дата:

27 июля 2009 г., 12:40:45

Craig Ringer wrote:
> On Sun, 2009-07-26 at 19:15 -0400, Science wrote:
> 
>> FWIW, the way the Rails ORM ActiveRecord (another fairly damaged ORM) 
>> handles this is by allowing you to open any number of transaction 
>> blocks, but only the outer transaction block commits (in Pg):
>>
>> Property.transaction { # SQL => 'BEGIN'
>>    User.transaction {
>>      Foo.transaction {
>>        Foo.connection.execute('--some sql code') # SQL => '--some sql code'
>>      }
>>    }
>> } # SQL => 'COMMIT'
> 
> What happens if, Foo.transaction does something that causes an error,
> though, or issues a rollback? It's not creating savepoints, so if
> Foo.transaction rolls back it throws out the work of User.transaction
> and Property.transaction too.
> 
> Ugh.
> 
> That design would be quite good _IF_ it used savepoints:
> 
> 
> Property.transaction { # SQL => 'BEGIN'
>    User.transaction {  # SQL => SAVEPOINT User
>      Foo.transaction { # SQL => SAVEPOINT Foo
>        Foo.connection.execute('--some sql code') # SQL => '--some sql code'
>      }                 # SQL => RELEASE SAVEPOINT Foo
>    }                   # SQL => RELEASE SAVEPOINT User
> }                      # SQL => 'COMMIT'
> 
> ... so that inner transactions could ROLLBACK TO SAVEPOINT on error ,
> and so that asking for a rollback would give you a ROLLBACK TO SAVEPOINT
> if the transaction is a subtransaction.
> 

For all I know that's how it works these days. I haven't looked at the 
code underneath this in a couple of years. It should be trivial to 
implement in the way you describe based on how the Ruby codebase is set 
up -- hopefully all this will help OP with the Django/Python version of 
the same problem.

Steve

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Bit by "commands ignored until end of transaction block" again