Обсуждение: Rollback on close [Was: Fwd: [DB-SIG] conn.close() idempotence]

Поиск
Список
Период
Сортировка

Rollback on close [Was: Fwd: [DB-SIG] conn.close() idempotence]

От
Daniele Varrazzo
Дата:
This excerpt is from the discussion going on about conn.close()
idempotence on the dbsig mailing list. However this is related to
psycopg implementation:

> Me wrote:
>On Wed, Oct 19, 2011 at 8:59 AM, M.-A. Lemburg <mal@egenix.com> wrote:
>
>> [...] .close() issues an implicit
>> .rollback() and usually also tells the database backend to free up
>> resources maintained for the connection, so it does require communication
>> with the backend.
>
> If the rollback is implicit in the backend instead of in the driver,
> the driver can just cut the communication with the server and obtain
> the dbapi semantics without issuing a new command. While this is just
> a trick when close() is called explicitly, it is about necessary on
> del: not only sending a rollback is a complex operation to be executed
> by __del__ (just because: it can fail and del would swallow the
> exception, thing we don't like): in complex environments (such as
> nonblocking aynchronous communication) pushing the rollback to the
> server and reading its response implies interaction between other
> python code and the connection being destroyed, whose results are at
> best undetermined.

Now, what happened exactly in psycopg has been:

- before 2.2 we used to call ROLLBACK on close() and del, in a common code path
- in 2.2 this was dropped as an error of mine during some code refactoring
- talking about reintroducing them, it's been deemed a bad idea, for
the problems highlighted at del time (as async communication had been
added)
- we decided to keep the bug promoting it as feature.

Actually though, while we are fine from the PoV of the data integrity,
it is true that cutting the communication without a rollback causes
some resource issue, specifically we know pgpool is not happy and
discards the connection. We suggest to call rollback explicitly to
make middleware happy, but this violates the dbapi requirement of the
"implicit rollback": resources should be well handled on close.

The problem only stems from close() and del sharing almost all the
code. I think we could split the two implementations instead and call
an explicit ROLLBACK when close() is called (and we are in
transaction), and not call it instead when the destructor is invoked,
but just close the communication and let the backend roll back for us.
The rollback wouldn't be called in async mode, which is always
autocommit, and would cause no problem in green mode, as the greenlet
would be blocked on the close() until communication has finished, so
the object is guaranteed to be alive.

-- Daniele

Re: Rollback on close [Was: Fwd: [DB-SIG] conn.close() idempotence]

От
Marko Kreen
Дата:
On Wed, Oct 19, 2011 at 12:46 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
> This excerpt is from the discussion going on about conn.close()
> idempotence on the dbsig mailing list. However this is related to
> psycopg implementation:
>
>> Me wrote:
>>On Wed, Oct 19, 2011 at 8:59 AM, M.-A. Lemburg <mal@egenix.com> wrote:
>>
>>> [...] .close() issues an implicit
>>> .rollback() and usually also tells the database backend to free up
>>> resources maintained for the connection, so it does require communication
>>> with the backend.
>>
>> If the rollback is implicit in the backend instead of in the driver,
>> the driver can just cut the communication with the server and obtain
>> the dbapi semantics without issuing a new command. While this is just
>> a trick when close() is called explicitly, it is about necessary on
>> del: not only sending a rollback is a complex operation to be executed
>> by __del__ (just because: it can fail and del would swallow the
>> exception, thing we don't like): in complex environments (such as
>> nonblocking aynchronous communication) pushing the rollback to the
>> server and reading its response implies interaction between other
>> python code and the connection being destroyed, whose results are at
>> best undetermined.
>
> Now, what happened exactly in psycopg has been:
>
> - before 2.2 we used to call ROLLBACK on close() and del, in a common code path
> - in 2.2 this was dropped as an error of mine during some code refactoring
> - talking about reintroducing them, it's been deemed a bad idea, for
> the problems highlighted at del time (as async communication had been
> added)
> - we decided to keep the bug promoting it as feature.
>
> Actually though, while we are fine from the PoV of the data integrity,
> it is true that cutting the communication without a rollback causes
> some resource issue, specifically we know pgpool is not happy and
> discards the connection. We suggest to call rollback explicitly to
> make middleware happy, but this violates the dbapi requirement of the
> "implicit rollback": resources should be well handled on close.
>
> The problem only stems from close() and del sharing almost all the
> code. I think we could split the two implementations instead and call
> an explicit ROLLBACK when close() is called (and we are in
> transaction), and not call it instead when the destructor is invoked,
> but just close the communication and let the backend roll back for us.
> The rollback wouldn't be called in async mode, which is always
> autocommit, and would cause no problem in green mode, as the greenlet
> would be blocked on the close() until communication has finished, so
> the object is guaranteed to be alive.

First, "in transaction" is not enough, it must check if connections
is "idle in transaction" and no query has been sent.

Secondly, I think there are two types of code to consider:

- Sloppy code - eg read-only web page that does

  db = connect()
  curs.execute('select ...')
  curs.execute('select ...')
  db.close()

- Robust code, where in-transaction-close means
  problem, and it wants to get rid of connection
  without touching network.

Although I understand the urge to optimize for first case,
you take away the possibility of robust code to behave well.

So if you really want to restore the rollback-on-close
behaviour, at least make it configurable.

OTOH, as the lightweight .close() is only problematic
with middleware, it seems to hint that this idle-in-tx
check should be moved into middleware, and psycopg
should not need to worry about it..

--
marko

Re: Rollback on close [Was: Fwd: [DB-SIG] conn.close() idempotence]

От
Daniele Varrazzo
Дата:
On Wed, Oct 19, 2011 at 1:12 PM, Marko Kreen <markokr@gmail.com> wrote:

> First, "in transaction" is not enough, it must check if connections
> is "idle in transaction" and no query has been sent.

Question: is for the middleware different if a connection is idle in
transaction with or without a query has been sent to the database?
However it's easy to check the "idle in transaction" state in the
connection; and actually, because it's us who send the commands, we
could also detect about any sent query.

Your question also makes me think about what should happen if a
close() is issued in a separate thread while a query is running... but
this should be just handled by the serialization code in the
transaction, i.e. the close should wait until the query has finished.

> Secondly, I think there are two types of code to consider:
>
> - Sloppy code - eg read-only web page that does
>
>  db = connect()
>  curs.execute('select ...')
>  curs.execute('select ...')
>  db.close()
>
> - Robust code, where in-transaction-close means
>  problem, and it wants to get rid of connection
>  without touching network.
>
> Although I understand the urge to optimize for first case,
> you take away the possibility of robust code to behave well.

What is an example of situation where a close in transaction without
rollback is a better option than rolling back too?

> So if you really want to restore the rollback-on-close
> behaviour, at least make it configurable.

I'm more for making a decision instead of leaving too many things to
be configured, so if we deem that closing without explicit rollback is
still a better solution I'm fine with leaving it this way and suggest
users to write less sloppy code. I.E. I would *not* like to add an
option such as conn.close(rollback=False).

> OTOH, as the lightweight .close() is only problematic
> with middleware, it seems to hint that this idle-in-tx
> check should be moved into middleware, and psycopg
> should not need to worry about it..

Well, you know the middleware much better than me: I was assuming that
if pgpool discards connection returned idle in transaction to the pool
you have very strong reasons :) I just want to optimize the
communication between the driver and the middleware: what do you think
the "protocol" between psycopg and pgpool should be?

-- Daniele

Re: Rollback on close [Was: Fwd: [DB-SIG] conn.close() idempotence]

От
Marko Kreen
Дата:
On Wed, Oct 19, 2011 at 3:59 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
> On Wed, Oct 19, 2011 at 1:12 PM, Marko Kreen <markokr@gmail.com> wrote:
>> First, "in transaction" is not enough, it must check if connections
>> is "idle in transaction" and no query has been sent.
>
> Question: is for the middleware different if a connection is idle in
> transaction with or without a query has been sent to the database?

It does not make sense to send rollback packet in the middle of copy
or long query, or even when the query resultset has not fully arrive yet,
as you will not know how big it is.

> However it's easy to check the "idle in transaction" state in the
> connection; and actually, because it's us who send the commands, we
> could also detect about any sent query.

Yes.

> Your question also makes me think about what should happen if a
> close() is issued in a separate thread while a query is running... but
> this should be just handled by the serialization code in the
> transaction, i.e. the close should wait until the query has finished.

Bad idea, the query resultset may be big.

>> Secondly, I think there are two types of code to consider:
>>
>> - Sloppy code - eg read-only web page that does
>>
>>  db = connect()
>>  curs.execute('select ...')
>>  curs.execute('select ...')
>>  db.close()
>>
>> - Robust code, where in-transaction-close means
>>  problem, and it wants to get rid of connection
>>  without touching network.
>>
>> Although I understand the urge to optimize for first case,
>> you take away the possibility of robust code to behave well.
>
> What is an example of situation where a close in transaction without
> rollback is a better option than rolling back too?

- incoming copy, incoming big resultset
- network down (why wait for timeout on every connection)
- signal (need fast shutdown, connection may be in unstable state)
- closing from other thread, without locking
  (some sort of disaster/quick shutdown scenario)

Actually, now that I think about it, the *only* scenario
where the rollback is desirable is above-mentioned
sloppy code with middleware (eg. pgbouncer)
that drops connections when closed in middle of tx.

In every other case I can think of, the ROLLBACK
will cause problems.

>> So if you really want to restore the rollback-on-close
>> behaviour, at least make it configurable.
>
> I'm more for making a decision instead of leaving too many things to
> be configured, so if we deem that closing without explicit rollback is
> still a better solution I'm fine with leaving it this way and suggest
> users to write less sloppy code. I.E. I would *not* like to add an
> option such as conn.close(rollback=False).

But would you like to document that:

"To reliable close connection, do os.close(db.fileno()) before db.close()"

?

>> OTOH, as the lightweight .close() is only problematic
>> with middleware, it seems to hint that this idle-in-tx
>> check should be moved into middleware, and psycopg
>> should not need to worry about it..
>
> Well, you know the middleware much better than me: I was assuming that
> if pgpool discards connection returned idle in transaction to the pool
> you have very strong reasons :) I just want to optimize the
> communication between the driver and the middleware: what do you think
> the "protocol" between psycopg and pgpool should be?

Thats easy - do not rollback.  Only reason for rollback is to work around
current middleware, so if current middleware behaviour is bad,
it should be fixed.  Middleware should be transparent, client code
should not need to have workarounds.

Btw, I know pgbouncer, I do not know pgpool.  I saw a mention of pgpool
above and guessed it behaves the same.

PgBouncer drops server connection if client closes in the middle of transaction,
on the principle of "in-tx close is sign of problem, lets propagate it further".
So the only question for me is - should I fix pgbouncer to keep server
connection in such situation?  The answer seems to be "yes" if I think
about maximal pooling efficiency, but "no" if I think that only use would
be to keep crap code running and potential problems in disaster situations.

Eg, pgbouncer has some logic for disaster situations that depends
on server connections going away from pool (fast_fail).
But the fix would keep server connection around longer.
OTOH, when strictly idle-in-tx, the client-side problem
should not say anything about server connection.

--
marko

Re: Rollback on close [Was: Fwd: [DB-SIG] conn.close() idempotence]

От
Marko Kreen
Дата:
On Wed, Oct 19, 2011 at 6:13 PM, Marko Kreen <markokr@gmail.com> wrote:
> On Wed, Oct 19, 2011 at 3:59 PM, Daniele Varrazzo
> <daniele.varrazzo@gmail.com> wrote:
>> On Wed, Oct 19, 2011 at 1:12 PM, Marko Kreen <markokr@gmail.com> wrote:
>> Well, you know the middleware much better than me: I was assuming that
>> if pgpool discards connection returned idle in transaction to the pool
>> you have very strong reasons :) I just want to optimize the
>> communication between the driver and the middleware: what do you think
>> the "protocol" between psycopg and pgpool should be?
>
> Thats easy - do not rollback.

Let's clarify that further - DB-API *MUST* say that .close()
[and also .__del__()] will rollback.  Eg - imagine
driver accessing .dbf / .csv files directly.

But there is no good reason to require communication
with backend if the backend will rollback anyway
on connection close.

DB-API 2.0:
    .close()

            Close the connection now (rather than whenever __del__ is
            called).  The connection will be unusable from this point
            forward; an Error (or subclass) exception will be raised
            if any operation is attempted with the connection. The
            same applies to all cursor objects trying to use the
            connection.  Note that closing a connection without
            committing the changes first will cause an implicit
            rollback to be performed.

From my reading, .close() and .__del__() *are* equivalent,
and no explicit communication is required, if the backend
will rollback anyway on connection close.

Not that I'm against breaking DB-API if the wording is insane,
but the wording seems to direct towards sane behaviour.

--
marko

Re: Rollback on close [Was: Fwd: [DB-SIG] conn.close() idempotence]

От
Daniele Varrazzo
Дата:
On Wed, Oct 19, 2011 at 6:50 PM, Marko Kreen <markokr@gmail.com> wrote:

> Let's clarify that further - DB-API *MUST* say that .close()
> [and also .__del__()] will rollback.  Eg - imagine
> driver accessing .dbf / .csv files directly.
>
> But there is no good reason to require communication
> with backend if the backend will rollback anyway
> on connection close.

Yes, this is our current position. Changing from it could only improve
the interaction between "sloppy code" and middleware, and as you have
pointed out this would make edge case more troublesome.

Thank you for the discussion, I wanted to hear the opinion of somebody
on the other side of the socket :)


-- Daniele