Обсуждение: idle in txn query cancellation

Поиск
Список
Период
Сортировка

idle in txn query cancellation

От
Andres Freund
Дата:
Hi all,

I know it is late in the cycle, but I still think that the current behaviour 
of ERRORing during execution of a query but FATALing during IDLE IN 
TRANSACTION is very confusing to the user. Especially as you are not even able 
to read the reason for getting disconnected because the client doesnt expect 
input in that state...

The first patch adds the capability to add a flag to ereport like:
ereport(ERROR | LOG_NO_CLIENT)
Tom earlier suggested using COMERROR but thats just a version of LOG which 
doesnt report to the client. The patch makes that to be a synonym of LOG | 
LOG_NO_CLIENT.
While its not the most pretty API I dont think its that bad because the 
directionality is somewhat a property of the loglevel. Beside it would 
generate a lot of useless noise and breakage.

The second patch changes the FATAL during cancelling an idle in txn query into  
ERROR | LOG_NO_CLIENT.
To avoid breaking the known state there also may no "ready for query" message 
get sent. The patch ensures that by setting and checking a 
"silent_error_while_idle" variable.

That way the client will not see that an error occured until the next command 
sent but I dont think there is a solution to that in 9.0 timeframe if at all.

The patch only adds that for the recovery conflict path for now.

What do you think? Is it worth applying something like that now? If yes I 
would try to test the patch some more (obviously the patch survives the 
regression tests, but they do not seem to check the extended query protocol at 
all).

One could argue that the LOG_NO_CLIENT flag should be added when a idle 
transaction gets terminated by force but I wouldn't bother.

On a related note I would also like to get rid of the restriction that a 
normal query cancellation will only be done if no subtransactions are stacked. 
But I guess its too late for that? (I have a patch ready, some cleanup would 
be needed)
The latter works by:
- adding a explicit error code (which should be done regardless of this 
discussion)
- avoiding to catch such error at a few places (plperl, plpython)
- recursively aborting the subtransactions once the mainloop is reached
- relying on the fact that the cancellation signal will get resent
- possibly escalating to a FATAL if nothing happens after a certain number of 
tries

Andres


PS: I know I sort-of-promised a patch earlier, but it didn't work out time-
wise.

Re: idle in txn query cancellation

От
Simon Riggs
Дата:
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> I know it is late in the cycle

No problem here. Thanks for your diligence. Will review.

-- Simon Riggs           www.2ndQuadrant.com



Re: idle in txn query cancellation

От
Simon Riggs
Дата:
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> On a related note I would also like to get rid of the restriction that
> a normal query cancellation will only be done if no subtransactions
> are stacked. 
> But I guess its too late for that? (I have a patch ready, some cleanup
> would be needed)
> The latter works by:
> - adding a explicit error code (which should be done regardless of
> this 
> discussion)
> - avoiding to catch such error at a few places (plperl, plpython)
> - recursively aborting the subtransactions once the mainloop is
> reached
> - relying on the fact that the cancellation signal will get resent
> - possibly escalating to a FATAL if nothing happens after a certain
> number of tries

Such an action needs to have a good, clear theoretical explanation with
it to show that the interaction with savepoints is a good one.

I toyed with the idea of a new level between ERROR and FATAL to allow
ERRORs to be handled by savepoints still in all cases.

-- Simon Riggs           www.2ndQuadrant.com



Re: idle in txn query cancellation

От
Simon Riggs
Дата:
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> The first patch adds the capability to add a flag to ereport like:
> ereport(ERROR | LOG_NO_CLIENT)
> Tom earlier suggested using COMERROR but thats just a version of LOG
> which doesnt report to the client. The patch makes that to be a
> synonym of LOG | LOG_NO_CLIENT.
> While its not the most pretty API I dont think its that bad because
> the directionality is somewhat a property of the loglevel. Beside it
> would generate a lot of useless noise and breakage.
> 
> The second patch changes the FATAL during cancelling an idle in txn
> query into  ERROR | LOG_NO_CLIENT.
> To avoid breaking the known state there also may no "ready for query"
> message get sent. The patch ensures that by setting and checking a 
> "silent_error_while_idle" variable.
> 
> That way the client will not see that an error occured until the next
> command sent but I dont think there is a solution to that in 9.0
> timeframe if at all.
> 
> The patch only adds that for the recovery conflict path for now.
> 
> What do you think? Is it worth applying something like that now? If
> yes I would try to test the patch some more (obviously the patch
> survives the regression tests, but they do not seem to check the
> extended query protocol at all).

I think that is much better than FATAL. If it works I think we should
apply it for this release.

-- Simon Riggs           www.2ndQuadrant.com



Re: idle in txn query cancellation

От
Andres Freund
Дата:
On Monday 15 February 2010 09:50:08 Simon Riggs wrote:
> On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > The first patch adds the capability to add a flag to ereport like:
> > ereport(ERROR | LOG_NO_CLIENT)
> > Tom earlier suggested using COMERROR but thats just a version of LOG
> > which doesnt report to the client. The patch makes that to be a
> > synonym of LOG | LOG_NO_CLIENT.
> > While its not the most pretty API I dont think its that bad because
> > the directionality is somewhat a property of the loglevel. Beside it
> > would generate a lot of useless noise and breakage.
> > 
> > The second patch changes the FATAL during cancelling an idle in txn
> > query into  ERROR | LOG_NO_CLIENT.
> > To avoid breaking the known state there also may no "ready for query"
> > message get sent. The patch ensures that by setting and checking a
> > "silent_error_while_idle" variable.
> > 
> > That way the client will not see that an error occured until the next
> > command sent but I dont think there is a solution to that in 9.0
> > timeframe if at all.
> > 
> > The patch only adds that for the recovery conflict path for now.
> > 
> > What do you think? Is it worth applying something like that now? If
> > yes I would try to test the patch some more (obviously the patch
> > survives the regression tests, but they do not seem to check the
> > extended query protocol at all).
> 
> I think that is much better than FATAL. If it works I think we should
> apply it for this release.
It does work for me at least ;-). I only have marginal testing with the 
extended query protocol though and I think the error message needs to get 
improved somewhat.

I plan to make testing the extended query protocol easier by making pgbench 
able to restart after a such an error (thats why I like the seperate error 
code for such cancellations...)

The problem with the error message is, that errdetail_abort() uses MyProc-
>recoveryConflictPending which is already unset when the errdetail is used. 
Unless you beat me I plan to provide a patch here (havent looked at how to do 
so yet though).

Andres


Re: idle in txn query cancellation

От
Andres Freund
Дата:
On Monday 15 February 2010 09:47:09 Simon Riggs wrote:
> On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > On a related note I would also like to get rid of the restriction that
> > a normal query cancellation will only be done if no subtransactions
> > are stacked.
> > But I guess its too late for that? (I have a patch ready, some cleanup
> > would be needed)
> > The latter works by:
> > - adding a explicit error code (which should be done regardless of
> > this
> > discussion)
> > - avoiding to catch such error at a few places (plperl, plpython)
> > - recursively aborting the subtransactions once the mainloop is
> > reached
> > - relying on the fact that the cancellation signal will get resent
> > - possibly escalating to a FATAL if nothing happens after a certain
> > number of tries
> 
> Such an action needs to have a good, clear theoretical explanation with
> it to show that the interaction with savepoints is a good one.
I can provide a bit more explanation. The patch (other thread) already added 
some more comments but its definitely good to explain/define some more.
Will post that to the thread with the patch, ok?

> I toyed with the idea of a new level between ERROR and FATAL to allow
> ERRORs to be handled by savepoints still in all cases.
I have a hard time believing that it will help in that situation. Either you 
allow cleaning up process local resources in PG_TRY/PG_TRY in which situation 
you cant abort recursively at all places because the catching code block may 
very well reference resources associated with that snapshot or you abort the 
process in a way that there are no process local resources.

How would the middleway between those work?

Andres


Re: idle in txn query cancellation

От
Andres Freund
Дата:
On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
> On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > I know it is late in the cycle
> 
> No problem here. Thanks for your diligence. Will review.
Got a chance to look at it?

Andres


Re: idle in txn query cancellation

От
Simon Riggs
Дата:
On Sun, 2010-03-14 at 19:50 +0100, Andres Freund wrote:
> On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
> > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > > I know it is late in the cycle
> > 
> > No problem here. Thanks for your diligence. Will review.
> Got a chance to look at it?

I need to spend my time on ensuring we can avoid the cancellation
altogether, so I apologise for not reviewing. That's not a comment on
your work or the possible effectiveness of the patch. Possibly others
have the time to review?

-- Simon Riggs           www.2ndQuadrant.com



Re: idle in txn query cancellation

От
Andres Freund
Дата:
Hi Simon,

On Sunday 14 March 2010 20:12:00 Simon Riggs wrote:
> On Sun, 2010-03-14 at 19:50 +0100, Andres Freund wrote:
> > On Sunday 14 February 2010 06:29:45 Simon Riggs wrote:
> > > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote:
> > > > I know it is late in the cycle
> > > 
> > > No problem here. Thanks for your diligence. Will review.
> > 
> > Got a chance to look at it?
> 
> I need to spend my time on ensuring we can avoid the cancellation
> altogether, so I apologise for not reviewing. That's not a comment on
> your work or the possible effectiveness of the patch. Possibly others
> have the time to review?
I guess that wont go anywhere before 9.1?

I think at least the error code should be adjusted before 9.0.

Andres