Обсуждение: BUG #13518: CancelRequest lacks statement identifier

Поиск
Список
Период
Сортировка

BUG #13518: CancelRequest lacks statement identifier

От
niallfr@btinternet.com
Дата:
The following bug has been logged on the website:

Bug reference:      13518
Logged by:          Niall Ross
Email address:      niallfr@btinternet.com
PostgreSQL version: 9.3.1
Operating system:   Mac OSX 10.9.5, Windows 7, Linux RHel 5.6
Description:

Postgres lacks a way to cancel a specific identified statement.  Sending
CancelRequest cancels the current statement running on a connection.  But
this means the current statement when the cancel reaches the server, _not_
the current statement when the cancel was sent.  Thus we are caught between
a rock and a hard place when our system has to dismiss an
incompletely-processed statement, since the system cannot tell which of the
following applies.

- If the statement will run on the server for a while yet, we have time to
send cancel-on-server and will complete the dismiss quicker by doing so
(because we buffer, draining all messages for that statement from the
socket, and cancelling on server truncates the flow of messages).

- If the statement will soon finish anyway, the asynchronous
cancel-on-server can kill the _next_ statement on that connection, not the
one our system seeks to dismiss, exactly the opposite of what is wanted.

When dismissing a query, the aim is to do so as fast as possible.  For a
long-running query, sending CancelRequest to the server is exactly what you
want, but until Postgres knows which session is meant, we can see no way for
an automatic system to do this safely (unless by a post-send delay,
defeating the aim, which is performance).

If CancelRequest could include a statement name (or other means of
identifying a statement) and did nothing if that statement was no longer
running by the time the server processed it, the usability of CancelRequest
would be significantly enhanced.

Re: BUG #13518: CancelRequest lacks statement identifier

От
Kevin Grittner
Дата:
"niallfr@btinternet.com" <niallfr@btinternet.com> wrote:

> If CancelRequest could include a statement name (or other means
> of identifying a statement) and did nothing if that statement was
> no longer running by the time the server processed it, the
> usability of CancelRequest would be significantly enhanced.

Would it be sufficient for your purposes to cancel a *transaction*
rather than a *statement*?  There is already a virtual transaction
ID exposed in pg_locks, and it could probably be added to
pg_stat_activity; we could probably create a
pg_cancel_transaction() function that took a text representation of
that and only canceled the transaction if it was running.  There
would need to be locking on a heavily contended lock or two to make
that happen correctly, but presumably this would not ba a
high-volume activity.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #13518: CancelRequest lacks statement identifier

От
Tom Lane
Дата:
Kevin Grittner <kgrittn@ymail.com> writes:
> "niallfr@btinternet.com" <niallfr@btinternet.com> wrote:
>> If CancelRequest could include a statement name (or other means
>> of identifying a statement) and did nothing if that statement was
>> no longer running by the time the server processed it, the
>> usability of CancelRequest would be significantly enhanced.

> Would it be sufficient for your purposes to cancel a *transaction*
> rather than a *statement*?  There is already a virtual transaction
> ID exposed in pg_locks, and it could probably be added to
> pg_stat_activity; we could probably create a
> pg_cancel_transaction() function that took a text representation of
> that and only canceled the transaction if it was running.  There
> would need to be locking on a heavily contended lock or two to make
> that happen correctly, but presumably this would not ba a
> high-volume activity.

If I'm correctly visualizing what you've got in mind, it seems like
that would only narrow the race-condition window not close it entirely.
I don't believe there's any guarantee that signals-in-flight are delivered
immediately, so the SIGINT might still arrive at the target process after
it's finished out the transaction that was meant to be canceled.

The bigger picture though is that what you're suggesting would require
that a whole new connection be made in order to issue a Cancel.  That's
kind of an expensive solution ... not that what we're doing now is
exactly free, but at least it doesn't fail if you're up against the
MaxBackends limit, for instance.

            regards, tom lane

Re: BUG #13518: CancelRequest lacks statement identifier

От
Andres Freund
Дата:
On 2015-07-28 15:57:26 -0400, Tom Lane wrote:
> The bigger picture though is that what you're suggesting would require
> that a whole new connection be made in order to issue a Cancel.  That's
> kind of an expensive solution ...

On the other hand that'd allow us to do authentication, ssl, etc for
cancel requests. Which would also be kinda nice...

Re: BUG #13518: CancelRequest lacks statement identifier

От
Niall Ross
Дата:
Dear Tom, Kevin et al,

 > The bigger picture though is that what you're suggesting would require
 > that a whole new connection be made in order to issue a Cancel.

Am I misunderstanding something?  My experience with using Postgres, and
my reading of the documents, is that a new statement, but not a new
connection, is needed to cancel a statement on that connection.  The
cancel is signalled by a fresh statement using the same connection (and
specifically that connection's PID and Key).  I do not see why a new
connection would enter into it.

The point is that in any situation other than explicit user intervention
- or a built-in post-send-cancel delay - issuing a cancel seems too
risky to do, because the cancel could kill the _next_ statement, not the
one intended.  The motive for issuing a cancel - at least in my scenario
- is to increase the probability that the current statement will end as
soon as possible, so the next can run sooner.  Any risk of cancelling
that next statement that we're keen to run sooner is very
counterproductive (as would be making the system wait after issuing a
cancel for some guessed time delay after which for sure it will have
been processed).  Hence the desire to include some identifier of the
statement we wish to cancel, so either that intended statement is
cancelled or nothing happens.

More generally, I'm puzzled what the scenario is for an app driving
postgres to use cancel automatically.  Why cancel at all except for
performance, yet how should a system cancel and move to next statement
performantly without risking next-statement cancel?

 > Would it be sufficient for your purposes to cancel a *transaction*
 > rather than a *statement*?

No, AFAICS;  there is no reason why the statement needing cancelling
would necessarily be in a transaction, and conversely it could be inside
a transaction there was no desire to cancel.

This is not an immediate major hold-up for us at the moment, because
we've just removed the code that was exploiting cancel, choosing
functional certainty over the occasional performance gain of cancelling
a long query.  However the experience left me puzzled how to use cancel.

            Thanks for thinking about this
                   Niall Ross

>Kevin Grittner <kgrittn@ymail.com> writes:
>
>
>>"niallfr@btinternet.com" <niallfr@btinternet.com> wrote:
>>
>>
>>>If CancelRequest could include a statement name (or other means
>>>of identifying a statement) and did nothing if that statement was
>>>no longer running by the time the server processed it, the
>>>usability of CancelRequest would be significantly enhanced.
>>>
>>>
>
>
>
>>Would it be sufficient for your purposes to cancel a *transaction*
>>rather than a *statement*?  There is already a virtual transaction
>>ID exposed in pg_locks, and it could probably be added to
>>pg_stat_activity; we could probably create a
>>pg_cancel_transaction() function that took a text representation of
>>that and only canceled the transaction if it was running.  There
>>would need to be locking on a heavily contended lock or two to make
>>that happen correctly, but presumably this would not ba a
>>high-volume activity.
>>
>>
>
>If I'm correctly visualizing what you've got in mind, it seems like
>that would only narrow the race-condition window not close it entirely.
>I don't believe there's any guarantee that signals-in-flight are delivered
>immediately, so the SIGINT might still arrive at the target process after
>it's finished out the transaction that was meant to be canceled.
>
>The bigger picture though is that what you're suggesting would require
>that a whole new connection be made in order to issue a Cancel.  That's
>kind of an expensive solution ... not that what we're doing now is
>exactly free, but at least it doesn't fail if you're up against the
>MaxBackends limit, for instance.
>
>            regards, tom lane
>
>
>
>

Re: BUG #13518: CancelRequest lacks statement identifier

От
Tom Lane
Дата:
Niall Ross <niallfr@btinternet.com> writes:
> Am I misunderstanding something?  My experience with using Postgres, and
> my reading of the documents, is that a new statement, but not a new
> connection, is needed to cancel a statement on that connection.  The
> cancel is signalled by a fresh statement using the same connection (and
> specifically that connection's PID and Key).

Uh, no, that's not how it works at all.  PQcancel() involves opening a
fresh connection to the postmaster, which will then send SIGINT to the
backend process you're originally connected to.

Also, looking again at the PQcancel code, I'm unsure that there is any
actual race condition here.  The comments therein quoth

     * Wait for the postmaster to close the connection, which indicates that
     * it's processed the request.  Without this delay, we might issue another
     * command only to find that our cancel zaps that command instead of the
     * one we thought we were canceling.  Note we don't actually expect this
     * read to obtain any data, we are just waiting for EOF to be signaled.

which means that by the time the PQcancel call returns, the SIGINT has
been sent.  I did speculate upthread that delivery of that signal might
be asynchronous, but that was just speculation.  Even if there was some
delay, the backend would ignore the signal unless it arrived later than
the first byte of the next client request.  So it seems a bit hard to
credit that you would have a practical problem unless your application
were to issue a new command while PQcancel was still executing ... and if
it does, I maintain that's an application bug not a protocol problem.

            regards, tom lane

Re: BUG #13518: CancelRequest lacks statement identifier

От
Niall Ross
Дата:
Dear Tom,
    thanks for info.

>Niall Ross <niallfr@btinternet.com> writes:
>
>
>>Am I misunderstanding something?  My experience with using Postgres, and
>>my reading of the documents, is that a new statement, but not a new
>>connection, is needed to cancel a statement on that connection.  The
>>cancel is signalled by a fresh statement using the same connection (and
>>specifically that connection's PID and Key).
>>
>>
>
>Uh, no, that's not how it works at all.  PQcancel() involves opening a
>fresh connection to the postmaster, which will then send SIGINT to the
>backend process you're originally connected to.
>
>
You are right.  Our cancelling code is implemented on _our_ system's
existing connection instance but it opens a new _socket_ connection to
send the cancel request to postgres (we do not reify this temporary
socket connection as a separate connection in our system).  I apologise
for my stupid phrasing - I was thinking in my system's terms, forgetting
that you of course were describing it in Postgres' terms.

>Also, looking again at the PQcancel code, I'm unsure that there is any
>actual race condition here.  The comments therein quoth
>
>     * Wait for the postmaster to close the connection, which indicates that
>     * it's processed the request.  Without this delay, we might issue another
>     * command only to find that our cancel zaps that command instead of the
>     * one we thought we were canceling.  Note we don't actually expect this
>     * read to obtain any data, we are just waiting for EOF to be signaled.
>
>
That's interesting.  Currently, we close the socket at our client end
after committing the cancel request, knowing that no message response is
expected, and do not attempt to tell when the socket is closed by the
server.

If the connection's command has completed on the server by the time the
cancel request reaches the server, is the behaviour expected that the
server would close the socket (EOF) immediately?

>which means that by the time the PQcancel call returns, the SIGINT has
>been sent.  I did speculate upthread that delivery of that signal might
>be asynchronous, but that was just speculation.  Even if there was some
>delay, the backend would ignore the signal unless it arrived later than
>the first byte of the next client request.
>
Should I interpret these remarks to mean that you are qualifying the
quoted comment above them, or would that be a misunderstanding?  To ask
the same question another way, are you saying it is safe to wait till
the cancel request's socket is closed by the server, for no less time
but also for no longer, before sending another command on the
connection's main socket?

>So it seems a bit hard to
>credit that you would have a practical problem unless your application
>were to issue a new command while PQcancel was still executing
>
That's exactly the problem;  in our desire to maximise performance, we
want to issue a new command after cancelling a command that may or may
not complete before the cancel reaches the server and is processed.

     - In an ideal world, if the command is still running when the
cancel is processed, we want it cancelled, otherwise a no-op.

    - In a less ideal world, we at least want a response when the cancel
request has been processed rather than waiting for a guessed time.
IIUC, EOF on the cancel socket is this response, both in the case where
the cancel does something and in the case where it arrives after the
query completes.

>... and if
>it does, I maintain that's an application bug not a protocol problem.
>
We have a multi-threaded system which can use relational database
backends, Postgres being one possibility.  Threaded presentation of
statements to the same connection is not eliminable in certain important
cases.  One solution is to protect our reification of the connection so
it drains its socket of messages for an ongoing query (i.e. buffers
them) before sending another.  Such a query can be dismissed while
buffering in various cases.  This is the situation in which we want to
end a running query's posting of messages to the socket as soon as
possible, and then present a new query to that socket as soon as
possible.  Fire&forget cancellation would be ideal - _if_ the cancel
could only affect the intended query.  Waiting for server-closing of
another socket is less ideal, but at least it is deterministic.

I submitted this as a bug, but now I understand there is a way to know
when the server has cancelled, I'd reclassify it as a critique of the
cancel request feature.  For now, we are managing without it.  Hopefully
the discussion has been of interest to you in thinking about Postgres
evolution.

             Yours faithfully
                   Niall Ross