Thread: canceling query

canceling query

From
"Merlin Moncure"
Date:
Is it normal for a long query to keep running after I close the psql
session that initiated it?  I made a select * from a,b which naturally
takes a long time, but the only way to stop it was to bring down the
postmaster.

Merlin


Re: canceling query

From
Bruce Momjian
Date:
Merlin Moncure wrote:
> Is it normal for a long query to keep running after I close the psql
> session that initiated it?  I made a select * from a,b which naturally
> takes a long time, but the only way to stop it was to bring down the
> postmaster.

The backend should exit automatically when the user exits psql.  How did
you exit psql?  If you just closed the window, I think the query will
keep going even on Unix.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: canceling query

From
"Merlin Moncure"
Date:
Bruce Momjian wrote:
> The backend should exit automatically when the user exits psql.  How did
> you exit psql?  If you just closed the window, I think the query will
> keep going even on Unix.

I pressed ctrl-c from psql.  The reason I am posting this on this list
is because I am pretty sure this involves signal handling.

To be specific: is the query cancel routine checking pending signals
(using the win32 signal polling routine) and if so, is it properly
catching the signal?  (If not, I'll just properly wait until it is :) ).
I suppose it is also possible my query is not checking to see if it was
cancelled.
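
The pattern being asked about here, a handler that only records the signal and a long-running loop that polls for it, can be sketched in portable C. This is a minimal illustration and not PostgreSQL's actual code: `cancel_pending`, `handle_cancel` and `run_until_cancelled` are hypothetical names, and `raise(SIGINT)` stands in for the user pressing Ctrl-C.

```c
#include <signal.h>

/* Written by the handler, read by the work loop. volatile sig_atomic_t is
 * the object type the C standard allows a signal handler to set safely. */
static volatile sig_atomic_t cancel_pending = 0;

/* Hypothetical handler: it only records that a cancel was requested. */
void handle_cancel(int signo)
{
    (void) signo;
    cancel_pending = 1;
}

/* Hypothetical stand-in for a long-running query. It performs up to `rows`
 * units of work and polls the flag before each unit. When `done` reaches
 * `cancel_at`, it raises SIGINT to simulate Ctrl-C (pass 0 to never cancel).
 * Returns the number of units completed. */
long run_until_cancelled(long rows, long cancel_at)
{
    long done = 0;

    signal(SIGINT, handle_cancel);
    cancel_pending = 0;

    while (done < rows) {
        if (cancel_pending)
            break;              /* the "query" notices the cancel here */
        done++;
        if (done == cancel_at)
            raise(SIGINT);      /* simulated Ctrl-C */
    }
    return done;
}
```

Calling `run_until_cancelled(1000, 10)` stops after 10 units, while `run_until_cancelled(5, 0)` runs to completion. The point of the sketch is that a cancel only takes effect if something delivers the signal and the loop actually checks the flag; if either half is missing, the work runs to the end.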

After exiting psql, pg_stat_activity reports the query as still
running. Since I used big tables, it basically runs forever until I
bring down the postmaster.

Also interesting is that when I kill the running backend it kills the
postmaster (I know you are not supposed to do this).  Here is the log:

LOG:  database system was shut down at 2004-04-19 08:14:01 Eastern Daylight Time
LOG:  checkpoint record is at 0/2D715018
LOG:  redo record is at 0/2D715018; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 664978; next OID: 27086
LOG:  database system is ready
ERROR:  relation "order_file" does not exist
LOG:  could not send data to client: Unknown error
LOG:  server process (PID 3104) was terminated by signal 1
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
FATAL:  could not attach to proper memory at fixed address: shmget(key=5432001, addr=00F60000) failed: No such file or directory

Merlin

Re: canceling query

From
"Magnus Hagander"
Date:
> > The backend should exit automatically when the user exits psql.  How did
> > you exit psql?  If you just closed the window, I think the query will
> > keep going even on Unix.
>
> I pressed ctrl-c from psql.  The reason I am posting this on
> this list is because I am pretty sure this involves signal handling.

This is because there is no Ctrl-C handler in psql on win32. There was a
patch for this which was "almost complete", but it was not fully
thread-safe. The window of error is very small, but it's there. Perhaps
we can live with that since it's a client (if you hit the very small
window, you'll get a segfault or similar). If so, that patch can
probably still be applied.


> To be specific: is the query cancel routine checking pending
> signals (using the win32 signal polling routine) and if so,
> is it properly catching the signal?  (If not, I'll just
> properly wait until it is :) ). I suppose it is also possible
> my query is not checking to see if it cancelled.

There is no query cancel sent, thus the signal is not sent :-)

It should die once it tries to send data down the TCP connection,
though, since the other end of the socket is gone. Do you know if your
query gets that far, or is it still executing?


> Following exiting psql, pg_stat_activity reports query still
> running...since I used big tables, it basically runs forever
> until I bring down the postmaster.

Yes. Or you can use my pg_kill_backend() function *cough*. Ok, yeah, I
remember the discussion. I'm working on fixing it up per the result of
that discussion, but I'm not done yet.
Either that, or you can use a small command-line tool to send the "new
style kill signal" to the backend. I'll see if we can get these files up
on the pg win32 status page, so you have some kind of tool for now.

There really should be no need to bring down the entire postmaster.


> Also interesting is that when I kill the running backend it
> kills the postmaster (I know you are not supposed to do
> this).  Here is the log:
>
> LOG:  database system was shut down at 2004-04-19 08:14:01 Eastern Daylight Time
> LOG:  checkpoint record is at 0/2D715018
> LOG:  redo record is at 0/2D715018; undo record is at 0/0; shutdown TRUE
> LOG:  next transaction ID: 664978; next OID: 27086
> LOG:  database system is ready
> ERROR:  relation "order_file" does not exist
> LOG:  could not send data to client: Unknown error
> LOG:  server process (PID 3104) was terminated by signal 1
> LOG:  terminating any other active server processes
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
> LOG:  all server processes terminated; reinitializing
> FATAL:  could not attach to proper memory at fixed address: shmget(key=5432001, addr=00F60000) failed: No such file or directory

Now, that looks like a bug :-) If you kill just the backend, the
postmaster should still survive. I think it used to do this, must've
been one of the recent changes that broke it. Needs further digging in.

/Magnus

Re: canceling query

From
"Merlin Moncure"
Date:
Magnus Hagander wrote:
> This is because there is no Ctrl-C handler in psql on win32. There was a
> patch for this which was "almost complete", but it was not fully
> thread-safe. The window of error is very small, but it's there. Perhaps
> we can live with that since it's a client (if you hit the very small
> window, you'll get a segfault or similar). If so, that patch can
> probably still be applied.

Ok, that makes sense.

> It should die once it tries to send data down the TCP connection,
> though. Since the other end of the socket is gone. Do you know if your
> query gets that far, or is it still executing?

Still executing.  CPU load high, and pg_stat_activity reports that
particular query running on that particular backend.

> There really should be no need to bring down the entire postmaster.

Yes...Isn't it possible to set up your app so that it can't be brought
down by the task manager?  (IIRC this may only be possible with
services)  Of course, this is not a good idea until there is a proper
cancel.

My personal design philosophy is to try and keep normal query execution
time < 1 sec, so I'm not overly concerned about this.  I'm just beating
on the postmaster to see what I can come up with.

Merlin

Re: canceling query

From
"Magnus Hagander"
Date:
> > It should die once it tries to send data down the TCP connection,
> > though. Since the other end of the socket is gone. Do you know if your
> > query gets that far, or is it still executing?
>
> Still executing.  CPU load high, and pg_stat_activity reports
> that particular query running on that particular backend.

Ok. If you have the time/CPU to spare, it would be interesting to see if
it goes down once it starts sending results to the frontend (which is
gone).


> > There really should be no need to bring down the entire postmaster.
>
> Yes...Isn't it possible to set up your app so that it can't
> be brought down by the task manager?  (IIRC this may only be
> possible with services)  Of course, this is not a good idea
> until there is a proper cancel.

No, this can't be done.
A GUI program can be set to ignore close requests, but you cannot
prevent it from being killed from the "processes" tab. Unless you put in
a kernel driver there to prevent it, and that's not going to happen :-)


> My personal design philosophy is to try and keep normal query
> execution time < 1 sec, so I'm not overly concerned about
> this.  I'm just beating on the postmaster to see what I can
> come up with.

Oh yes, this kind of testing is definitely what we need now.

//Magnus

Re: canceling query

From
"Merlin Moncure"
Date:
> Ok. If you have the time/CPU to spare, it would be interesting to see if
> it goes down once it starts sending results to the frontend (which is
> gone).

No crash, just:

LOG:  could not send data to client: Unknown error
LOG:  could not receive data from client: Unknown error
LOG:  unexpected EOF on client connection

Merlin


Re: canceling query

From
Tom Lane
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> Is it normal for a long query to keep running after I close the psql
> session that initiated it?

Yes, the backend will typically not notice a client disconnect until it
next tries to read a command.

            regards, tom lane

Re: canceling query

From
Tom Lane
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> No crash, just:

> LOG:  could not send data to client: Unknown error
> LOG:  could not receive data from client: Unknown error
> LOG:  unexpected EOF on client connection

This is expected except for the "Unknown error".  You should be
seeing the equivalent of EPIPE, typically "Broken pipe" on Unixen.
It sounds like the error number handling may not be quite right
on Win32.
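
The EPIPE behaviour described above can be reproduced with a small C sketch, assuming a POSIX system and an AF_UNIX socket pair in place of the real client connection. `errno_after_peer_close` is a hypothetical name chosen for this example. SIGPIPE is ignored first, because otherwise the default action would terminate the process before `send()` could report the error.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends on a stream socket whose peer has already closed and returns the
 * errno that send() reported. Returns 0 if the send unexpectedly succeeds
 * and -1 if the socket pair cannot be created.
 * Side effect: SIGPIPE is ignored for the rest of the process. */
int errno_after_peer_close(void)
{
    int fds[2];
    int saved = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* With the default disposition the process would be killed by SIGPIPE
     * instead of seeing send() fail with EPIPE. */
    signal(SIGPIPE, SIG_IGN);

    close(fds[1]);                      /* the "client" goes away */

    if (send(fds[0], "row", 3, 0) < 0)
        saved = errno;                  /* capture before any other call */

    close(fds[0]);
    return saved;
}
```

On such a system the function is expected to return `EPIPE`, which `strerror()` usually renders as "Broken pipe". Note that the sender learns about the disconnect only at the moment it tries to write, which matches the log sequence quoted above.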

            regards, tom lane

Re: canceling query

From
"Magnus Hagander"
Date:
>> No crash, just:
>
>> LOG:  could not send data to client: Unknown error
>> LOG:  could not receive data from client: Unknown error
>> LOG:  unexpected EOF on client connection
>
>This is expected except for the "Unknown error".  You should be
>seeing the equivalent of EPIPE, typically "Broken pipe" on Unixen.
>It sounds like the error number handling may not be quite right
>on Win32.

A quick look at this shows that the problem is probably that %m has no
clue about winsock error codes. There is a workaround strerror() in
libpq on win32 (see interfaces/libpq/win32.c and libpq-int.h). My bet is
it's the same thing.

I believe the fix needs to be in backend/utils/elog.c, function
useful_strerror(). Unless someone either tells me that's the wrong place
or beats me to it, I'll try to get a patch done for this soonest.

//Magnus

Re: canceling query

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
> I believe the fix needs to be in backend/utils/elog.c, function
> useful_strerror(). Unless someone either tells me that's the wrong place
> or beats me to it, I'll try to get a patch done for this soonest.

Sounds reasonable.  Better look also at errcode_for_socket_access() and
other places that check for particular errno values.

While you are at it you might want to make sure there aren't any other
unprotected strerror() calls in the backend.  At one time we had a fair
number of places with code like

    elog("barf: %s", strerror(errno));

and I'm not sure if all of them have gotten turned into %m or not.

            regards, tom lane

Re: canceling query

From
"Magnus Hagander"
Date:
>> I believe the fix needs to be in backend/utils/elog.c, function
>> useful_strerror(). Unless someone either tells me that's the wrong place
>> or beats me to it, I'll try to get a patch done for this soonest.
>
>Sounds reasonable.  Better look also at errcode_for_socket_access() and
>other places that check for particular errno values.

The errno values are correct (they're #defined in win32.h), it's just
that the system supplied strerror() doesn't know about them.

>While you are at it you might want to make sure there aren't any other
>unprotected strerror() calls in the backend.  At one time we had a fair
>number of places with code like
>
>    elog("barf: %s", strerror(errno));
>
>and I'm not sure if all of them have gotten turned into %m or not.

From what I can see, there are a couple of places in postmaster.c that
use it, and that's it. And they're all non-socket related. So I think
we're safe there.


Patch coming up shortly :-)

//Magnus