Discussion: canceling query
Is it normal for a long query to keep running after I close the psql session that initiated it? I ran a select * from a,b, which naturally takes a long time, but the only way to stop it was to bring down the postmaster.

Merlin
Merlin Moncure wrote:
> Is it normal for a long query to keep running after I close the psql
> session that initiated it? I made a select * from a,b which naturally
> takes a long time, but the only way to stop it was to bring down the
> postmaster.

The backend should exit automatically when the user exits psql. How did you exit psql? If you just closed the window, I think the query will keep going even on Unix.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
> The backend should exit automatically when the user exits psql. How did
> you exit psql? If you just closed the window, I think the query will
> keep going even on Unix.

I pressed Ctrl-C from psql. I am posting this on this list because I am pretty sure it involves signal handling. To be specific: is the query cancel routine checking for pending signals (using the win32 signal polling routine), and if so, is it properly catching the signal? (If not, I'll just wait until it is :) ). I suppose it is also possible that my query is not checking whether it has been cancelled.

After exiting psql, pg_stat_activity still reports the query as running... since I used big tables, it basically runs forever until I bring down the postmaster.

Also interesting is that when I kill the running backend, it kills the postmaster (I know you are not supposed to do this). Here is the log:

LOG: database system was shut down at 2004-04-19 08:14:01 Eastern Daylight Time
LOG: checkpoint record is at 0/2D715018
LOG: redo record is at 0/2D715018; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 664978; next OID: 27086
LOG: database system is ready
ERROR: relation "order_file" does not exist
LOG: could not send data to client: Unknown error
LOG: server process (PID 3104) was terminated by signal 1
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
FATAL: could not attach to proper memory at fixed address: shmget(key=5432001, addr=00F60000) failed: No such file or directory

Merlin
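Merlin's question is whether a running backend actually polls for a pending cancel. The usual pattern, sketched here with POSIX signals only, is that the signal handler does nothing but set a flag, and the long-running loop checks that flag at safe points. The names `cancel_pending`, `handle_cancel`, `install_cancel_handler` and `run_long_query` are hypothetical (the flag is loosely in the spirit of the backend's `QueryCancelPending`), `raise()` merely stands in for an incoming cancel request, and nothing here describes how the win32 port emulates signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cancel flag. volatile sig_atomic_t is the portable type
 * for a flag written from a signal handler. */
static volatile sig_atomic_t cancel_pending = 0;

/* The handler only records that a cancel arrived; the real work
 * happens later, at a safe point in the main loop. */
static void handle_cancel(int signo)
{
    (void) signo;
    cancel_pending = 1;
}

/* Install handle_cancel() for SIGINT; returns false on failure. */
static bool install_cancel_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = handle_cancel;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL) == 0;
}

/* Simulated long query that polls the flag once per row.
 * raise() at row `cancel_at_row` stands in for an incoming cancel
 * (pass -1 for "never"). Returns the row at which it stopped. */
static long run_long_query(long total_rows, long cancel_at_row)
{
    for (long row = 0; row < total_rows; row++) {
        if (row == cancel_at_row)
            raise(SIGINT);
        if (cancel_pending) {       /* safe point: act on the cancel */
            cancel_pending = 0;
            return row;
        }
        /* ... compute one row of the a,b cross join here ... */
    }
    return total_rows;
}
```

With the handler installed, `run_long_query(1000, 10)` stops at row 10. If nothing ever delivers the signal (as in Merlin's case, where psql on win32 sent no cancel at all), the loop simply runs to completion.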
> > The backend should exit automatically when the user exits psql. How did
> > you exit psql? If you just closed the window, I think the query will
> > keep going even on Unix.
>
> I pressed ctrl-c from psql. The reason I am posting this on
> this list is because I am pretty sure this involves signal handling.

This is because there is no Ctrl-C handler in psql on win32. There was a patch for this which was "almost complete", but it was not fully thread-safe. The window of error is very small, but it's there. Perhaps we can live with that since it's a client (if you hit the very small window, you'll get a segfault or similar). If so, that patch can probably still be applied.

> To be specific: is the query cancel routine checking pending
> signals (using the win32 signal polling routine) and if so,
> is it properly catching the signal? (If not, I'll just
> properly wait until it is :) ). I suppose it is also possible
> my query is not checking to see if it cancelled.

No query cancel is sent, so no signal is sent either :-) The backend should die once it tries to send data down the TCP connection, though, since the other end of the socket is gone. Do you know if your query gets that far, or is it still executing?

> Following exiting psql, pg_stat_activity reports query still
> running...since I used big tables, it basically runs forever
> until I bring down the postmaster.

Yes. Or you can use my pg_kill_backend() function *cough*. OK, yeah, I remember the discussion. I'm working on fixing it up per the result of that discussion, but I'm not done yet. Either that, or you can use a small command-line tool to send the "new style kill signal" to the backend. I'll see if we can get these files up on the pg win32 status page, so you have some kind of tool for now. There really should be no need to bring down the entire postmaster.

> Also interesting is that when I kill the running backend it
> kills the postmaster (I know you are not supposed to do
> this). Here is the log:
>
> LOG: database system was shut down at 2004-04-19 08:14:01 Eastern Daylight Time
> LOG: checkpoint record is at 0/2D715018
> LOG: redo record is at 0/2D715018; undo record is at 0/0; shutdown TRUE
> LOG: next transaction ID: 664978; next OID: 27086
> LOG: database system is ready
> ERROR: relation "order_file" does not exist
> LOG: could not send data to client: Unknown error
> LOG: server process (PID 3104) was terminated by signal 1
> LOG: terminating any other active server processes
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and repeat your command.
> LOG: all server processes terminated; reinitializing
> FATAL: could not attach to proper memory at fixed address: shmget(key=5432001, addr=00F60000) failed: No such file or directory

Now, that looks like a bug :-) If you kill just the backend, the postmaster should still survive. I think it used to do this; one of the recent changes must have broken it. Needs further digging.

/Magnus
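The second half of Merlin's log ("terminated by signal 1", "terminating any other active server processes", "reinitializing") shows the postmaster reacting to an abnormal child exit by resetting all backends rather than exiting itself. The final FATAL is the failure Magnus flags as a bug. The following is a much-simplified POSIX sketch of that reaping decision. `child_crashed` and `spawn_and_classify` are hypothetical, and the real postmaster applies more detailed rules.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified, hypothetical rule: exit status 0 is fine; a non-zero exit
 * or death by signal counts as a crash, after which a supervisor like
 * the postmaster would reset its other children. */
static int child_crashed(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status) != 0;
    return WIFSIGNALED(status) != 0;
}

/* Fork a stand-in "backend", optionally kill it with `signo`
 * (0 = let it exit normally), reap it with waitpid(), and classify it.
 * Returns 1 for a crash, 0 for a clean exit, -1 if a system call fails. */
static int spawn_and_classify(int signo)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (signo == 0)
            _exit(0);           /* clean exit */
        for (;;)
            pause();            /* idle "backend" waiting to be killed */
    }
    if (signo != 0 && kill(pid, signo) != 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return child_crashed(status);
}
```

Under this rule, killing one child is a reason to reset the others, not a reason for the supervisor itself to die.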
Magnus Hagander wrote:
> This is because there is no Ctrl-C handler in psql on win32. There was a
> patch for this which was "almost complete", but it was not fully
> thread-safe. The window of error is very small, but it's there. Perhaps
> we can live with that since it's a client (if you hit the very small
> window, you'll get a segfault or similar). If so, that patch can
> probably still be applied.

OK, that makes sense.

> It should die once it tries to send data down the TCP connection,
> though. Since the other end of the socket is gone. Do you know if your
> query gets that far, or is it still executing?

Still executing. CPU load is high, and pg_stat_activity reports that particular query running on that particular backend.

> There really should be no need to bring down the entire postmaster.

Yes... Isn't it possible to set up your app so that it can't be brought down by the task manager? (IIRC this may only be possible with services.) Of course, this is not a good idea until there is a proper cancel.

My personal design philosophy is to try to keep normal query execution time under 1 second, so I'm not overly concerned about this. I'm just beating on the postmaster to see what I can come up with.

Merlin
> > It should die once it tries to send data down the TCP connection,
> > though. Since the other end of the socket is gone. Do you know if your
> > query gets that far, or is it still executing?
>
> Still executing. CPU load high, and pg_stat_activity reports
> that particular query running on that particular backend.

OK. If you have the time/CPU to spare, it would be interesting to see if it goes down once it starts sending results to the frontend (which is gone).

> > There really should be no need to bring down the entire postmaster.
>
> Yes...Isn't it possible to set up your app so that it can't
> be brought down by the task manager? (IIRC this may only be
> possible with services) Of course, this is not a good idea
> until there is a proper cancel.

No, this can't be done. A GUI program can be set to ignore close requests, but you cannot prevent it from being killed from the "Processes" tab, unless you put a kernel driver in place to prevent it, and that's not going to happen :-)

> My personal design philosophy is to try and keep normal query
> execution time < 1 sec, so I'm not overly concerned about
> this. I'm just beating on the postmaster to see what I can
> come up with.

Oh yes, this kind of testing is definitely what we need now.

//Magnus
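Unix has the same limit Magnus describes for Windows. A program may catch or ignore a polite termination request such as SIGTERM, but the kernel refuses any attempt to catch or ignore SIGKILL, and `sigaction()` fails with `EINVAL`. This is a small POSIX illustration with a hypothetical helper name, not anything from the PostgreSQL sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Ask the kernel to ignore `signo`. Returns true if it agreed; on
 * failure errno says why (EINVAL for SIGKILL and SIGSTOP). */
static bool try_ignore(int signo)
{
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL) == 0;
}
```

`try_ignore(SIGTERM)` succeeds, while `try_ignore(SIGKILL)` returns false with `errno` set to `EINVAL`.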
> Ok. If you have the time/CPU to spare, it would be interesting to see if
> it goes down once it starts sending results to the frontend (which is
> gone).

No crash, just:

LOG: could not send data to client: Unknown error
LOG: could not receive data from client: Unknown error
LOG: unexpected EOF on client connection

Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> Is it normal for a long query to keep running after I close the psql
> session that initiated it?

Yes, the backend will typically not notice a client disconnect until it next tries to read a command.

regards, tom lane
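Tom's point can be seen with a local socket pair. When one end closes, nothing interrupts the process holding the other end; the disconnect only becomes visible when that process next calls `recv()` and gets 0 (end of file). That is the kind of condition behind the "unexpected EOF on client connection" message. This is a POSIX sketch with a hypothetical function name, not backend code.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: the "client" end of a connected socket pair closes
 * while the "backend" end is busy elsewhere. Nothing interrupts the
 * backend; it only learns about the disconnect when it next reads.
 * Returns recv()'s result: 0 means orderly EOF, -1 means an error. */
static ssize_t backend_read_after_client_close(void)
{
    int fds[2];                 /* fds[0] = client, fds[1] = backend */
    char buf[64];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;
    close(fds[0]);              /* client disconnects */

    /* ... backend keeps executing its query, never touching fds[1] ... */

    ssize_t n = recv(fds[1], buf, sizeof buf, 0);
    close(fds[1]);
    return n;
}
```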
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> No crash, just:
> LOG: could not send data to client: Unknown error
> LOG: could not receive data from client: Unknown error
> LOG: unexpected EOF on client connection

This is expected except for the "Unknown error". You should be seeing the equivalent of EPIPE, typically "Broken pipe" on Unixen. It sounds like the error number handling may not be quite right on Win32.

regards, tom lane
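The EPIPE behaviour Tom expects is easy to reproduce on a Unix system. Writing to a stream socket whose peer has gone away fails with `EPIPE`, and common C libraries render that as "Broken pipe". The sketch ignores SIGPIPE so that the write returns an error instead of terminating the process. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one byte to a socket whose peer has closed and return the
 * resulting errno (0 if the send unexpectedly succeeded, -1 if the
 * socket pair could not be created). */
static int errno_after_send_to_closed_peer(void)
{
    int fds[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;
    signal(SIGPIPE, SIG_IGN);   /* get EPIPE instead of being killed */
    close(fds[0]);              /* the peer ("frontend") is gone */

    ssize_t n = send(fds[1], "x", 1, 0);
    int err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

On such a system, `strerror(errno_after_send_to_closed_peer())` yields the "Broken pipe" text Tom expected to see in the log instead of "Unknown error".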
>> No crash, just:
>>
>> LOG: could not send data to client: Unknown error
>> LOG: could not receive data from client: Unknown error
>> LOG: unexpected EOF on client connection
>
> This is expected except for the "Unknown error". You should be
> seeing the equivalent of EPIPE, typically "Broken pipe" on Unixen.
> It sounds like the error number handling may not be quite right
> on Win32.

A quick look at this shows that the problem is probably that %m has no clue about winsock error codes. There is a workaround strerror() in libpq on win32 (see interfaces/libpq/win32.c and libpq-int.h). My bet is it's the same thing.

I believe the fix needs to be in backend/utils/elog.c, function useful_strerror(). Unless someone either tells me that's the wrong place or beats me to it, I'll try to get a patch done for this soonest.

//Magnus
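On win32, socket failures report winsock error numbers (WSAECONNRESET is 10054, for example; they all sit in the 10000 range). The C runtime's `strerror()` does not recognize these numbers, hence "Unknown error". The following is a minimal sketch of the kind of wrapper Magnus describes: look the code up in a private table first and fall back to `strerror()` otherwise. The function name, the `SKETCH_` constants and the message texts are illustrative assumptions, not the code in libpq's win32.c or in elog.c.

```c
#include <stddef.h>
#include <string.h>

/* Winsock error numbers (values as in winsock2.h), redefined under
 * hypothetical names so this sketch compiles on any platform. */
#define SKETCH_WSAECONNABORTED 10053
#define SKETCH_WSAECONNRESET   10054
#define SKETCH_WSAENOTCONN     10057
#define SKETCH_WSAESHUTDOWN    10058

struct sock_errmsg {
    int         code;
    const char *text;
};

static const struct sock_errmsg winsock_messages[] = {
    { SKETCH_WSAECONNABORTED, "Software caused connection abort" },
    { SKETCH_WSAECONNRESET,   "Connection reset by peer" },
    { SKETCH_WSAENOTCONN,     "Socket is not connected" },
    { SKETCH_WSAESHUTDOWN,    "Cannot send after socket shutdown" },
};

/* Hypothetical wrapper: try the winsock table first, then fall back to
 * the C library's strerror() for ordinary errno values. */
static const char *socket_aware_strerror(int errnum)
{
    size_t n = sizeof winsock_messages / sizeof winsock_messages[0];

    for (size_t i = 0; i < n; i++)
        if (winsock_messages[i].code == errnum)
            return winsock_messages[i].text;
    return strerror(errnum);
}
```

A real table would cover every winsock code the backend can see. The point of the design is that the lookup lives in one function, so every caller benefits from the fix.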
"Magnus Hagander" <mha@sollentuna.net> writes:
> I believe the fix needs to be in backend/utils/elog.c, function
> useful_strerror(). Unless someone either tells me that's the wrong place
> or beats me to it, I'll try to get a patch done for this soonest.

Sounds reasonable. Better also look at errcode_for_socket_access() and other places that check for particular errno values.

While you are at it, you might want to make sure there aren't any other unprotected strerror() calls in the backend. At one time we had a fair number of places with code like

	elog("barf: %s", strerror(errno));

and I'm not sure if all of them have been turned into %m or not.

regards, tom lane
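Tom's advice amounts to funneling every errno-to-text conversion through %m, so that a single helper (useful_strerror() in Magnus's message) decides what text is printed. What follows is a rough sketch of %m expansion. It assumes the caller has already obtained the message text from whatever helper applies. `expand_percent_m` is hypothetical, not elog.c's actual code, and it leaves every other conversion, including `%%`, for a later printf-style pass.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: copy `fmt` into `out`, replacing each "%m" with `errmsg`
 * (text the caller already obtained from its error-message helper).
 * "%%" and all other conversions are copied unchanged for a later
 * printf-style pass. Output is truncated to fit `outlen` bytes. */
static void expand_percent_m(char *out, size_t outlen, const char *fmt,
                             const char *errmsg)
{
    size_t o = 0;

    if (outlen == 0)
        return;
    for (const char *p = fmt; *p != '\0' && o + 1 < outlen; p++) {
        if (p[0] == '%' && p[1] == '%') {
            if (o + 2 >= outlen)
                break;          /* keep "%%" whole or drop it */
            out[o++] = '%';
            out[o++] = '%';
            p++;                /* skip the second '%' */
        } else if (p[0] == '%' && p[1] == 'm') {
            for (const char *m = errmsg; *m != '\0' && o + 1 < outlen; m++)
                out[o++] = *m;
            p++;                /* skip the 'm' */
        } else {
            out[o++] = *p;
        }
    }
    out[o] = '\0';
}
```

A caller on win32 would pass the text from a winsock-aware lookup rather than plain `strerror(errno)`. Every message written with %m would then pick up the fix at once, while a hard-coded `strerror(errno)` call like the one Tom quotes would not.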
>> I believe the fix needs to be in backend/utils/elog.c, function
>> useful_strerror(). Unless someone either tells me that's the wrong place
>> or beats me to it, I'll try to get a patch done for this soonest.
>
> Sounds reasonable. Better look also at errcode_for_socket_access() and
> other places that check for particular errno values.

The errno values are correct (they're #defined in win32.h); it's just that the system-supplied strerror() doesn't know about them.

> While you are at it you might want to make sure there aren't any other
> unprotected strerror() calls in the backend. At one time we had a fair
> number of places with code like
>
>     elog("barf: %s", strerror(errno));
>
> and I'm not sure if all of them have gotten turned into %m or not.

From what I can see, there are a couple of places in postmaster.c that use it, and that's it. And they're all non-socket related, so I think we're safe there.

Patch coming up shortly :-)

//Magnus