Discussion: (Cygwin) postmaster shutdown problem
I am observing the following postmaster shutdown problem with 7.0.3 and
the 12/29/2000 snapshot on Cygwin 1.1.7:
    After postmaster has been driven by multiple simultaneous (JDBC)
    connections, postmaster usually needs to receive more than one
    SIGTERM signal before it will perform a Smart Shutdown.
When I run the 7.0.3 postmaster with the "-d 1" option, I get the
following pruned and annotated (indicated by the ### prefix) output for
two simultaneous connections:
    ...
    ### last JDBC connection is dropped by client
    pq_recvbuf: recv() failed: Connection reset by peer
    proc_exit(0)
    shmem_exit(0)
    exit(0)
    /usr/local/pgsql/bin/postmaster: reaping dead processes...
    /usr/local/pgsql/bin/postmaster: CleanupProc: pid 461 exited with status 0
    /usr/local/pgsql/bin/postmaster: CleanupProc: pid 406 exited with status 0
    /usr/local/pgsql/bin/postmaster: CleanupProc: pid 358 exited with status 0
    /usr/local/pgsql/bin/postmaster: reaping dead processes...
    ### first SIGTERM signal received
    /usr/local/pgsql/bin/postmaster: reaping dead processes...
    /usr/local/pgsql/bin/postmaster: reaping dead processes...
    ### second SIGTERM signal received
    pmdie 15
    Smart Shutdown request at Fri Jan  5 13:47:13 2001
    ...
The above output seems to indicate that reaper() is firing instead of
pmdie() when the first SIGTERM signal is received.  Hmm...
If postmaster is driven by only one connection, then it always shuts down
on the first SIGTERM signal.  If postmaster is driven by more than two
connections, then it can require three or more SIGTERM signals.
I have *not* been able to reproduce this problem with 7.0.3 on Red Hat
6.2 Linux.
Is this a known problem?  Has anyone else observed this problem on a
platform other than Cygwin?  This information would be helpful before I
start trudging through the Cygwin DLL...
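For comparison, the behavior described above as working on Linux can be sketched outside PostgreSQL. This is only a minimal illustration, assuming a POSIX system: on_sigchld() and on_sigterm() are hypothetical stand-ins for reaper() and pmdie(), and term_deliveries_after_children() is an invented helper, not postmaster code. It forks a few short-lived children, sends one SIGTERM to itself, and reports how often the SIGTERM handler ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t term_count;   /* SIGTERM handler invocations */
static volatile sig_atomic_t reaped_count; /* children collected so far */

/* Hypothetical stand-in for reaper(): collect every exited child. */
static void on_sigchld(int sig) {
    int saved_errno = errno;
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Hypothetical stand-in for pmdie(): only record the shutdown request. */
static void on_sigterm(int sig) {
    (void)sig;
    term_count++;
}

/* Fork `children` processes that exit at once, send ourselves a single
 * SIGTERM, wait until the children are reaped, and return the number
 * of times the SIGTERM handler ran. */
int term_deliveries_after_children(int children) {
    struct sigaction sa, old_chld, old_term;
    struct timespec tick = { 0, 1000000 }; /* 1 ms polling interval */
    int forked = 0;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigchld;
    sigaction(SIGCHLD, &sa, &old_chld);
    sa.sa_handler = on_sigterm;
    sigaction(SIGTERM, &sa, &old_term);

    term_count = 0;
    reaped_count = 0;
    for (int i = 0; i < children; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);          /* child: exit immediately */
        if (pid > 0)
            forked++;
    }
    kill(getpid(), SIGTERM);   /* the one and only shutdown request */

    /* Bounded wait (about 5 s) for the SIGCHLD handler to reap everyone. */
    for (int tries = 0; reaped_count < forked && tries < 5000; tries++)
        nanosleep(&tick, NULL);

    sigaction(SIGCHLD, &old_chld, NULL);
    sigaction(SIGTERM, &old_term, NULL);
    return (int)term_count;
}
```

Calling term_deliveries_after_children(3) from a small main() and printing the result shows whether the single SIGTERM reached its handler even though SIGCHLD handling was going on at the same time. A result of 0 on some platform would match the symptom reported here.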
Thanks,
Jason
--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
			
On Fri, 5 Jan 2001 14:13:11 -0500
Jason Tishler <Jason.Tishler@dothill.com> wrote:
> I am observing the following postmaster shutdown problem with 7.0.3 and
> the 12/29/2000 snapshot on Cygwin 1.1.7:
>
>     After postmaster has been driven by multiple simultaneous (JDBC)
>     connections, postmaster usually needs to receive more than one
>     SIGTERM signal before it will perform a Smart Shutdown.
>
> When I run the 7.0.3 postmaster with the "-d 1" option, I get the
> following pruned and annotated (indicated by the ### prefix) output for
> two simultaneous connections:
>
>     ...
>     ### last JDBC connection is dropped by client
>     pq_recvbuf: recv() failed: Connection reset by peer
>     proc_exit(0)
>     shmem_exit(0)
>     exit(0)
>     /usr/local/pgsql/bin/postmaster: reaping dead processes...
>     /usr/local/pgsql/bin/postmaster: CleanupProc: pid 461 exited with status 0
>     /usr/local/pgsql/bin/postmaster: CleanupProc: pid 406 exited with status 0
>     /usr/local/pgsql/bin/postmaster: CleanupProc: pid 358 exited with status 0
>     /usr/local/pgsql/bin/postmaster: reaping dead processes...
>     ### first SIGTERM signal received
>     /usr/local/pgsql/bin/postmaster: reaping dead processes...
>     /usr/local/pgsql/bin/postmaster: reaping dead processes...
>     ### second SIGTERM signal received
>     pmdie 15
>     Smart Shutdown request at Fri Jan  5 13:47:13 2001
>     ...
>
> The above output seems to indicate that reaper() is firing instead of
> pmdie() when the first SIGTERM signal is received.  Hmm...
>
> If postmaster is driven by only one connection, then it always shuts down
> on the first SIGTERM signal.  If postmaster is driven by more than two
> connections, then it can require three or more SIGTERM signals.
>
> I have *not* been able to reproduce this problem with 7.0.3 on Red Hat
> 6.2 Linux.
>
> Is this a known problem?  Has anyone else observed this problem on a
> platform other than Cygwin?  This information would be helpful before I
> start trudging through the Cygwin DLL...
It's a bug in cygipc.
Cygipc can't catch signals when waiting with semget(). I'm trying to fix
this.
--
Yutaka tanida <yutaka@hi-net.zaq.ne.jp>
Yutaka,
On Sat, Jan 06, 2001 at 08:58:48PM +0900, Yutaka tanida wrote:
> It's a bug in cygipc.
> Cygipc can't catch signals when waiting with semget(). I'm trying to fix
> this.
Thanks for your response *and* especially for trying to fix this
problem.
I find it curious that you think that the problem is in cygipc.  How does
cygipc get involved when I am sending the SIGTERM signal directly to
postmaster?
Thanks,
Jason
--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
Jason,
On Tue, 9 Jan 2001 09:50:56 -0500
Jason Tishler <Jason.Tishler@dothill.com> wrote:
> I find it curious that you think that the problem is in cygipc.  How does
> cygipc get involved when I am sending the SIGTERM signal directly to
> postmaster?
Cygipc's implementation ignores all signals. Cygwin's signal reception is
implemented with WaitForMultipleObjects() and the signal_arrived variable,
while the cygipc semaphore is implemented with a Win32 semaphore and
WaitForSingleObject(). So the two implementations wait at the same level
on Windows, and an arriving signal does not interrupt the semaphore wait.
To fix this, we must rewrite every WaitForSingleObject() call to handle
signals and return EINTR.
---
Yutaka tanida <yutaka@hi-net.zaq.ne.jp>
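The goal stated above, a semaphore wait that notices signals and returns EINTR, is what a System V semaphore wait does on a POSIX system. The following is a minimal sketch of that reference behavior, assuming Linux-style System V semaphores; it is not cygipc code, and semop_errno_when_signalled() and union sem_arg are hypothetical names introduced only for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

/* Caller-defined argument type for semctl(); named sem_arg here to
 * avoid clashing with systems that already declare union semun. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

static void on_alarm(int sig) {
    (void)sig;                 /* the handler only needs to run */
}

/* Block in semop() on a semaphore whose value is 0, let SIGALRM arrive
 * after one second, and return the errno of the interrupted call.
 * Returns 0 if semop() unexpectedly succeeded, -1 on setup failure. */
int semop_errno_when_signalled(void) {
    struct sigaction sa, old_sa;
    struct sembuf wait_op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    union sem_arg arg;
    int semid, rc, err;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_alarm;
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }
    arg.val = 0;               /* make sure the wait really blocks */
    semctl(semid, 0, SETVAL, arg);

    alarm(1);                  /* signal arrives while we are waiting */
    rc = semop(semid, &wait_op, 1);
    err = (rc == -1) ? errno : 0;
    alarm(0);

    semctl(semid, 0, IPC_RMID);          /* remove the semaphore set */
    sigaction(SIGALRM, &old_sa, NULL);
    return err;
}
```

On Linux the function returns EINTR after about one second, because the handler runs while semop() is blocked. A semaphore emulation that waits without watching for signals would instead stay blocked, and the handler would not run until the wait ended for some other reason.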
Jason,
> On Sat, Jan 06, 2001 at 08:58:48PM +0900, Yutaka tanida wrote:
> > It's a bug in cygipc.
> > Cygipc can't catch signals when waiting with semget(). I'm trying to fix
> > this.
>
> Thanks for your response *and* especially for trying to fix this
> problem.
I think this bug will be fixed with the attached patch against cygipc
1.08. Can you test this?
---
Yutaka tanida <yutaka@hi-net.zaq.ne.jp>
Attachments
Yutaka,
On Sat, Jan 13, 2001 at 08:29:41PM +0900, Yutaka tanida wrote:
> Jason,
>
> > On Sat, Jan 06, 2001 at 08:58:48PM +0900, Yutaka tanida wrote:
> > > It's a bug in cygipc.
> > > Cygipc can't catch signals when waiting with semget(). I'm trying to fix
> > > this.
> >
> > Thanks for your response *and* especially for trying to fix this
> > problem.
>
> I think this bug will be fixed with the attached patch against cygipc
> 1.08. Can you test this?
I'm sorry to inform you that either the above patch doesn't fix this
problem or I did not install it correctly.
I used the following procedure:
    1. downloaded and extracted cygipc source from:
       http://www.neuro.gatech.edu/users/cwilson/cygutils/V1.1/cygipc/cygipc-1.08-1-src.tar.gz
    2. applied your patch
    3. removed original cygipc 1.08
    4. make; make install of cygipc
    5. make clean; make; make install of postgres
Then I ran my multiple (JDBC) connection test case and tried to kill
postmaster with a SIGTERM.  Unfortunately, it still took more than one
SIGTERM to shut down postmaster.
It is very easy to set up your own environment to facilitate tracking
down this problem.  Would you like me to send you my recipe?  I'm just
driving a slightly modified version of the 7.0.3 PostgreSQL JDBC driver
with the Protomatter 1.1.2 JDBC Connection Manager:
    http://sourceforge.net/project/showfiles.php?group_id=261
The Protomatter package comes with a test client that exhibits this
problem.
Thanks,
Jason
--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
			
Yutaka,
On Mon, Jan 15, 2001 at 01:02:58AM +0900, Yutaka tanida wrote:
> Jason,
>
> > > I think this bug will be fixed with the attached patch against cygipc
> > > 1.08. Can you test this?
> > I'm sorry to inform you that either the above patch doesn't fix this
> > problem or I did not install it correctly.
>
> Sorry, I have a mistake.
I'm not sure that I understand your above comment.  Does it mean that your
patch has a mistake in it?
> I tested this on 7.1Beta3, not 7.0.3.
I didn't mean to confuse the issue by mentioning the 7.0.3 JDBC driver
in my previous post.  I am also using 7.1Beta3 (i.e., CVS) for my backend.
> I can reproduce this bug in my 7.0.3 environment. The following C program
> can cause this problem, too.
>
> #include <stdio.h>
> #include <libpq-fe.h>
>
> int main() {
>   PGconn *con;
>   int i=0;
>   for(i=0;i<2;i++ )
>     con = PQsetdb("127.0.0.1", "5432",NULL,NULL, "template1");
> }
I can reproduce the problem with the above program too but I have to call
PQsetdb() four times (instead of two) to reproduce the problem.  Anyway,
your method to reproduce the problem is much more minimal than mine.
Thanks,
Jason
--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
			
Jason,
> > Sorry, I have a mistake.
> I'm not sure that I understand your above comment.  Does it mean that your
> patch has a mistake in it?
Oh, my mistake is that the attached patch can't fix this problem. It fixes
some problems in PostgreSQL, but it can't fix this one.
> I can reproduce the problem with the above program too but I have to call
> PQsetdb() four times (instead of two) to reproduce the problem.  Anyway,
> your method to reproduce the problem is much more minimal than mine.
In my other environment, an NT4 box with Cygwin 1.1.7, I can't reproduce
this.
---
Yutaka tanida <yutaka@hi-net.zaq.ne.jp>
Yutaka,
On Mon, Jan 15, 2001 at 11:42:13PM +0900, Yutaka tanida wrote:
> > > Sorry, I have a mistake.
> > I'm not sure that I understand your above comment.  Does it mean that your
> > patch has a mistake in it?
>
> Oh, my mistake is that the attached patch can't fix this problem. It fixes
> some problems in PostgreSQL, but it can't fix this one.
Are you still attempting to fix this problem?  Please do not interpret
this as a request.  I will understand if you do not want or have time to
continue with this effort.  I just need to know whether or not I should
start to debug myself.
Thanks,
Jason
--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
Jason,
> On Mon, Jan 15, 2001 at 11:42:13PM +0900, Yutaka tanida wrote:
> > > > Sorry, I have a mistake.
> > > I'm not sure that I understand your above comment.  Does it mean that your
> > > patch has a mistake in it?
> >
> > Oh, my mistake is that the attached patch can't fix this problem. It fixes
> > some problems in PostgreSQL, but it can't fix this one.
>
> Are you still attempting to fix this problem?  Please do not interpret
> this as a request.  I will understand if you do not want or have time to
> continue with this effort.  I just need to know whether or not I should
> start to debug myself.
Today I worked on this and finally created a patch against Cygwin
1.1.7. This patch fixes PostgreSQL's problem, but the attached C program
doesn't work correctly.
---
Yutaka tanida<yutaka@hi-net.zaq.ne.jp>
---- testsig.c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

void recvsig(int);

static sigset_t unblock, block, old;

int main() {
    pid_t pid = getpid();
    int i = 0, k = 0;

    signal(SIGUSR1, recvsig);
    signal(SIGUSR2, recvsig);

    /* Block every signal, then queue up SIGUSR1 while it is blocked. */
    sigfillset(&block);
    sigemptyset(&unblock);
    sigprocmask(SIG_SETMASK, &block, &old);
    for (k = 0; k < 10; k++) {
        kill(pid, SIGUSR1);
    }
    sleep(2);

    /* Unblock everything and keep sending SIGUSR2 to ourselves. */
    for (i = 0; i < 10; i++) {
        sigprocmask(SIG_SETMASK, &unblock, &old);
        sleep(0);
        kill(pid, SIGUSR2); // comment this!
    }
    return 0;
}

/* Report which signal arrived.  printf() is not async-signal-safe;
 * it is used here only because this is a throwaway test program. */
void recvsig(int sig) {
    switch (sig) {
    case SIGUSR1:
        printf("SIGUSR1\n");
        break;
    case SIGUSR2:
        printf("SIGUSR2\n");
        break;
    default:
        printf("UNKNOWN\n");
        break;
    }
    usleep(100000);
}
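As a reference point for the first half of testsig.c, here is a minimal sketch of what happens on Linux when a burst of one standard signal is sent while that signal is blocked. It assumes Linux semantics, where standard (non-realtime) signals are not queued; POSIX leaves repeated delivery implementation-defined, and the thread does not say what Cygwin printed. The helper name deliveries_after_blocked_burst() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t usr1_count; /* SIGUSR1 handler invocations */

static void on_usr1(int sig) {
    (void)sig;
    usr1_count++;
}

/* Block SIGUSR1, send it to ourselves `burst` times, unblock it, and
 * return how many times the handler ran. */
int deliveries_after_blocked_burst(int burst) {
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_usr1;
    sigaction(SIGUSR1, &sa, &old_sa);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    usr1_count = 0;
    for (int i = 0; i < burst; i++)
        kill(getpid(), SIGUSR1);   /* stays pending while blocked */

    /* Unblocking delivers the pending signal before the call returns. */
    sigprocmask(SIG_UNBLOCK, &block, NULL);
    int delivered = (int)usr1_count;

    /* Put the caller's mask and handler back. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return delivered;
}
```

On Linux, deliveries_after_blocked_burst(10) returns 1: the ten blocked SIGUSR1 signals collapse into a single pending instance. That gives a baseline to compare against when running testsig.c on a patched Cygwin.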
			
Attachments
Yutaka,
On Fri, Jan 19, 2001 at 09:19:34PM +0900, Yutaka tanida wrote:
> > Are you still attempting to fix this problem?  Please do not interpret
> > this as a request.  I will understand if you do not want or have time to
> > continue with this effort.  I just need to know whether or not I should
> > start to debug myself.
>
> Today I worked on this and finally created a patch against Cygwin
> 1.1.7. This patch fixes PostgreSQL's problem, but the attached C program
> doesn't work correctly.
Thanks for the status update.  Do you think that you will be able to
find a solution that fixes the PostgreSQL problem *without* breaking
Cygwin?
Originally you thought that the problem was in cygipc.  Have you
concluded that the problem is actually in Cygwin?
BTW, this problem is becoming a real nuisance.  I was running many
regression tests sequentially last night.  If I forgot to kill postmaster
from the previous test, then the next regression test would fail because
the regression database "already existed."  When I do remember to kill
postmaster, it takes between 5 and 10 kills to really kill it.
Thanks,
Jason
--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
Jason,
On Mon, 22 Jan 2001 16:36:29 -0500
Jason Tishler <Jason.Tishler@dothill.com> wrote:
> Thanks for the status update.  Do you think that you will be able to
> find a solution that fixes the PostgreSQL problem *without* breaking
> Cygwin?
Perhaps I can't.
> Originally you thought that the problem was in cygipc.  Have you
> concluded that the problem is actually in Cygwin?
Yes. I can reproduce the problem without cygipc and PostgreSQL, so I
think it's a bug in Cygwin (see the C program attached to my previous
mail).
--
Yutaka tanida <yutaka@hi-net.zaq.ne.jp>
On Tue, Jan 23, 2001 at 01:11:13PM +0900, Yutaka tanida wrote:
> Yes. I can reproduce the problem without cygipc and PostgreSQL, so I
> think it's a bug in Cygwin (see the C program attached to my previous
> mail).
This problem appears to be solved in the latest Cygwin CVS.  See the
following, if interested:
    http://www.cygwin.com/ml/cygwin-developers/2001-02/msg00018.html
Jason
--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com