Обсуждение: statistics process shutting down

Поиск
Список
Период
Сортировка

statistics process shutting down

От
"Merlin Moncure"
Дата:
During load testing, I'm getting the following error:

FATAL:  could not read from statistics collector pipe: No error
LOG:  statistics collector process (PID 1108) was terminated by signal 1

After which the statistics collector restarts.  I have statement level
and block level stats turned on.  The server still hums along happily
but after the restart both pg_stat_database and pg_stat_activity lie
about how many backends are connected.

Merlin

Re: statistics process shutting down

От
Tom Lane
Дата:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> During load testing, I'm getting the following error:
> FATAL:  could not read from statistics collector pipe: No error
> LOG:  statistics collector process (PID 1108) was terminated by signal 1

Evidently coming from here:

        len = piperead(readPipe, ((char *) &msg) + nread,
                       targetlen - nread);
        if (len < 0)
        {
            if (errno == EINTR)
                continue;
            ereport(ERROR,
                    (errcode_for_socket_access(),
                     errmsg("could not read from statistics collector pipe: %m")));
        }

So why is piperead() failing, and why doesn't it set errno to something
useful?

            regards, tom lane

Re: statistics process shutting down

От
"Merlin Moncure"
Дата:
> Evidently coming from here:
>
>         len = piperead(readPipe, ((char *) &msg) + nread,
>                        targetlen - nread);
>         if (len < 0)
>         {
>             if (errno == EINTR)
>                 continue;
>             ereport(ERROR,
>                     (errcode_for_socket_access(),
>                      errmsg("could not read from statistics collector
> pipe: %m")));
>         }
>
> So why is piperead() failing, and why doesn't it set errno to
something
> useful?

Well, the win32 piperead() is really just a call to recv() (vs unix
read()).  Here is the win32 implemenation:
int
piperead(int s, char *buf, int len)
{
    int            ret = recv(s, buf, len, 0);

    if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
        /* EOF on the pipe! (win32 socket based implementation)
*/
        ret = 0;
    return ret;
}

I think the key to this puzzle is the return code from
WSAGetLastError().  Also, the WSA call *might* be masking the value of
errno.  I did a quick search and came up with this:
http://archives.postgresql.org/pgsql-patches/2001-10/msg00160.php

I think maybe errno needs to get set to WSAGetLastError().  In pipe.c:

if (ret < 0)
{
   int  wsa_errno;
   wsa_errno = WSAGetLastError();

   if (WSAECONNRESET == wsa_errno)
   {   /* EOF on the pipe! (win32 socket based implementation) */
      ret = 0;
   }
   else
   {
      errno = wsa_errno;  /* this *might* be ok */
   }
}

return ret;

Maybe Magnus might comment here.  This doesn't explain why the read call
is failing but I'm pretty sure the error code is not being properly
returned.

Merlin

Re: statistics process shutting down

От
Tom Lane
Дата:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> if (ret < 0)
> {
>    int  wsa_errno;
>    wsa_errno = WSAGetLastError();

>    if (WSAECONNRESET == wsa_errno)
>    {   /* EOF on the pipe! (win32 socket based implementation) */
>       ret = 0;
>    }
>    else
>    {
>       errno = wsa_errno;  /* this *might* be ok */
>    }
> }

> Maybe Magnus might comment here.  This doesn't explain why the read call
> is failing but I'm pretty sure the error code is not being properly
> returned.

I recall a lot of angst about whether the encoding of WSAGetLastError()
is compatible with errno values, but I forget what the conclusion was.
Can we just assign to errno like that, or do we need a mapping function?

            regards, tom lane

Re: statistics process shutting down

От
"Merlin Moncure"
Дата:
> > Maybe Magnus might comment here.  This doesn't explain why the read
call
> > is failing but I'm pretty sure the error code is not being properly
> > returned.
>
> I recall a lot of angst about whether the encoding of
WSAGetLastError()
> is compatible with errno values, but I forget what the conclusion was.
> Can we just assign to errno like that, or do we need a mapping
function?

Ah.  My bad.
/backend/prot/win32/socket.c

static void
TranslateSocketError(void) etc.  This puts the value in errno like it is
suppoed to be.  MS recv() does not (I checked) so this is definately a
bug in pipe.c, since it is reasonable for the caller to expect errno to
be set to something.

Also, recv() and read() return completely different error codes :-).  So
any caller has to be careful not to make assumptions about errno.

Merlin

Re: statistics process shutting down

От
"Merlin Moncure"
Дата:
Whoop! I missed something here.

In src/include/port/win32.h, recv is #defined to pgwin32_recv(s, buf,
len, flags).  This version of the function appears to do all the errno
mapping, etc.  So pipe.c is correct, or at least I have no answer as to
why the error code is not showing up in my log :(.

Merlin


Re: statistics process shutting down

От
Tom Lane
Дата:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> Ah.  My bad.
> /backend/port/win32/socket.c

> static void
> TranslateSocketError(void) etc.  This puts the value in errno like it is
> suppoed to be.

Hmm.  I am strongly tempted to remove src/port/pipe.c and put it into
src/backend/port/win32/socket.c, so it can share this function.

> MS recv() does not (I checked) so this is definately a
> bug in pipe.c, since it is reasonable for the caller to expect errno to
> be set to something.

Can you provide a patch?  I'm hesitant to hack code I don't have the
means to test.

            regards, tom lane

Re: statistics process shutting down

От
Bruce Momjian
Дата:
Is this an open item?

---------------------------------------------------------------------------

Merlin Moncure wrote:
>
> > > Maybe Magnus might comment here.  This doesn't explain why the read
> call
> > > is failing but I'm pretty sure the error code is not being properly
> > > returned.
> >
> > I recall a lot of angst about whether the encoding of
> WSAGetLastError()
> > is compatible with errno values, but I forget what the conclusion was.
> > Can we just assign to errno like that, or do we need a mapping
> function?
>
> Ah.  My bad.
> /backend/prot/win32/socket.c
>
> static void
> TranslateSocketError(void) etc.  This puts the value in errno like it is
> suppoed to be.  MS recv() does not (I checked) so this is definately a
> bug in pipe.c, since it is reasonable for the caller to expect errno to
> be set to something.
>
> Also, recv() and read() return completely different error codes :-).  So
> any caller has to be careful not to make assumptions about errno.
>
> Merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: statistics process shutting down

От
"Merlin Moncure"
Дата:
Bruce Momijan wrote:
>
> Is this an open item?
>
I would say yes, referring to the statistics collector restarting under
heavy load conditions.

Unfortunately, the server where I was most frequently getting the error
went into production on Wednesday (with stats collector turned off).  So
testing the problem there is basically out until I can set up some kind
of simulated test here at the office.

Merlin