Обсуждение: statistics process shutting down
During load testing, I'm getting the following error: FATAL: could not read from statistics collector pipe: No error LOG: statistics collector process (PID 1108) was terminated by signal 1 After which the statistics collector restarts. I have statement level and block level stats turned on. The server still hums along happily but after the restart both pg_stat_database and pg_stat_activity lie about how many backends are connected. Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> During load testing, I'm getting the following error:
> FATAL: could not read from statistics collector pipe: No error
> LOG: statistics collector process (PID 1108) was terminated by signal 1
Evidently coming from here:
len = piperead(readPipe, ((char *) &msg) + nread,
targetlen - nread);
if (len < 0)
{
if (errno == EINTR)
continue;
ereport(ERROR,
(errcode_for_socket_access(),
errmsg("could not read from statistics collector pipe: %m")));
}
So why is piperead() failing, and why doesn't it set errno to something
useful?
regards, tom lane
> Evidently coming from here:
>
> len = piperead(readPipe, ((char *) &msg) + nread,
> targetlen - nread);
> if (len < 0)
> {
> if (errno == EINTR)
> continue;
> ereport(ERROR,
> (errcode_for_socket_access(),
> errmsg("could not read from statistics collector
> pipe: %m")));
> }
>
> So why is piperead() failing, and why doesn't it set errno to
something
> useful?
Well, the win32 piperead() is really just a call to recv() (vs unix
read()). Here is the win32 implemenation:
int
piperead(int s, char *buf, int len)
{
int ret = recv(s, buf, len, 0);
if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
/* EOF on the pipe! (win32 socket based implementation)
*/
ret = 0;
return ret;
}
I think the key to this puzzle is the return code from
WSAGetLastError(). Also, the WSA call *might* be masking the value of
errno. I did a quick search and came up with this:
http://archives.postgresql.org/pgsql-patches/2001-10/msg00160.php
I think maybe errno needs to get set to WSAGetLastError(). In pipe.c:
if (ret < 0)
{
int wsa_errno;
wsa_errno = WSAGetLastError();
if (WSAECONNRESET == wsa_errno)
{ /* EOF on the pipe! (win32 socket based implementation) */
ret = 0;
}
else
{
errno = wsa_errno; /* this *might* be ok */
}
}
return ret;
Maybe Magnus might comment here. This doesn't explain why the read call
is failing but I'm pretty sure the error code is not being properly
returned.
Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> if (ret < 0)
> {
> int wsa_errno;
> wsa_errno = WSAGetLastError();
> if (WSAECONNRESET == wsa_errno)
> { /* EOF on the pipe! (win32 socket based implementation) */
> ret = 0;
> }
> else
> {
> errno = wsa_errno; /* this *might* be ok */
> }
> }
> Maybe Magnus might comment here. This doesn't explain why the read call
> is failing but I'm pretty sure the error code is not being properly
> returned.
I recall a lot of angst about whether the encoding of WSAGetLastError()
is compatible with errno values, but I forget what the conclusion was.
Can we just assign to errno like that, or do we need a mapping function?
regards, tom lane
> > Maybe Magnus might comment here. This doesn't explain why the read call > > is failing but I'm pretty sure the error code is not being properly > > returned. > > I recall a lot of angst about whether the encoding of WSAGetLastError() > is compatible with errno values, but I forget what the conclusion was. > Can we just assign to errno like that, or do we need a mapping function? Ah. My bad. /backend/prot/win32/socket.c static void TranslateSocketError(void) etc. This puts the value in errno like it is suppoed to be. MS recv() does not (I checked) so this is definately a bug in pipe.c, since it is reasonable for the caller to expect errno to be set to something. Also, recv() and read() return completely different error codes :-). So any caller has to be careful not to make assumptions about errno. Merlin
Whoop! I missed something here. In src/include/port/win32.h, recv is #defined to pgwin32_recv(s, buf, len, flags). This version of the function appears to do all the errno mapping, etc. So pipe.c is correct, or at least I have no answer as to why the error code is not showing up in my log :(. Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> Ah. My bad.
> /backend/port/win32/socket.c
> static void
> TranslateSocketError(void) etc. This puts the value in errno like it is
> suppoed to be.
Hmm. I am strongly tempted to remove src/port/pipe.c and put it into
src/backend/port/win32/socket.c, so it can share this function.
> MS recv() does not (I checked) so this is definately a
> bug in pipe.c, since it is reasonable for the caller to expect errno to
> be set to something.
Can you provide a patch? I'm hesitant to hack code I don't have the
means to test.
regards, tom lane
Is this an open item? --------------------------------------------------------------------------- Merlin Moncure wrote: > > > > Maybe Magnus might comment here. This doesn't explain why the read > call > > > is failing but I'm pretty sure the error code is not being properly > > > returned. > > > > I recall a lot of angst about whether the encoding of > WSAGetLastError() > > is compatible with errno values, but I forget what the conclusion was. > > Can we just assign to errno like that, or do we need a mapping > function? > > Ah. My bad. > /backend/prot/win32/socket.c > > static void > TranslateSocketError(void) etc. This puts the value in errno like it is > suppoed to be. MS recv() does not (I checked) so this is definately a > bug in pipe.c, since it is reasonable for the caller to expect errno to > be set to something. > > Also, recv() and read() return completely different error codes :-). So > any caller has to be careful not to make assumptions about errno. > > Merlin > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momijan wrote: > > Is this an open item? > I would say yes, referring to the statistics collector restarting under heavy load conditions. Unfortunately, the server where I was most frequently getting the error went into production on Wednesday (with stats collector turned off). So testing the problem there is basically out until I can set up some kind of simulated test here at the office. Merlin