pg_listener entries deleted under heavy NOTIFY load only on Windows

From: Radu Ilie
Subject: pg_listener entries deleted under heavy NOTIFY load only on Windows
Date:
Message-ID: 1264195826.2328.159.camel@arc-dev2.wsicorp.com
List: pgsql-hackers

On a Windows server under a heavy load of NOTIFY events, entries in the
pg_listener table for some events are deleted. It is as if UNLISTEN had
been called.

PostgreSQL version: 8.3.9.
Operating System: Windows XP.

PostgreSQL believes that if it fails to notify a listener (by signaling the
respective backend), then the backend doesn't exist anymore and so it should
get rid of the pg_listener entry. The relevant code is in
src/backend/commands/async.c, function Send_Notify:

    if (kill(listenerPID, SIGUSR2) < 0)
    {
        /*
         * Get rid of pg_listener entry if it refers to a PID that no
         * longer exists.  Presumably, that backend crashed without
         * deleting its pg_listener entries. This code used to only
         * delete the entry if errno==ESRCH, but as far as I can see
         * we should just do it for any failure (certainly at least
         * for EPERM too...)
         */
        simple_heap_delete(lRel, &lTuple->t_self);
    }

The problem is that under
Windows, kill can fail even if the process is still alive. PostgreSQL uses
named pipes under Windows to send signals to backends. The present
implementation has a bug that causes a client to fail to write data to the
named pipe, even though the server process is alive. This is because the
server doesn't maintain the named pipe at all times. The bug is present in
file src/backend/port/win32/signal.c, function pg_signal_thread.

The server code stays in a loop in which it continuously creates an instance
of the named pipe (via CreateNamedPipe) and waits for a client process to
connect (via ConnectNamedPipe). Once a client connects, the communication
with the client is handled in a new thread, with the thread procedure
pg_signal_dispatch_thread. This function is very simple: it reads one byte
from the named pipe, it writes it back and (very important) closes the
handle to the named pipe instance. The main loop creates another instance of
the named pipe and waits for another client to connect.

Now imagine that the server is under heavy load.
There are dozens of backends and threads running and the CPU usage is close
to 100%. The following succession of events is possible:

1. Server signal handling thread (in function pg_signal_thread) creates the
   first, one and only instance of the named pipe via CreateNamedPipe.
2. Server code starts waiting for clients to connect with ConnectNamedPipe.
3. Client wishes to make a transaction on the named pipe and calls
   CallNamedPipe (in file src/port/kill.c, function pgkill).
4. Server code returns from ConnectNamedPipe. It creates a new thread with
   the thread procedure pg_signal_dispatch_thread.
5. The signal dispatch thread is scheduled for execution and it runs to
   completion. As you can see, the last thing it does related to the named
   pipe is to close the handle via CloseHandle (in function
   pg_signal_dispatch_thread). This closes the last instance of the named
   pipe. The named pipe is gone. There is no more named pipe. The signal
   handling thread was not yet scheduled by the operating system for
   execution and thus didn't have an opportunity to call CreateNamedPipe.
6. Another client (or the same one, it doesn't matter) tries to write to the
   named pipe via CallNamedPipe. The call returns ERROR_FILE_NOT_FOUND,
   because the named pipe is gone. The client believes the backend is gone
   and it removes the entry from pg_listener.
7. The signal handling thread (in function pg_signal_thread) is finally
   scheduled for execution and it calls CreateNamedPipe. We now have an
   instance of the named pipe available.

So we end up with the server backend alive, the named pipe is there, but the
row is gone from pg_listener. This is easy to reproduce under Windows. I
used the scripts posted by Steve Marshall in a similar thread from
01/15/2009 and the problem appears within one minute all the time. For
testing I used a Windows XP machine with 2 cores and 2GB of RAM. The CPU
usage was over 70% during the trials.

The solution is to create
a new instance of the named pipe before launching the signal dispatch
thread. This means changing the code in src/backend/port/win32/signal.c to
look like this:

@@ -250,6 +250,7 @@
 {
        char            pipename[128];
        HANDLE          pipe = pgwin32_initial_signal_pipe;
+       HANDLE          new_pipe = pgwin32_initial_signal_pipe;

        snprintf(pipename, sizeof(pipename), "\\\\.\\pipe\\pgsignal_%u", GetCurrentProcessId());

@@ -275,6 +276,10 @@
                fConnected = ConnectNamedPipe(pipe, NULL) ? TRUE : (GetLastError() == ERROR_PIPE_CONNECTED);
                if (fConnected)
                {
+                       new_pipe = CreateNamedPipe(pipename, PIPE_ACCESS_DUPLEX,
+                                          PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
+                                                          PIPE_UNLIMITED_INSTANCES, 16, 16, 1000, NULL);
+
                        hThread = CreateThread(NULL, 0,
                                                   (LPTHREAD_START_ROUTINE)pg_signal_dispatch_thread,
                                                                   (LPVOID) pipe, 0, NULL);
@@ -288,8 +293,7 @@
                        /* Connection failed. Cleanup and try again */
                        CloseHandle(pipe);

-               /* Set up so we create a new pipe on next loop */
-               pipe = INVALID_HANDLE_VALUE;
+               pipe = new_pipe;
        }
        return 0;
 }

This will guarantee that we have an instance of the named pipe available at
any given moment. If we do this, we
can also remove the 3 tries loop from src/port/kill.c:

@@ -25,7 +25,6 @@
        BYTE            sigData = sig;
        BYTE            sigRet = 0;
        DWORD           bytes;
-       int                    pipe_tries;

        /* we allow signal 0 here, but it will be ignored in pg_queue_signal */
        if (sig >= PG_SIGNAL_COUNT || sig < 0)
@@ -41,14 +40,6 @@
        }
        snprintf(pipename, sizeof(pipename), "\\\\.\\pipe\\pgsignal_%u", pid);

-       /*
-        * Writing data to the named pipe can fail for transient reasons.
-        * Therefore, it is useful to retry if it fails.  The maximum number of
-        * calls to make was empirically determined from a 90-hour notification
-        * stress test.
-        */
-       for (pipe_tries = 0; pipe_tries < 3; pipe_tries++)
-       {
                if (CallNamedPipe(pipename, &sigData, 1, &sigRet, 1, &bytes, 1000))
                {
                        if (bytes != 1 || sigRet != sig)
@@ -58,7 +49,6 @@
                        }
                        return 0;
                }
-       }

        if (GetLastError() == ERROR_FILE_NOT_FOUND)
                errno = ESRCH;

As a note, the original code has a timeout of 1000 milliseconds specified in
the call to CallNamedPipe in kill.c. That timeout doesn't help for this bug,
because it is the time the client will wait for an instance of the named
pipe to become available for communication. This means it helps in the case
when one client is already using that instance to read/write to the server
and a second client has to wait until the first one finishes. It does not
help if there is no instance at all. In that case CallNamedPipe returns
immediately with ERROR_FILE_NOT_FOUND.

Thanks,
Radu
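
The ordering argument in steps 1 to 7 can be sketched as a small toy model.
This is not PostgreSQL code and makes no Windows API calls: it only counts
how many instances of the signal pipe exist at each step, on the assumption
that a client call fails exactly when that count is zero. All type and
function names below are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not PostgreSQL code): the number of named pipe
 * instances that currently exist for one backend. */
typedef struct
{
    int instances;
} PipeModel;

/* Stands in for a successful CreateNamedPipe call. */
static void
model_create_instance(PipeModel *p)
{
    p->instances++;
}

/* Stands in for CloseHandle on one pipe instance. */
static void
model_close_instance(PipeModel *p)
{
    assert(p->instances > 0);
    p->instances--;
}

/* Stands in for CallNamedPipe: in this model the client reaches the
 * backend only while at least one instance exists; otherwise it would see
 * ERROR_FILE_NOT_FOUND. */
static bool
model_client_finds_pipe(const PipeModel *p)
{
    return p->instances > 0;
}

/* Original ordering: the dispatch thread closes the only instance before
 * the signal handling thread gets to create the next one. */
static bool
original_order_client_succeeds(void)
{
    PipeModel p = {0};
    bool found;

    model_create_instance(&p);              /* step 1 */
    /* steps 2-4: first client connects, dispatch thread is started */
    model_close_instance(&p);               /* step 5 */
    found = model_client_finds_pipe(&p);    /* step 6: second client */
    model_create_instance(&p);              /* step 7: too late */
    return found;
}

/* Proposed ordering: the next instance is created before the dispatch
 * thread is launched, so the count never drops to zero. */
static bool
patched_order_client_succeeds(void)
{
    PipeModel p = {0};

    model_create_instance(&p);              /* initial instance */
    /* first client connects */
    model_create_instance(&p);              /* new_pipe, created first */
    model_close_instance(&p);               /* dispatch thread closes old one */
    return model_client_finds_pipe(&p);     /* second client */
}
```

In this model original_order_client_succeeds() returns false and
patched_order_client_succeeds() returns true. The model ignores real
scheduling and error handling, so it only illustrates why the order of
CreateNamedPipe and CloseHandle matters. It says nothing about whether the
patch above is complete.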
