Обсуждение: [ADMIN] Postgres will not allow new connections, suspended process, waiting error

Поиск
Список
Период
Сортировка

[ADMIN] Postgres will not allow new connections, suspended process, waiting error

От
Prateek Mahajan
Дата:

We have a recurring problem where Postgres will not allow any new connections. Any connection already made keeps working fine. Every time this happens, we also have one suspended postgres.exe process that can not be cancelled. Also, we get the following error in the logs every minute: ‘WARNING:  worker took too long to start; canceled’.


We’re using Postgres 9.5 on Windows Server 2012 R2. Our original hypothesis was that there was something wrong with the autovac settings, but we copied the settings over from our other server that does not have this problem and still continue to have problem. 



Re: [ADMIN] Postgres will not allow new connections, suspendedprocess, waiting error

От
Brian Farrell
Дата:
Have you checked to see if you're bumping up against the max_connections limit? https://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS

-Brian

On Fri, Jun 30, 2017 at 4:39 PM, Prateek Mahajan <prateekm99@gmail.com> wrote:

We have a recurring problem where Postgres will not allow any new connections. Any connection already made keeps working fine. Every time this happens, we also have one suspended postgres.exe process that can not be cancelled. Also, we get the following error in the logs every minute: ‘WARNING:  worker took too long to start; canceled’.


We’re using Postgres 9.5 on Windows Server 2012 R2. Our original hypothesis was that there was something wrong with the autovac settings, but we copied the settings over from our other server that does not have this problem and still continue to have problem. 






--
Brian Farrell
Project Associate
Center for Geospatial Information Technology

Re: [ADMIN] Postgres will not allow new connections, suspendedprocess, waiting error

От
Prateek Mahajan
Дата:
Maximum connection is never reached

--
Prateek Mahajan
McCombs School of Business | Business Honors and Finance 
Cockrell School of Engineering | Mechanical Engineering 
The University of Texas at Austin
(832) 420-2622 | prateekm99@gmail.com



On Fri, Jun 30, 2017 at 3:46 PM, Brian Farrell <brian.farrell@vt.edu> wrote:
Have you checked to see if you're bumping up against the max_connections limit? https://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS

-Brian

On Fri, Jun 30, 2017 at 4:39 PM, Prateek Mahajan <prateekm99@gmail.com> wrote:

We have a recurring problem where Postgres will not allow any new connections. Any connection already made keeps working fine. Every time this happens, we also have one suspended postgres.exe process that can not be cancelled. Also, we get the following error in the logs every minute: ‘WARNING:  worker took too long to start; canceled’.


We’re using Postgres 9.5 on Windows Server 2012 R2. Our original hypothesis was that there was something wrong with the autovac settings, but we copied the settings over from our other server that does not have this problem and still continue to have problem. 






--
Brian Farrell
Project Associate
Center for Geospatial Information Technology


Re: [ADMIN] Postgres will not allow new connections, suspendedprocess, waiting error

От
Prateek Mahajan
Дата:
More details.

Environment
PostgreSQL 9.5, EnterpriseDB Postgres installer
Windows Server 2012R2 with Active Directory
Symantec End Point Protection

Symptom:

After about 1 week of running, one of PostgreSQL process (postgres.exe) showed "suspended" in task manager, and I cannot kill it in the task manager ("Access Denied" error message appeared). This "suspended" process was not the master PID as indicated in postmaster.pid file. 
Current live connections still work but one cannot establish new connections. The only solution that I have is to restart the Server 
Other information:

The PostgreSQL service is run under a domain account.
The maximum connection was never reached as it is set as 1000 and we only had about 10 connections.
There was plenty of available memory there. The total memory is 288GB and only 8% was used
There were minimum hard drive activities as it occurred. The C drive where PostgreSQL was installed had about 86GB of free space.
There are additional 4 table spaces that are not on C drive but spread over 4 hard drives. Each of 4 hard drives has more than 500GB of space.
we have been using the same configuration files for years and the same file is also used on a second PostgreSQL server, which does not have the issue at all.
The PostgreSQL logs had something like this when this happened and it continues to produce this warning message every minute or so:

2017-06-28 19:40:21 CDT WARNING:  worker took too long to start; canceled
2017-06-28 19:41:21 CDT WARNING:  worker took too long to start; canceled
2017-06-28 19:42:21 CDT WARNING:  worker took too long to start; canceled
2017-06-28 19:43:21 CDT WARNING:  worker took too long to start; canceled



On Fri, Jun 30, 2017 at 4:28 PM, Prateek Mahajan <prateekm99@gmail.com> wrote:
Maximum connection is never reached

--
Prateek Mahajan
McCombs School of Business | Business Honors and Finance 
Cockrell School of Engineering | Mechanical Engineering 
The University of Texas at Austin



On Fri, Jun 30, 2017 at 3:46 PM, Brian Farrell <brian.farrell@vt.edu> wrote:
Have you checked to see if you're bumping up against the max_connections limit? https://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS

-Brian

On Fri, Jun 30, 2017 at 4:39 PM, Prateek Mahajan <prateekm99@gmail.com> wrote:

We have a recurring problem where Postgres will not allow any new connections. Any connection already made keeps working fine. Every time this happens, we also have one suspended postgres.exe process that can not be cancelled. Also, we get the following error in the logs every minute: ‘WARNING:  worker took too long to start; canceled’.


We’re using Postgres 9.5 on Windows Server 2012 R2. Our original hypothesis was that there was something wrong with the autovac settings, but we copied the settings over from our other server that does not have this problem and still continue to have problem. 






--
Brian Farrell
Project Associate
Center for Geospatial Information Technology



Re: [ADMIN] Postgres will not allow new connections, suspendedprocess, waiting error

От
"David G. Johnston"
Дата:
On Fri, Jun 30, 2017 at 3:59 PM, Prateek Mahajan <prateekm99@gmail.com> wrote:
Current live connections still work but one cannot establish new connections.

​Your problem description makes it sound like your server is presently in "smart" shutdown mode.  In 9.5 that is no longer the default but still possible.

David J.

Re: [ADMIN] Postgres will not allow new connections, suspendedprocess, waiting error

От
Magnus Hagander
Дата:


On Sat, Jul 1, 2017 at 12:59 AM, Prateek Mahajan <prateekm99@gmail.com> wrote:
More details.

Environment
PostgreSQL 9.5, EnterpriseDB Postgres installer
Windows Server 2012R2 with Active Directory
Symantec End Point Protection

Symptom:

After about 1 week of running, one of PostgreSQL process (postgres.exe) showed "suspended" in task manager, and I cannot kill it in the task manager ("Access Denied" error message appeared). This "suspended" process was not the master PID as indicated in postmaster.pid file. 
Current live connections still work but one cannot establish new connections. The only solution that I have is to restart the Server 
Other information:

The PostgreSQL service is run under a domain account.
The maximum connection was never reached as it is set as 1000 and we only had about 10 connections.
There was plenty of available memory there. The total memory is 288GB and only 8% was used
There were minimum hard drive activities as it occurred. The C drive where PostgreSQL was installed had about 86GB of free space.
There are additional 4 table spaces that are not on C drive but spread over 4 hard drives. Each of 4 hard drives has more than 500GB of space.
we have been using the same configuration files for years and the same file is also used on a second PostgreSQL server, which does not have the issue at all.
The PostgreSQL logs had something like this when this happened and it continues to produce this warning message every minute or so:

2017-06-28 19:40:21 CDT WARNING:  worker took too long to start; canceled
2017-06-28 19:41:21 CDT WARNING:  worker took too long to start; canceled
2017-06-28 19:42:21 CDT WARNING:  worker took too long to start; canceled
2017-06-28 19:43:21 CDT WARNING:  worker took too long to start; canceled


Those are autovacuum workers trying to start. My guess is that's a symptom of the same basic problem, which is that your machine behaves as if it's heavily overloaded.

As a first try I'd attempt removing the Symantec Endpoint stuff and see if that helps. It's very common that software like that breaks the database. And being unable to kill things in the task manager clearly indicates the problem lies outside the control of Postgres. 

--