Обсуждение: Unable to connect. Log : PDTWARNING: worker took too long to start; canceled

Поиск
Список
Период
Сортировка

Unable to connect. Log : PDTWARNING: worker took too long to start; canceled

От
Linus Kamb
Дата:

Hi,

Every morning for the past several days, when I come in to work, I find the postgres process running at 100% CPU and the logs filled with entries every minute with the warning "PDTWARNING:  worker took too long to start; canceled"

This appears to be almost exactly as this other thread:


except I am running postgres 9.1 on OSX 10.7.5

current postgres processes are:

picasso:~ postrgres$ ps -ef | grep post
  642 69200 69199   0 Tue08AM ??         0:01.61 postgres: logger process     
  642 69202 69199   0 Tue08AM ??         0:07.27 postgres: writer process     
  642 69203 69199   0 Tue08AM ??         0:05.17 postgres: wal writer process     
  642 69204 69199   0 Tue08AM ??         0:02.48 postgres: autovacuum launcher process     
  642 69205 69199   0 Tue08AM ??         0:03.71 postgres: stats collector process     
  642 69226 69199   0 Tue08AM ??         0:00.01 postgres: postgres postgres [local] idle  
  642 69247 69199   0  9:04AM ??         0:00.05 postgres: postgres postgres ::1(58621) idle  
  642 70284 69199   0 11:38AM ??         0:00.00 (postgres)
  642 70286 69199   0 11:38AM ??         0:00.00 (postgres)
  642 70290 69199   0 11:39AM ??         0:00.00 (postgres)
  642 72214 69199   0  6:13PM ??         0:00.00 (postgres)
  642 72232 69199   0  6:18PM ??         0:00.00 (postgres)
  642 72233 69199   0  6:18PM ??         0:00.00 (postgres)
  642 69199     1   0 Tue08AM ttys006  881:48.66 /usr/local/pgsql-9.1/bin/postgres -D data

I am unable to establish new connections to the database.  New command line psql requests hang effectively forever, as does trying to connect with pgadmin.   Existing psql sessions remain active and responsive, unless I try to connect to a different database.

The only recourse has been to kill -9 the postgres process and restart.

I am not familiar with strace (dtrace on OSX, I guess...) so I'm not sure what I'm looking for.

I did do % dtrace -P syscall -p 69199

and I notice now that postgres is no longer using 100% of the cpu, but I am still unable to establish new connections. And the warning messages are still being written to the log every minute.

The only thing that I can think of that has changed recently is that I have been experimenting with container-managed security (FORM-based authentication) using Glassfish 3.1.2.

Apologies if this was answered somewhere and I haven't found it.

Thanks,
Linus



Re: Unable to connect. Log : PDTWARNING: worker took too long to start; canceled

От
Alvaro Herrera
Дата:
Linus Kamb escribió:
> Hi,
>
> Every morning for the past several days, when I come in to work, I find the
> postgres process running at 100% CPU and the logs filled with entries every
> minute with the warning "PDTWARNING:  worker took too long to start;
> canceled"
>
> This appears to be almost exactly as this other thread:
>
> http://www.postgresql.org/message-id/AANLkTimt7_wbadtOkM9HqzeHmADmxb2TvuQD5yBUwkdZ@mail.gmail.com

I noticed that the thread was missing a couple of emails in the
archives, and that I had them in my mbox.  I have just forwarded them
("bounced" them in mutt's terms) to the list and I think the archive is
complete now for that thread; I apologize to subscribers that just
received a couple of very old emails.

Anyway, that thread died without finding the root cause of the problem,
but Tom suspected that postmaster was being kept busy doing something
that precluded it from running its usual tasks.  For the OP of the other
thread, stracing the postmaster seemed to bring it back to normal life.

Would it be possible to attach to the postmaster with GDB and get a
stack trace when it's stuck?  That might be helpful; or, if the mere act
of tracing it brings it back to life, it might not -- but instead it
would make me think it's a platform bug of some sort.

If you remove the Glassfish thing, does the problem go away?

(Note: the PDTWARNING thing happens because you don't have a white space
at the end of your log_line_prefix.  Adding one does wonders for
readability.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: Unable to connect. Log : PDTWARNING: worker took too long to start; canceled

От
Linus Kamb
Дата:
Thanks for the response - and explanation of those emails.  I was somewhat surprised that there were no further discussions of that issue in the archives and that it would be left hanging like that.  And then further confused by the receipt of those old emails!

I have killed and restarted postgres, so I will have to wait a day, I suppose, to see if it hangs again.

I was no where near the connection limit, however, so it wasn't that.  And I didn't have all those dead postmaster processes.

Interestingly (maybe), when I did a ps when it was hung, there was NO postmaster process, just a postgres process.

I believe when I restarted yesterday, I used the pg_ctl start command.

Today I used the OSX launchctl start command (what would happen at reboot), and now I DO see a postmaster process.

  642 72833     1   0  9:34AM ??         0:00.23 /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data
  642 72837 72833   0  9:34AM ??         0:00.10 postgres: logger process     
  642 72839 72833   0  9:34AM ??         0:00.43 postgres: writer process     
  642 72840 72833   0  9:34AM ??         0:00.31 postgres: wal writer process     
  642 72841 72833   0  9:34AM ??         0:00.11 postgres: autovacuum launcher process     
  642 72842 72833   0  9:34AM ??         0:00.17 postgres: stats collector process     
  642 72860 72833   0  9:35AM ??         0:00.18 postgres: postgres postgres ::1(54138) idle  
  642 72861 72833   0  9:35AM ??         0:00.06 postgres: postgres test ::1(54139) idle  

I have also temporarily disabled the glassfish thread pool that was connecting to the database.

I will continue to try to isolate a (possible) cause.

thanks for your help,
Linus



On Wed, Oct 23, 2013 at 10:37 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Linus Kamb escribió:
> Hi,
>
> Every morning for the past several days, when I come in to work, I find the
> postgres process running at 100% CPU and the logs filled with entries every
> minute with the warning "PDTWARNING:  worker took too long to start;
> canceled"
>
> This appears to be almost exactly as this other thread:
>
> http://www.postgresql.org/message-id/AANLkTimt7_wbadtOkM9HqzeHmADmxb2TvuQD5yBUwkdZ@mail.gmail.com

I noticed that the thread was missing a couple of emails in the
archives, and that I had them in my mbox.  I have just forwarded them
("bounced" them in mutt's terms) to the list and I think the archive is
complete now for that thread; I apologize to subscribers that just
received a couple of very old emails.

Anyway, that thread died without finding the root cause of the problem,
but Tom suspected that postmaster was being kept busy doing something
that precluded it from running its usual tasks.  For the OP of the other
thread, stracing the postmaster seemed to bring it back to normal life.

Would it be possible to attach to the postmaster with GDB and get a
stack trace when it's stuck?  That might be helpful; or, if the mere act
of tracing it brings it back to life, it might not -- but instead it
would make me think it's a platform bug of some sort.

If you remove the Glassfish thing, does the problem go away?

(Note: the PDTWARNING thing happens because you don't have a white space
at the end of your log_line_prefix.  Adding one does wonders for
readability.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services