Re: (Never?) Kill Postmaster?

Поиск
Список
Период
Сортировка
От Christian Schröder
Тема Re: (Never?) Kill Postmaster?
Дата
Msg-id 47339573.5050202@deriva.de
обсуждение исходный текст
Ответ на Re: (Never?) Kill Postmaster?  (Alvaro Herrera <alvherre@commandprompt.com>)
Ответы Re: (Never?) Kill Postmaster?
Re: (Never?) Kill Postmaster?
Список pgsql-general
Hi all,
any news about this issue? Anything else that I can do to help you?
Meanwhile there are 4 connections in the same state. (I did not do the
whole investigation on all 4, but since they all do not respond on a
SIGINT I assume that they all have the same problem.)
It may also be interesting that the 4 processes belong to the same two
users that already caused this problem earlier. Maybe it really has
something to do with their unstable internet connection? (I mentioned
this in an earlier mail.)
I have also noticed that one of these two users has many open
connections which are all idle. I guess that those connections are in
fact dead, but the database did not close them for any reason. The
pg_stat_activity entries for the corresponding backend processes are as
follows:

 procpid | usename | current_query | waiting |
query_start          |         backend_start
---------+---------+---------------+---------+-------------------------------+-------------------------------
   26033 | dpyrek  | <IDLE>        | f       | 2007-11-08
10:21:01.555232+01 | 2007-11-08 09:55:01.59932+01
   18331 | dpyrek  | <IDLE>        | f       | 2007-11-07
11:34:24.968852+01 | 2007-11-07 11:08:29.043762+01
   18940 | dpyrek  | <IDLE>        | f       | 2007-11-07
14:29:52.987176+01 | 2007-11-07 13:14:48.609031+01
   25868 | dpyrek  | <IDLE>        | f       | 2007-11-08
09:47:46.938991+01 | 2007-11-08 09:13:34.101351+01
    6719 | dpyrek  | <IDLE>        | f       | 2007-11-06
12:06:14.875588+01 | 2007-11-06 11:10:00.566644+01
   17987 | dpyrek  | <IDLE>        | f       | 2007-11-07
10:31:50.517275+01 | 2007-11-07 10:11:07.310338+01
   31808 | dpyrek  | <IDLE>        | f       | 2007-11-08
22:55:03.931727+01 | 2007-11-08 22:55:03.766638+01
   25484 | dpyrek  | <IDLE>        | f       | 2007-11-08
08:32:57.265377+01 | 2007-11-08 07:44:30.845967+01
    5972 | dpyrek  | <IDLE>        | f       | 2007-11-06
08:51:54.57437+01  | 2007-11-06 08:14:03.560602+01
    6241 | dpyrek  | <IDLE>        | f       | 2007-11-06
09:59:02.018452+01 | 2007-11-06 09:20:49.092246+01
    6136 | dpyrek  | <IDLE>        | f       | 2007-11-06
09:14:40.729837+01 | 2007-11-06 08:57:29.55187+01
   12645 | dpyrek  | <IDLE>        | f       | 2007-11-02
10:08:24.856929+01 | 2007-11-02 09:35:37.640976+01
   25254 | dpyrek  | <IDLE>        | f       | 2007-11-08
07:29:04.547081+01 | 2007-11-08 06:33:47.707759+01
   20275 | dpyrek  | <IDLE>        | f       | 2007-11-03
09:14:12.73829+01  | 2007-11-03 08:57:05.555972+01
   20216 | dpyrek  | <IDLE>        | f       | 2007-11-03
08:46:40.555354+01 | 2007-11-03 08:40:31.756993+01
   12435 | dpyrek  | <IDLE>        | f       | 2007-11-02
09:28:53.361365+01 | 2007-11-02 08:48:11.589485+01
   19633 | dpyrek  | <IDLE>        | f       | 2007-11-03
08:34:16.263487+01 | 2007-11-03 05:46:16.811069+01
   12156 | dpyrek  | <IDLE>        | f       | 2007-11-02
08:10:11.558427+01 | 2007-11-02 07:49:03.442489+01
    4899 | dpyrek  | <IDLE>        | f       | 2007-11-01
12:42:30.880391+01 | 2007-11-01 10:56:18.513398+01
   11988 | dpyrek  | <IDLE>        | f       | 2007-11-02
07:38:10.315758+01 | 2007-11-02 07:02:52.438251+01
    4490 | dpyrek  | <IDLE>        | f       | 2007-11-01
09:51:42.216745+01 | 2007-11-01 09:34:18.63771+01

A ptrace of one of these connections yields the following result:

db2:/home/pgsql/data # strace -p 4899
Process 4899 attached - interrupt to quit
futex(0x994000, FUTEX_WAIT, 2, NULL

This looks identical (even with the same uaddr parameter) for the other
processes.

In the log file I find many lines like this:

<2007-11-08 22:56:19 CET - dpyrek> LOG:  could not receive data from
client: Die Wartezeit für die Verbindung ist abgelaufen
<2007-11-08 22:56:19 CET - dpyrek> LOG:  unexpected EOF on client connection

I'm not sure that these lines correspond to the dead connections, but at
least it's the same user.

Does this additional information help you in any way? I'm a little bit
afraid that eventually the maximum number of connections will be
exceeded. I don't want to "kill -9" the processes because the last time
I did this the database was in recovery mode for a substantial amount of
time.

Any help is still highly appreciated!

Regards,
    Christian

P.S.: If nothing else helps I could also grant one of you guys root
access to the database machine.

--
Deriva GmbH                         Tel.: +49 551 489500-42
Financial IT and Consulting         Fax:  +49 551 489500-91
Hans-Böckler-Straße 2                  http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer



В списке pgsql-general по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: "Resurrected" data files - problem?
Следующее
От: Gregory Stark
Дата:
Сообщение: Re: Optimal time series sampling.