Re: PGError: server closed the connection unexpectedly

Поиск
Список
Период
Сортировка
От Dave Steinberg
Тема Re: PGError: server closed the connection unexpectedly
Дата
Msg-id 43C03DD8.2010604@redterror.net
обсуждение исходный текст
Ответ на Re: PGError: server closed the connection unexpectedly  (Doug McNaught <doug@mcnaught.org>)
Ответы Re: PGError: server closed the connection unexpectedly  (Doug McNaught <doug@mcnaught.org>)
Список pgsql-general
>>My biggest problem is the lack of any real error message on the server.
>>    I don't see anything wrong in the system logs, and there's no core
>>file in the /var/postgresql directory.
>
>
> Are you sure core files are enabled; i.e. the server is running with
> 'ulimit -c unlimited' ?

Yes:

$ whoami
_postgresql
$ ulimit -c
unlimited


>>    I did a 'vacuumdb -afz' just as
>>a shot in the dark, without affect.  Pretty much all I see in the logs
>>is this:
>>
>><USER%DB x.y.z.a(51478) 487>LOG:  unexpected EOF on client connection
>
>
> This means a client is dying or closing its connection prematurely,
> and would seem to be a different problem.  It shouldn't ever cause the
> server to crash.

If the server is crashing - I have absolutely no info to suggest that,
except possibly this: at a later time I did a tcpdump of one of these
sessions, for lack of better ideas, and saw this near the end:

<snip lots of tcpdump output>
16:32:19.523842 clam.int.geekisp.com.15245 >
morningsun.int.geekisp.com.postgresql: P 17389:18036(647) ack 9323 win
16384 <nop,nop,timestamp 133396667 3702263885> (DF)
16:32:19.524289 morningsun.int.geekisp.com.postgresql >
clam.int.geekisp.com.15245: P 9323:9343(20) ack 18036 win 17376
<nop,nop,timestamp 3702263885 133396667> (DF)
16:32:19.562544 clam.int.geekisp.com.15245 >
morningsun.int.geekisp.com.postgresql: P 18036:18041(5) ack 9343 win
16384 <nop,nop,timestamp 133396667 3702263885> (DF)
******16:32:19.566662 morningsun.int.geekisp.com.postgresql >
clam.int.geekisp.com.15245: F 9343:9343(0) ack 18041 win 17376
<nop,nop,timestamp 3702263885 133396667> (DF)
16:32:19.567043 clam.int.geekisp.com.15245 >
morningsun.int.geekisp.com.postgresql: . ack 9344 win 16384
<nop,nop,timestamp 133396667 3702263885> (DF)
16:32:19.649445 clam.int.geekisp.com.15245 >
morningsun.int.geekisp.com.postgresql: P 18041:18851(810) ack 9344 win
16384 <nop,nop,timestamp 133396667 3702263885> (DF)
16:32:19.649468 morningsun.int.geekisp.com.postgresql >
clam.int.geekisp.com.15245: R 545360451:545360451(0) win 0 (DF)
16:32:19.649652 clam.int.geekisp.com.15245 >
morningsun.int.geekisp.com.postgresql: F 18851:18851(0) ack 9344 win
16384 <nop,nop,timestamp 133396667 3702263885> (DF)
16:32:19.649665 morningsun.int.geekisp.com.postgresql >
clam.int.geekisp.com.15245: R 545360451:545360451(0) win 0 (DF)

The '*****' above is my own, to highlight the interesting part.
Morningsun is the server, clam is the ruby client.  Based on this, I
think the server *might* be dying first (hard to tell), since morningsun
is the one who sends the FIN packet and thereby closes the connection.
(Sorry for the horrible wrapping).

Again, having trouble making sure.

>>Googling turned up a few reports suggesting bad hardware, or corrupted
>>indexes, but I don't think that's the case here.
>>
>>Any starting points or ideas would be greatly appreciated.
>
>
> Make sure the server is able to dump a core file, and perhaps crank up
> the logging level.

Just to verify that a core dump is possible for the postgresql user, I
wrote a tiny C program that just calls abort().  Sure enough, running it
gives me a core file.

I cranked the debug level up to 5 (man that's a lot), and here's what I
think is the relevent chunk:

<geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>STATEMENT:  UPDATE
job_members SET "start_time" = NULL, "exit_code" = NULL, "create_time" =
'2006-01-07 17:10:31', "job_id" = 30, "command" = '---
!ruby/object:GeekISP::ShellCommand
         args:
         command: /bin/ls /tmp/
         exit_code:
         logger: !ruby/object:Logger
           datetime_format:
           level: 0
           logdev: !ruby/object:Logger::LogDevice
             dev: !ruby/object:File {}
             filename:
/mnt/scratch/dave/control_panel/trunk/config/../log/test.log
             shift_age: 0
             shift_size: 1048576
           progname:
         run_as:
         stderr:
         stdin:
         stdout: ', "finish_time" = NULL, "last_update_time" = NULL,
"host_id" = 4, "status" = 'in_progress', "output" = NULL WHERE
job_member_id = 105
<geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>DEBUG:  name:
unnamed; blockState:       STARTED; state: INPROGR, xid/subid/cid:
10290432/1/0, nestlvl: 1, children: <>
<geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>STATEMENT:  UPDATE
job_members SET "start_time" = NULL, "exit_code" = NULL, "create_time" =
'2006-01-07 17:10:31', "job_id" = 30, "command" = '---
!ruby/object:GeekISP::ShellCommand
         args:
         command: /bin/ls /tmp/
         exit_code:
         logger: !ruby/object:Logger
           datetime_format:
           level: 0
           logdev: !ruby/object:Logger::LogDevice
             dev: !ruby/object:File {}
             filename:
/mnt/scratch/dave/control_panel/trunk/config/../log/test.log
             shift_age: 0
             shift_size: 1048576
           progname:
         run_as:
         stderr:
         stdin:
         stdout: ', "finish_time" = NULL, "last_update_time" = NULL,
"host_id" = 4, "status" = 'in_progress', "output" = NULL WHERE
job_member_id = 105
<geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>DEBUG:  proc_exit(0)
<geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>DEBUG:  shmem_exit(0)
<geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>LOG:  disconnection:
session time: 0:00:00.84 user=geekispv2 database=geekisp-v2-test
host=192.168.4.38 port=31992
<geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>DEBUG:  exit(0)
<%  7303>DEBUG:  reaping dead processes
<%  7303>DEBUG:  server process (PID 9155) exited with exit code 0

That looks to me like a clean and normal exit.  This is pointing more
and more towards the client in the ruby case, isn't it?

Regards,
--
Dave Steinberg
http://www.geekisp.com/
http://www.steinbergcomputing.com/

В списке pgsql-general по дате отправления:

Предыдущее
От: Angshu Kar
Дата:
Сообщение: Re: COPY to
Следующее
От: Angshu Kar
Дата:
Сообщение: programming in pgsql