Обсуждение: BUG #2001: Signal 11 after concurrent inserts + updates

Поиск
Список
Период
Сортировка

BUG #2001: Signal 11 after concurrent inserts + updates

От
"Telin Lin"
Дата:
The following bug has been logged online:

Bug reference:      2001
Logged by:          Telin Lin
Email address:      tlin@rosettastone.com
PostgreSQL version: 8.0.4
Operating system:   Red Hat Linux 3.2.3-47
Description:        Signal 11 after concurrent inserts + updates
Details:

Hi,

My postgresql keeps crashing by signal 11 after a number of concurrent
inserts + updates.  The db crashed and then automatically recovered after
the last update.  All the concurrent inserts/updates are going to the same
table.  The size of the table is about 200,000 rows with a few indices.
There are usually 6~15 inserts+updates in the same sec.  I can’t be
certain about if certain value in the update causes the crash since I
can’t reproduce the crash with the update right before the crash.  This
sort of concurrent db activity is going on all day long but the crash only
occurs maybe once or twice a day.  Vacuum + analyze is done on daily basis.

We started with version 8.0.2 and upgraded to 8.0.4 to see if the upgrade
solved the problem.  No improvement at the moment.  We are also using Slony
replication + Apache + Axis + JDBC.  The machine has 6G ram with at least
80M free all the time.  CPU and IO wait is always low.

Please help.  Thanks.

The log looks like the following:

<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOG:  00000: statement: update TemporalRecord set version=$1,
creationRole=$2, creationDate=$3, modificationDate=$4, modificationRole=$5,
temporalId=$6, revision=$7, tombstone=$8, workingCopyDirty=$9,
writeLockOwner=$10, usage=$11, lag=$12 where rowId=$13 and version=$14
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  pg_parse_query, postgres.c:526
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
DEBUG:  00000: CommitTransactionCommand
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  finish_xact_command, postgres.c:1896
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
DEBUG:  00000: StartTransactionCommand
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  start_xact_command, postgres.c:1875
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOG:  00000: statement: update TemporalRecord set version=$1,
creationRole=$2, creationDate=$3, modificationDate=$4, modificationRole=$5,
temporalId=$6, revision=$7, tombstone=$8, workingCopyDirty=$9,
writeLockOwner=$10, usage=$11, lag=$12 where rowId=$13 and version=$14
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  pg_parse_query, postgres.c:526
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
DEBUG:  00000: CommitTransactionCommand
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  finish_xact_command, postgres.c:1896
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
DEBUG:  00000: StartTransactionCommand
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  start_xact_command, postgres.c:1875
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOG:  00000: statement: update TemporalRecord set version=$1,
creationRole=$2, creationDate=$3, modificationDate=$4, modificationRole=$5,
temporalId=$6, revision=$7, tombstone=$8, workingCopyDirty=$9,
writeLockOwner=$10, usage=$11, lag=$12 where rowId=$13 and version=$14
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  pg_parse_query, postgres.c:526
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
DEBUG:  00000: ProcessQuery
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  ProcessQuery, pquery.c:127
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
DEBUG:  00000: CommitTransactionCommand
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  finish_xact_command, postgres.c:1896
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
DEBUG:  00000: StartTransactionCommand
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  start_xact_command, postgres.c:1875
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOG:  00000: statement: update TemporalRecord set version=$1,
creationRole=$2, creationDate=$3, modificationDate=$4, modificationRole=$5,
temporalId=$6, revision=$7, tombstone=$8, workingCopyDirty=$9,
writeLockOwner=$10, parentId=$11, childId=$12, name=$13, index=$14 where
rowId=$15 and version=$16
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  pg_parse_query, postgres.c:526
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
DEBUG:  00000: ProcessQuery
<2005-10-21 16:39:26 EDT><14916><10.1.249.59(1672)><5342196><4359500a.3a44>
LOCATION:  ProcessQuery, pquery.c:127
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: reaping dead
processes
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  reaper, postmaster.c:1988
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: server process (PID
14916) was terminated by signal 11
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  LogChildExit,
postmaster.c:2358
<2005-10-21 16:39:26 EDT><28519><><><> LOG:  00000: server process (PID
14916) was terminated by signal 11
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  LogChildExit,
postmaster.c:2358
<2005-10-21 16:39:26 EDT><28519><><><> LOG:  00000: terminating any other
active server processes
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2251
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 14917
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 14646
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 14431
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 12519
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 1727
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 615
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 614
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><12519><10.1.249.72(1605)><5340712><43593f17.30e7>
WARNING:  57P02: terminating connection because of crash of another server
process
<2005-10-21 16:39:26 EDT><12519><10.1.249.72(1605)><5340712><43593f17.30e7>
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><12519><10.1.249.72(1605)><5340712><43593f17.30e7>
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
<2005-10-21 16:39:26 EDT><12519><10.1.249.72(1605)><5340712><43593f17.30e7>
LOCATION:  quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 613
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 32590
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 19567
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 19508
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><1727><10.1.249.59(1606)><5321127><43593473.6bf>
WARNING:  57P02: terminating connection because of crash of another server
process
<2005-10-21 16:39:26 EDT><1727><10.1.249.59(1606)><5321127><43593473.6bf>
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><1727><10.1.249.59(1606)><5321127><43593473.6bf>
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
<2005-10-21 16:39:26 EDT><1727><10.1.249.59(1606)><5321127><43593473.6bf>
LOCATION:  quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><615><10.1.247.21(1349)><0><43592cb6.267> WARNING:
57P02: terminating connection because of crash of another server process
<2005-10-21 16:39:26 EDT><615><10.1.247.21(1349)><0><43592cb6.267> DETAIL:
The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><615><10.1.247.21(1349)><0><43592cb6.267> HINT:  In
a moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><615><10.1.247.21(1349)><0><43592cb6.267> LOCATION:
 quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 19497
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 19496
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 15837
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 15836
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 15835
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 15832
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 15829
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2290
<2005-10-21 16:39:26 EDT><613><10.1.247.21(1347)><0><43592cb4.265> WARNING:
57P02: terminating connection because of crash of another server process
<2005-10-21 16:39:26 EDT><613><10.1.247.21(1347)><0><43592cb4.265> HINT:  In
a moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><613><10.1.247.21(1347)><0><43592cb4.265> LOCATION:
 quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><32590><10.1.249.83(2488)><5319103><43592957.7f4e>
WARNING:  57P02: terminating connection because of crash of another server
process
<2005-10-21 16:39:26 EDT><32590><10.1.249.83(2488)><5319103><43592957.7f4e>
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><32590><10.1.249.83(2488)><5319103><43592957.7f4e>
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
<2005-10-21 16:39:26 EDT><32590><10.1.249.83(2488)><5319103><43592957.7f4e>
LOCATION:  quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><14431><10.1.249.71(1491)><5339168><43594c8e.385f>
WARNING:  57P02: terminating connection because of crash of another server
process
<2005-10-21 16:39:26 EDT><14431><10.1.249.71(1491)><5339168><43594c8e.385f>
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><14431><10.1.249.71(1491)><5339168><43594c8e.385f>
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
<2005-10-21 16:39:26 EDT><14431><10.1.249.71(1491)><5339168><43594c8e.385f>
LOCATION:  quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><19567><10.1.2.254(39934)><0><4357ef38.4c6f>
WARNING:  57P02: terminating connection because of crash of another server
process
<2005-10-21 16:39:26 EDT><19567><10.1.2.254(39934)><0><4357ef38.4c6f>
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><19567><10.1.2.254(39934)><0><4357ef38.4c6f> HINT:
In a moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><19567><10.1.2.254(39934)><0><4357ef38.4c6f>
LOCATION:  quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><19508><10.1.2.254(39916)><0><4357eed9.4c34>
WARNING:  57P02: terminating connection because of crash of another server
process
<2005-10-21 16:39:26 EDT><19508><10.1.2.254(39916)><0><4357eed9.4c34>
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><19508><10.1.2.254(39916)><0><4357eed9.4c34> HINT:
In a moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><19508><10.1.2.254(39916)><0><4357eed9.4c34>
LOCATION:  quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 15821
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2304
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: sending SIGQUIT to
process 15822
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  HandleChildCrash,
postmaster.c:2326
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: server process (PID
19567) exited with exit code 1
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  LogChildExit,
postmaster.c:2349
<2005-10-21 16:39:26 EDT><15832><[local]><0><4357d689.3dd8> WARNING:  57P02:
terminating connection because of crash of another server process
<2005-10-21 16:39:26 EDT><15832><[local]><0><4357d689.3dd8> DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><15832><[local]><0><4357d689.3dd8> HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><15832><[local]><0><4357d689.3dd8> LOCATION:
quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><19497><10.1.2.254(39910)><0><4357eec9.4c29>
WARNING:  57P02: terminating connection because of crash of another server
process
<2005-10-21 16:39:26 EDT><19497><10.1.2.254(39910)><0><4357eec9.4c29>
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><19497><10.1.2.254(39910)><0><4357eec9.4c29> HINT:
In a moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><19497><10.1.2.254(39910)><0><4357eec9.4c29>
LOCATION:  quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><614><10.1.247.21(1348)><0><43592cb4.266> DETAIL:
The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><614><10.1.247.21(1348)><0><43592cb4.266> HINT:  In
a moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><614><10.1.247.21(1348)><0><43592cb4.266> LOCATION:
 quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><15829><[local]><0><4357d689.3dd5> WARNING:  57P02:
terminating connection because of crash of another server process
<2005-10-21 16:39:26 EDT><15829><[local]><0><4357d689.3dd5> DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><15829><[local]><0><4357d689.3dd5> HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><15829><[local]><0><4357d689.3dd5> LOCATION:
quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><15835><[local]><0><4357d689.3ddb> WARNING:  57P02:
terminating connection because of crash of another server process
<2005-10-21 16:39:26 EDT><15835><[local]><0><4357d689.3ddb> DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><15835><[local]><0><4357d689.3ddb> HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><15835><[local]><0><4357d689.3ddb> LOCATION:
quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><15837><10.1.2.2(43668)><0><4357d691.3ddd> WARNING:
 57P02: terminating connection because of crash of another server process
<2005-10-21 16:39:26 EDT><15837><10.1.2.2(43668)><0><4357d691.3ddd> DETAIL:
The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><15837><10.1.2.2(43668)><0><4357d691.3ddd> HINT:
In a moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><15837><10.1.2.2(43668)><0><4357d691.3ddd>
LOCATION:  quickdie, postgres.c:1945
<2005-10-21 16:39:26 EDT><15836><[local]><0><4357d689.3ddc> WARNING:  57P02:
terminating connection because of crash of another server process
<2005-10-21 16:39:26 EDT><15836><[local]><0><4357d689.3ddc> DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
<2005-10-21 16:39:26 EDT><15836><[local]><0><4357d689.3ddc> HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
<2005-10-21 16:39:26 EDT><15836><[local]><0><4357d689.3ddc> LOCATION:
quickdie, postgres.c:1945
…..
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: reaping dead
processes
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  reaper, postmaster.c:1988
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: server process (PID
14917) exited with exit code 1
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  LogChildExit,
postmaster.c:2349
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: reaping dead
processes
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  reaper, postmaster.c:1988

<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: server process (PID
14646) exited with exit code 1
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  LogChildExit,
postmaster.c:2349
<2005-10-21 16:39:26 EDT><28519><><><> LOG:  00000: all server processes
terminated; reinitializing
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  reaper, postmaster.c:2150
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: shmem_exit(0)
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  shmem_exit, ipc.c:126
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: invoking
IpcMemoryCreate(size=14458880)
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:
CreateSharedMemoryAndSemaphores, ipci.c:89
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: database system was
interrupted at 2005-10-21 16:36:41 EDT
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  StartupXLOG, xlog.c:4094
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: checkpoint record is at
1/F84865B4
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  StartupXLOG, xlog.c:4163
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: redo record is at
1/F84865B4; undo record is at 0/0; shutdown FALSE
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  StartupXLOG, xlog.c:4191
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: next transaction ID:
5341889; next OID: 4965252
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  StartupXLOG, xlog.c:4194
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: database system was not
properly shut down; automatic recovery in progress
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  StartupXLOG, xlog.c:4250
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: redo starts at
1/F84865F0
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  StartupXLOG, xlog.c:4287
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: record with zero length
at 1/F84E3B90
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  ReadRecord, xlog.c:2496
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: redo done at 1/F84E3B68
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  StartupXLOG, xlog.c:4345
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: forked new backend,
pid=15192 socket=7
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  BackendStartup,
postmaster.c:2499
<2005-10-21 16:39:26 EDT><15192><><0><435951fe.3b58> LOG:  00000: connection
received: host=[local] port=
<2005-10-21 16:39:26 EDT><15192><><0><435951fe.3b58> LOCATION:  BackendRun,
postmaster.c:2679
<2005-10-21 16:39:26 EDT><15192><[local]><0><435951fe.3b58> FATAL:  57P03:
the database system is starting up
<2005-10-21 16:39:26 EDT><15192><[local]><0><435951fe.3b58> LOCATION:
ProcessStartupPacket, postmaster.c:1573
<2005-10-21 16:39:26 EDT><15192><[local]><0><435951fe.3b58> DEBUG:  00000:
proc_exit(0)
<2005-10-21 16:39:26 EDT><15192><[local]><0><435951fe.3b58> LOCATION:
proc_exit, ipc.c:95
<2005-10-21 16:39:26 EDT><15192><[local]><0><435951fe.3b58> DEBUG:  00000:
shmem_exit(0)
<2005-10-21 16:39:26 EDT><15192><[local]><0><435951fe.3b58> LOCATION:
shmem_exit, ipc.c:126
<2005-10-21 16:39:26 EDT><15192><[local]><0><435951fe.3b58> DEBUG:  00000:
exit(0)
<2005-10-21 16:39:26 EDT><15192><[local]><0><435951fe.3b58> LOCATION:
proc_exit, ipc.c:113
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: reaping dead
processes
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  reaper, postmaster.c:1988
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: server process (PID
15192) exited with exit code 0
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  LogChildExit,
postmaster.c:2349
<2005-10-21 16:39:26 EDT><15190><><><> DEBUG:  00000: recycled transaction
log file "0000000100000001000000F6"
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  MoveOfflineLogs,
xlog.c:2192
<2005-10-21 16:39:26 EDT><15190><><><> DEBUG:  00000: recycled transaction
log file "0000000100000001000000F7"
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  MoveOfflineLogs,
xlog.c:2192
<2005-10-21 16:39:26 EDT><15190><><><> LOG:  00000: database system is
ready
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  StartupXLOG, xlog.c:4557
<2005-10-21 16:39:26 EDT><15190><><><> DEBUG:  00000: proc_exit(0)
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  proc_exit, ipc.c:95
<2005-10-21 16:39:26 EDT><15190><><><> DEBUG:  00000: shmem_exit(0)
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  shmem_exit, ipc.c:126
<2005-10-21 16:39:26 EDT><15190><><><> DEBUG:  00000: exit(0)
<2005-10-21 16:39:26 EDT><15190><><><> LOCATION:  proc_exit, ipc.c:113
<2005-10-21 16:39:26 EDT><28519><><><> DEBUG:  00000: reaping dead
processes
<2005-10-21 16:39:26 EDT><28519><><><> LOCATION:  reaper, postmaster.c:1988
<2005-10-21 16:39:27 EDT><15197><><0><435951ff.3b5d> LOG:  00000: connection
received: host=[local] port=

The postgresql.conf looks like the following:

max_connections = 400
shared_buffers = 1000           # min 16, at least max_connections*2, 8KB
each
#work_mem = 1024                # min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

# - Free Space Map -

#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''
fsync = true                    # turns forced synchronization on or off
wal_sync_method = fsync # the default varies across platforms:
log_min_messages = debug5
log_min_error_statement = notice
log_connections = true
log_statement = 'all'

Re: BUG #2001: Signal 11 after concurrent inserts + updates

От
Tom Lane
Дата:
"Telin Lin" <tlin@rosettastone.com> writes:
> My postgresql keeps crashing by signal 11 after a number of concurrent
> inserts + updates.

This could be due to corrupt data, or it could be a hardware problem ---
have you done any memory tests on that machine lately?  If you can
pg_dump the table involved without any failure, then the corrupt-data
hypothesis is probably ruled out.  Also, try to get a stack trace from
the core dump (if no core file, restart postmaster under ulimit -c
unlimited --- it's easiest to put that into the postgres init script).

            regards, tom lane

Re: BUG #2001: Signal 11 after concurrent inserts + updates

От
Michael Fuhr
Дата:
On Wed, Oct 26, 2005 at 09:27:52AM +0100, Telin Lin wrote:
> My postgresql keeps crashing by signal 11 after a number of concurrent
> inserts + updates.

Did you get a core dump?  If so, can you get a stack trace from it?
In released versions of PostgreSQL core dumps are usually found
under the $PGDATA/base/<database oid> directory unless your system
is configured to put them elsewhere.  If you got no core dump then
you might need to adjust the coredumpsize resource limit that the
postmaster runs with.

You mentioned that you were using Slony.  Do you have any other
add-ons?  Custom data types, third-party modules, etc.?

--
Michael Fuhr

Re: BUG #2001: Signal 11 after concurrent inserts + updates

От
Tom Lane
Дата:
Telin Lin <tlin@rosettastone.com> writes:
> the core dump filename should be just like 'coredump', right?

Probably either "core" or "core.NNNN" kind of name.  In existing
PG releases you'll usually find it under $PGDATA/base/NNNN/

            regards, tom lane

Re: BUG #2001: Signal 11 after concurrent inserts + updates

От
Telin Lin
Дата:
I have several directories such as 1,  17229,  17230,  333909 under
base.  but none of them is generated during crash time.  Under thsoe
directories, there are a bunch of files with NNNN(number) filename.
Some of them are over size of 6M.

I guess i will set the max_stack_depth in postgres.conf to get the right
coredump?

Nope, no other add-on or modules.  we are just using tables, indices and
triggers from slony.

Thanks!!


Michael Fuhr wrote:

>On Wed, Oct 26, 2005 at 09:27:52AM +0100, Telin Lin wrote:
>
>
>>My postgresql keeps crashing by signal 11 after a number of concurrent
>>inserts + updates.
>>
>>
>
>Did you get a core dump?  If so, can you get a stack trace from it?
>In released versions of PostgreSQL core dumps are usually found
>under the $PGDATA/base/<database oid> directory unless your system
>is configured to put them elsewhere.  If you got no core dump then
>you might need to adjust the coredumpsize resource limit that the
>postmaster runs with.
>
>You mentioned that you were using Slony.  Do you have any other
>add-ons?  Custom data types, third-party modules, etc.?
>
>
>

Re: BUG #2001: Signal 11 after concurrent inserts + updates

От
Telin Lin
Дата:
No, haven't done any memory tests lately.  Let me get that going.

I am doing pg_dump nightly with the whole database including this
table.  so i guess that says it.  I couldn't find any core dump.  btw,
the core dump filename should be just like 'coredump', right?

Let me also get the max_stack_depth into conf.

Thanks much!!

telin

Thanks much for your advise.

Tom Lane wrote:

>"Telin Lin" <tlin@rosettastone.com> writes:
>
>
>>My postgresql keeps crashing by signal 11 after a number of concurrent
>>inserts + updates.
>>
>>
>
>This could be due to corrupt data, or it could be a hardware problem ---
>have you done any memory tests on that machine lately?  If you can
>pg_dump the table involved without any failure, then the corrupt-data
>hypothesis is probably ruled out.  Also, try to get a stack trace from
>the core dump (if no core file, restart postmaster under ulimit -c
>unlimited --- it's easiest to put that into the postgres init script).
>
>            regards, tom lane
>
>

Re: BUG #2001: Signal 11 after concurrent inserts + updates

От
Michael Fuhr
Дата:
[Please copy the mailing list on replies.]

On Thu, Oct 27, 2005 at 10:54:51AM -0400, Telin Lin wrote:
> I have just got a 17M coredump.  Do you have a preferred way for me to
> sent that to you?  I can always place it on a ftp site for you
> download.  Thank you.

Please post the stack trace from the core dump.  Something like
this should work if you have gdb installed:

gdb /path/to/postgres /path/to/coredump

"/path/to/postgres" means the path to the executable file named
"postgres".  If you're not sure where that is then look in your
PostgreSQL startup script or use locate, find, etc.

gdb will print some startup info, then it'll present a "(gdb) "
prompt.  Type "bt" and hit Enter/Return ("bt" means backtrace)
(hopefully postgres was compiled with debugging symbols).  Copy the
output into an email message and post it here.  Use "q" to exit
gdb.  If Tom Lane gives differing or additional instructions then
listen to him, because he's one of the core developers.

--
Michael Fuhr