Обсуждение: BUG #4383: operations in a certain row makes server process terminate with signal 11

Поиск
Список
Период
Сортировка

BUG #4383: operations in a certain row makes server process terminate with signal 11

От
"Marco Aurelio C. Miranda"
Дата:
The following bug has been logged online:

Bug reference:      4383
Logged by:          Marco Aurelio C. Miranda
Email address:      macmiranda@gmail.com
PostgreSQL version: 7.4.19
Operating system:   CentOS release 4.6 (Final)
Description:        operations in a certain row makes server process
terminate with signal 11
Details:

Hi,

the behavior I'm about to demonstrate started to happen after a forced
reboot but fsck hasn't found anything wrong with the ext3 filesystem.

Look what happens when I select the id field of a particular row in this
table:

> select id from uab_message_read where id=249222;
     id
------------
 1714385506
(1 row)

The id shown is completely different from the one I selected.

And when I try to select any other field of this row or to update any of the
fields or to delete the row, the client outputs the following message:

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost

On the server log, that's what I get:
Aug 27 22:24:47 uab postgres[8852]: [19-1] LOG:  server process (PID 11320)
was terminated by signal 11
Aug 27 22:24:47 uab postgres[8852]: [20-1] LOG:  terminating any other
active server processes
Aug 27 22:24:47 uab postgres[11310]: [19-1] WARNING:  terminating connection
because of crash of another server process
Aug 27 22:24:47 uab postgres[11310]: [19-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server
Aug 27 22:24:47 uab postgres[11310]: [19-3]  process exited abnormally and
possibly corrupted shared memory.
Aug 27 22:24:47 uab postgres[11316]: [19-1] WARNING:  terminating connection
because of crash of another server process
Aug 27 22:24:47 uab postgres[11310]: [19-4] HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
Aug 27 22:24:47 uab postgres[11316]: [19-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server
Aug 27 22:24:47 uab postgres[11316]: [19-3]  process exited abnormally and
possibly corrupted shared memory.
Aug 27 22:24:47 uab postgres[11284]: [19-1] WARNING:  terminating connection
because of crash of another server process
Aug 27 22:24:47 uab postgres[11316]: [19-4] HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
Aug 27 22:24:47 uab postgres[11284]: [19-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server
Aug 27 22:24:47 uab postgres[11284]: [19-3]  process exited abnormally and
possibly corrupted shared memory.
Aug 27 22:24:47 uab postgres[11247]: [19-1] WARNING:  terminating connection
because of crash of another server process
Aug 27 22:24:47 uab postgres[11314]: [19-1] WARNING:  terminating connection
because of crash of another server process
Aug 27 22:24:47 uab postgres[11284]: [19-4] HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
Aug 27 22:24:47 uab postgres[11247]: [19-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server
Aug 27 22:24:47 uab postgres[11314]: [19-2] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server
Aug 27 22:24:47 uab postgres[11247]: [19-3]  process exited abnormally and
possibly corrupted shared memory.
Aug 27 22:24:47 uab postgres[11314]: [19-3]  process exited abnormally and
possibly corrupted shared memory.
Aug 27 22:24:47 uab postgres[11247]: [19-4] HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
Aug 27 22:24:47 uab postgres[11314]: [19-4] HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
Aug 27 22:24:47 uab postgres[8852]: [21-1] LOG:  all server processes
terminated; reinitializing
Aug 27 22:24:47 uab postgres[11322]: [22-1] LOG:  database system was
interrupted at 2008-08-27 22:24:31 BRT
Aug 27 22:24:47 uab postgres[11322]: [23-1] LOG:  checkpoint record is at
158/A63C6290
Aug 27 22:24:47 uab postgres[11322]: [24-1] LOG:  redo record is at
158/A6322CEC; undo record is at 0/0; shutdown FALSE
Aug 27 22:24:47 uab postgres[11322]: [25-1] LOG:  next transaction ID:
1535991290; next OID: 37872258
Aug 27 22:24:47 uab postgres[11322]: [26-1] LOG:  database system was not
properly shut down; automatic recovery in progress
Aug 27 22:24:47 uab postgres[11322]: [27-1] LOG:  redo starts at
158/A6322CEC
Aug 27 22:24:47 uab postgres[11322]: [28-1] LOG:  record with zero length at
158/A64ED93C
Aug 27 22:24:47 uab postgres[11322]: [29-1] LOG:  redo done at 158/A64ED918
Aug 27 22:24:47 uab postgres[11323]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:47 uab postgres[11324]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:47 uab postgres[11325]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:48 uab postgres[11326]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:48 uab postgres[11327]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:49 uab postgres[11328]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:49 uab postgres[11329]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:50 uab postgres[11330]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:50 uab postgres[11331]: [22-1] FATAL:  the database system is
starting up
Aug 27 22:24:51 uab postgres[11322]: [30-1] LOG:  recycled transaction log
file "00000158000000A5"
Aug 27 22:24:51 uab postgres[11322]: [31-1] LOG:  database system is ready

This is the description of the given table:

> \d uab_message_read;
                                    Table "public.uab_message_read"
   Column    |         Type          |                            Modifiers

-------------+-----------------------+--------------------------------------
----------------------------
 id          | integer               | not null default
nextval('public.uab_message_read_id_seq'::text)
 useridfrom  | integer               | not null default 0
 useridto    | integer               | not null default 0
 message     | text                  |
 format      | integer               | not null default 0
 timecreated | integer               | not null default 0
 timeread    | integer               | not null default 0
 messagetype | character varying(50) | not null default ''::character
varying
 mailed      | integer               | not null default 0
Indexes:
    "uab_message_read_pkey" primary key, btree (id)
    "uab_message_read_useridfrom_idx" btree (useridfrom)
    "uab_message_read_useridto_idx" btree (useridto)

As you can infer from that, I also can't dump the table with pg_dump.

Any help will be highly appreciated.

Re: BUG #4383: operations in a certain row makes server process terminate with signal 11

От
Tom Lane
Дата:
"Marco Aurelio C. Miranda" <macmiranda@gmail.com> writes:
> the behavior I'm about to demonstrate started to happen after a forced
> reboot but fsck hasn't found anything wrong with the ext3 filesystem.

fsck doesn't even think about whether the data inside user files is
any good.

> Look what happens when I select the id field of a particular row in this
> table:

>> select id from uab_message_read where id=249222;
>      id
> ------------
>  1714385506
> (1 row)

If you're lucky, this is just a corrupted index on uab_message_read.id,
and can be fixed via REINDEX.  Otherwise, what you've got is a corrupted
table.

> And when I try to select any other field of this row or to update any of the
> fields or to delete the row, the client outputs the following message:
> server closed the connection unexpectedly

That part sounds more like the table row is corrupted :-(.  If you
aren't in a position to restore the whole table from backup, you'll
need to try to identify and wipe out the broken row.  Usually zeroing
its whole page is the easiest way to do this, though you'll want to
see what else is on that page first.  Check the PG archives for
discussions of recovering from corrupted-data scenarios.

            regards, tom lane

Re: BUG #4383: operations in a certain row makes server process terminate with signal 11

От
"Marco Aurelio Miranda"
Дата:
>
>
> That part sounds more like the table row is corrupted :-(.  If you
> aren't in a position to restore the whole table from backup, you'll
> need to try to identify and wipe out the broken row.  Usually zeroing
> its whole page is the easiest way to do this, though you'll want to
> see what else is on that page first.  Check the PG archives for
> discussions of recovering from corrupted-data scenarios.
>
>                        regards, tom lane
>
Tom,

how do I erase just one block of the table file?