Обсуждение: BUG #4383: operations in a certain row makes server process terminate with signal 11
BUG #4383: operations in a certain row makes server process terminate with signal 11
От
"Marco Aurelio C. Miranda"
Дата:
The following bug has been logged online: Bug reference: 4383 Logged by: Marco Aurelio C. Miranda Email address: macmiranda@gmail.com PostgreSQL version: 7.4.19 Operating system: CentOS release 4.6 (Final) Description: operations in a certain row makes server process terminate with signal 11 Details: Hi, the behavior I'm about to demonstrate started to happen after a forced reboot but fsck hasn't found anything wrong with the ext3 filesystem. Look what happens when I select the id field of a particular row in this table: > select id from uab_message_read where id=249222; id ------------ 1714385506 (1 row) The id shown is completely different from the one I selected. And when I try to select any other field of this row or to update any of the fields or to delete the row, the client outputs the following message: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost On the server log, that's what I get: Aug 27 22:24:47 uab postgres[8852]: [19-1] LOG: server process (PID 11320) was terminated by signal 11 Aug 27 22:24:47 uab postgres[8852]: [20-1] LOG: terminating any other active server processes Aug 27 22:24:47 uab postgres[11310]: [19-1] WARNING: terminating connection because of crash of another server process Aug 27 22:24:47 uab postgres[11310]: [19-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server Aug 27 22:24:47 uab postgres[11310]: [19-3] process exited abnormally and possibly corrupted shared memory. Aug 27 22:24:47 uab postgres[11316]: [19-1] WARNING: terminating connection because of crash of another server process Aug 27 22:24:47 uab postgres[11310]: [19-4] HINT: In a moment you should be able to reconnect to the database and repeat your command. Aug 27 22:24:47 uab postgres[11316]: [19-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server Aug 27 22:24:47 uab postgres[11316]: [19-3] process exited abnormally and possibly corrupted shared memory. Aug 27 22:24:47 uab postgres[11284]: [19-1] WARNING: terminating connection because of crash of another server process Aug 27 22:24:47 uab postgres[11316]: [19-4] HINT: In a moment you should be able to reconnect to the database and repeat your command. Aug 27 22:24:47 uab postgres[11284]: [19-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server Aug 27 22:24:47 uab postgres[11284]: [19-3] process exited abnormally and possibly corrupted shared memory. Aug 27 22:24:47 uab postgres[11247]: [19-1] WARNING: terminating connection because of crash of another server process Aug 27 22:24:47 uab postgres[11314]: [19-1] WARNING: terminating connection because of crash of another server process Aug 27 22:24:47 uab postgres[11284]: [19-4] HINT: In a moment you should be able to reconnect to the database and repeat your command. Aug 27 22:24:47 uab postgres[11247]: [19-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server Aug 27 22:24:47 uab postgres[11314]: [19-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server Aug 27 22:24:47 uab postgres[11247]: [19-3] process exited abnormally and possibly corrupted shared memory. Aug 27 22:24:47 uab postgres[11314]: [19-3] process exited abnormally and possibly corrupted shared memory. Aug 27 22:24:47 uab postgres[11247]: [19-4] HINT: In a moment you should be able to reconnect to the database and repeat your command. Aug 27 22:24:47 uab postgres[11314]: [19-4] HINT: In a moment you should be able to reconnect to the database and repeat your command. Aug 27 22:24:47 uab postgres[8852]: [21-1] LOG: all server processes terminated; reinitializing Aug 27 22:24:47 uab postgres[11322]: [22-1] LOG: database system was interrupted at 2008-08-27 22:24:31 BRT Aug 27 22:24:47 uab postgres[11322]: [23-1] LOG: checkpoint record is at 158/A63C6290 Aug 27 22:24:47 uab postgres[11322]: [24-1] LOG: redo record is at 158/A6322CEC; undo record is at 0/0; shutdown FALSE Aug 27 22:24:47 uab postgres[11322]: [25-1] LOG: next transaction ID: 1535991290; next OID: 37872258 Aug 27 22:24:47 uab postgres[11322]: [26-1] LOG: database system was not properly shut down; automatic recovery in progress Aug 27 22:24:47 uab postgres[11322]: [27-1] LOG: redo starts at 158/A6322CEC Aug 27 22:24:47 uab postgres[11322]: [28-1] LOG: record with zero length at 158/A64ED93C Aug 27 22:24:47 uab postgres[11322]: [29-1] LOG: redo done at 158/A64ED918 Aug 27 22:24:47 uab postgres[11323]: [22-1] FATAL: the database system is starting up Aug 27 22:24:47 uab postgres[11324]: [22-1] FATAL: the database system is starting up Aug 27 22:24:47 uab postgres[11325]: [22-1] FATAL: the database system is starting up Aug 27 22:24:48 uab postgres[11326]: [22-1] FATAL: the database system is starting up Aug 27 22:24:48 uab postgres[11327]: [22-1] FATAL: the database system is starting up Aug 27 22:24:49 uab postgres[11328]: [22-1] FATAL: the database system is starting up Aug 27 22:24:49 uab postgres[11329]: [22-1] FATAL: the database system is starting up Aug 27 22:24:50 uab postgres[11330]: [22-1] FATAL: the database system is starting up Aug 27 22:24:50 uab postgres[11331]: [22-1] FATAL: the database system is starting up Aug 27 22:24:51 uab postgres[11322]: [30-1] LOG: recycled transaction log file "00000158000000A5" Aug 27 22:24:51 uab postgres[11322]: [31-1] LOG: database system is ready This is the description of the given table: > \d uab_message_read; Table "public.uab_message_read" Column | Type | Modifiers -------------+-----------------------+-------------------------------------- ---------------------------- id | integer | not null default nextval('public.uab_message_read_id_seq'::text) useridfrom | integer | not null default 0 useridto | integer | not null default 0 message | text | format | integer | not null default 0 timecreated | integer | not null default 0 timeread | integer | not null default 0 messagetype | character varying(50) | not null default ''::character varying mailed | integer | not null default 0 Indexes: "uab_message_read_pkey" primary key, btree (id) "uab_message_read_useridfrom_idx" btree (useridfrom) "uab_message_read_useridto_idx" btree (useridto) As you can infer from that, I also can't dump the table with pg_dump. Any help will be highly appreciated.
"Marco Aurelio C. Miranda" <macmiranda@gmail.com> writes: > the behavior I'm about to demonstrate started to happen after a forced > reboot but fsck hasn't found anything wrong with the ext3 filesystem. fsck doesn't even think about whether the data inside user files is any good. > Look what happens when I select the id field of a particular row in this > table: >> select id from uab_message_read where id=249222; > id > ------------ > 1714385506 > (1 row) If you're lucky, this is just a corrupted index on uab_message_read.id, and can be fixed via REINDEX. Otherwise, what you've got is a corrupted table. > And when I try to select any other field of this row or to update any of the > fields or to delete the row, the client outputs the following message: > server closed the connection unexpectedly That part sounds more like the table row is corrupted :-(. If you aren't in a position to restore the whole table from backup, you'll need to try to identify and wipe out the broken row. Usually zeroing its whole page is the easiest way to do this, though you'll want to see what else is on that page first. Check the PG archives for discussions of recovering from corrupted-data scenarios. regards, tom lane
Re: BUG #4383: operations in a certain row makes server process terminate with signal 11
От
"Marco Aurelio Miranda"
Дата:
> > > That part sounds more like the table row is corrupted :-(. If you > aren't in a position to restore the whole table from backup, you'll > need to try to identify and wipe out the broken row. Usually zeroing > its whole page is the easiest way to do this, though you'll want to > see what else is on that page first. Check the PG archives for > discussions of recovering from corrupted-data scenarios. > > regards, tom lane > Tom, how do I erase just one block of the table file?