Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'

From: Jon Nelson
Subject: Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'
Date:
Msg-id: AANLkTi=Q812FJK-2wjnUOtD1=CX7bcobcgnv8rq0YM5d@mail.gmail.com
In response to: Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B' (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List: pgsql-bugs
On Thu, Mar 31, 2011 at 2:58 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 30.03.2011 21:06, Jon Nelson wrote:
>>
>> The short version is that if a postgresql backend is killed (by the Linux
>> OOM handler, or kill -9, etc.) while operations are taking place in a
>> *different* backend, corruption is introduced in the other backend. I
>> don't want to say it happens 100% of the time, but it happens every time
>> I test.
>> ...
>>
>> Here is how I am reproducing the problem:
>>
>> 1. Open a psql connection to database A. It may remain idle.
>> 2. Wait for an automated process to connect to database B and start
>> operations. These operations
>> 3. kill -9 the backend for the psql connection to database A.
>>
>> Then I observe the backends all shutting down and postgresql entering
>> recovery mode, which succeeds. Subsequent operations on other databases
>> appear fine, but not for database B: an index on one of the tables in
>> database B is corrupted. It is always the same index.
>>
>> 2011-03-30 14:51:32 UTC   LOG:  server process (PID 3871) was terminated by signal 9: Killed
>> 2011-03-30 14:51:32 UTC   LOG:  terminating any other active server processes
>> 2011-03-30 14:51:32 UTC   WARNING:  terminating connection because of crash of another server process
>> 2011-03-30 14:51:32 UTC   DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>> 2011-03-30 14:51:32 UTC   HINT:  In a moment you should be able to reconnect to the database and repeat your command.
>> 2011-03-30 14:51:32 UTC databaseB databaseB WARNING:  terminating connection because of crash of another server process
>> 2011-03-30 14:51:32 UTC databaseB databaseB DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>> 2011-03-30 14:51:32 UTC databaseB databaseB HINT:  In a moment you should be able to reconnect to the database and repeat your command.
>> 2011-03-30 14:51:32 UTC   LOG:  all server processes terminated; reinitializing
>> 2011-03-30 14:51:32 UTC   LOG:  database system was interrupted; last known up at 2011-03-30 14:46:50 UTC
>> 2011-03-30 14:51:32 UTC databaseB databaseB FATAL:  the database system is in recovery mode
>> 2011-03-30 14:51:32 UTC   LOG:  database system was not properly shut down; automatic recovery in progress
>> 2011-03-30 14:51:32 UTC   LOG:  redo starts at 301/1D328E40
>> 2011-03-30 14:51:33 UTC databaseB databaseB FATAL:  the database system is in recovery mode
>> 2011-03-30 14:51:34 UTC   LOG:  record with zero length at 301/1EA08608
>> 2011-03-30 14:51:34 UTC   LOG:  redo done at 301/1EA08558
>> 2011-03-30 14:51:34 UTC   LOG:  last completed transaction was at log time 2011-03-30 14:51:31.257997+00
>> 2011-03-30 14:51:37 UTC   LOG:  autovacuum launcher started
>> 2011-03-30 14:51:37 UTC   LOG:  database system is ready to accept connections
>> 2011-03-30 14:52:05 UTC databaseB databaseB ERROR:  index "<elided>" contains unexpected zero page at block 0
>> 2011-03-30 14:52:05 UTC databaseB databaseB HINT:  Please REINDEX it.
>>
>> What's more, I can execute a 'DELETE from tableB' (where tableB is the
>> table with the troublesome index) without error, but when I try to
>> *insert*, that is when I get a problem. The index is a standard btree
>> index. The DELETE statement has no where clause.
>
> Can you provide a self-contained test script to reproduce this?

I will try.

> Is the corruption always the same, i.e. "unexpected zero page at block 0"?

As far as I can tell, yes!



--
Jon
