Обсуждение: backend for database 'A' crashes/is killed -> corrupt index in database 'B'

Поиск

Список

Период

Сортировка

backend for database 'A' crashes/is killed -> corrupt index in database 'B'

От

Jon Nelson

Дата:

30 марта 2011 г., 18:07:26

Please bear with me, as it's taken me a while to be able to reproduce an
issue that is worrisome.

Before I describe the problem, I am using postgresql version 8.4.5 on
CentOS 5.5 (x86_64).

The short version is that if a postgresql backend is killed (by the Linux
OOM handler, or kill -9, etc...) while operations are
taking place in a *different* backend, corruption is introduced in the other
backend.  I don't want to say it happens 100% of the time, but it happens
every time I test.

The differences from a stock pg config are primarily to do with
logging and such. The other settings (besides logging
and such) that differ from the stock values include:

wal_sync_method -- I've tried with leaving it as the default, and I've
also tried datasync -- same results.
wal_buffers = 128MB
effective_cache_size = 4GB
work_mem = 1MB
maintenance_work_mem = 1GB

Here is how I am reproducing the problem:

1. Open a psql connection to database A. It may remain idle.
2. Wait for an automated process to connect to database B and start
operations. These operations
3. kill -9 the backend for the psql connection to database A.

Then I observe the backends all shutting down and postgresql entering
recovery mode, which succeeds.
Subsequent operations on other databases appear fine, but not for
database B: An index on one of the tables in database B is corrupted.
It is always the
same index.

2011-03-30 14:51:32 UTC   LOG:  server process (PID 3871) was terminated by
signal 9: Killed
2011-03-30 14:51:32 UTC   LOG:  terminating any other active server
processes
2011-03-30 14:51:32 UTC   WARNING:  terminating connection because of crash
of another server process
2011-03-30 14:51:32 UTC   DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2011-03-30 14:51:32 UTC   HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2011-03-30 14:51:32 UTC databaseB databaseB WARNING:  terminating connection
because of crash of another server process
2011-03-30 14:51:32 UTC databaseB databaseB DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted
shared memory.
2011-03-30 14:51:32 UTC databaseB databaseB HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
2011-03-30 14:51:32 UTC   LOG:  all server processes terminated;
reinitializing
2011-03-30 14:51:32 UTC   LOG:  database system was interrupted; last known
up at 2011-03-30 14:46:50 UTC
2011-03-30 14:51:32 UTC databaseB databaseB FATAL:  the database system is
in recovery mode
2011-03-30 14:51:32 UTC   LOG:  database system was not properly shut down;
automatic recovery in progress
2011-03-30 14:51:32 UTC   LOG:  redo starts at 301/1D328E40
2011-03-30 14:51:33 UTC databaseB databaseB FATAL:  the database system is
in recovery mode
2011-03-30 14:51:34 UTC   LOG:  record with zero length at 301/1EA08608
2011-03-30 14:51:34 UTC   LOG:  redo done at 301/1EA08558
2011-03-30 14:51:34 UTC   LOG:  last completed transaction was at log time
2011-03-30 14:51:31.257997+00
2011-03-30 14:51:37 UTC   LOG:  autovacuum launcher started
2011-03-30 14:51:37 UTC   LOG:  database system is ready to accept
connections
2011-03-30 14:52:05 UTC databaseB databaseB ERROR:  index "<elided>"
contains unexpected zero page at block 0
2011-03-30 14:52:05 UTC databaseB databaseB HINT:  Please REINDEX it.

What's more, I can execute a 'DELETE from tableB' (where tableB is the
table that is the one with the troublesome index) without error, but
when I try to *insert* that is when I get a problem. The index is a
standard btree index. The DELETE statement has no where clause.

Other information: The following variables remain commented out and thus
retain their defaults (confirmed with 'show all'):
 fsync is on.  synchronous_commit is on. full_page_writes is on.
zero_damaged_pages is *off*.

The underlying storage is a software raid1.

'test_fsync' from postgresql 9.0.1 reports:

Loops = 10000

Simple write:
        8k write                      157205.515/second

Compare file sync methods using one write:
        (open_datasync unavailable)
        open_sync 8k write             1357.928/second
        8k write, fdatasync            3872.508/second
        8k write, fsync                3496.659/second

Compare file sync methods using two writes:
        (open_datasync unavailable)
        2 open_sync 8k writes          1928.987/second
        8k write, 8k write, fdatasync  2020.884/second
        8k write, 8k write, fsync      2411.653/second

Compare open_sync with different sizes:
        open_sync 16k write            3298.716/second
        2 open_sync 8k writes          1228.599/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close         2535.765/second
        8k write, close, fsync         2162.060/second

My interpretation of these values is that the drives themselves have
their write caches disabled.

Lastly, the problem has been reproduced on at least three totally
different machines.


--
Jon

Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'

От

Heikki Linnakangas

Дата:

31 марта 2011 г., 08:07:08

On 30.03.2011 21:06, Jon Nelson wrote:
> The short version is that if a postgresql backend is killed (by the Linux
> OOM handler, or kill -9, etc...) while operations are
> taking place in a *different* backend, corruption is introduced in the other
> backend.  I don't want to say it happens 100% of the time, but it happens
> every time I test.
>...
>
> Here is how I am reproducing the problem:
>
> 1. Open a psql connection to database A. It may remain idle.
> 2. Wait for an automated process to connect to database B and start
> operations. These operations
> 3. kill -9 the backend for the psql connection to database A.
>
> Then I observe the backends all shutting down and postgresql entering
> recovery mode, which succeeds.
> Subsequent operations on other databases appear fine, but not for
> database B: An index on one of the tables in database B is corrupted.
> It is always the
> same index.
>
> 2011-03-30 14:51:32 UTC   LOG:  server process (PID 3871) was terminated by
> signal 9: Killed
> 2011-03-30 14:51:32 UTC   LOG:  terminating any other active server
> processes
> 2011-03-30 14:51:32 UTC   WARNING:  terminating connection because of crash
> of another server process
> 2011-03-30 14:51:32 UTC   DETAIL:  The postmaster has commanded this server
> process to roll back the current transaction and exit, because another
> server process exited abnormally and possibly corrupted shared memory.
> 2011-03-30 14:51:32 UTC   HINT:  In a moment you should be able to reconnect
> to the database and repeat your command.
> 2011-03-30 14:51:32 UTC databaseB databaseB WARNING:  terminating connection
> because of crash of another server process
> 2011-03-30 14:51:32 UTC databaseB databaseB DETAIL:  The postmaster has
> commanded this server process to roll back the current transaction and exit,
> because another server process exited abnormally and possibly corrupted
> shared memory.
> 2011-03-30 14:51:32 UTC databaseB databaseB HINT:  In a moment you should be
> able to reconnect to the database and repeat your command.
> 2011-03-30 14:51:32 UTC   LOG:  all server processes terminated;
> reinitializing
> 2011-03-30 14:51:32 UTC   LOG:  database system was interrupted; last known
> up at 2011-03-30 14:46:50 UTC
> 2011-03-30 14:51:32 UTC databaseB databaseB FATAL:  the database system is
> in recovery mode
> 2011-03-30 14:51:32 UTC   LOG:  database system was not properly shut down;
> automatic recovery in progress
> 2011-03-30 14:51:32 UTC   LOG:  redo starts at 301/1D328E40
> 2011-03-30 14:51:33 UTC databaseB databaseB FATAL:  the database system is
> in recovery mode
> 2011-03-30 14:51:34 UTC   LOG:  record with zero length at 301/1EA08608
> 2011-03-30 14:51:34 UTC   LOG:  redo done at 301/1EA08558
> 2011-03-30 14:51:34 UTC   LOG:  last completed transaction was at log time
> 2011-03-30 14:51:31.257997+00
> 2011-03-30 14:51:37 UTC   LOG:  autovacuum launcher started
> 2011-03-30 14:51:37 UTC   LOG:  database system is ready to accept
> connections
> 2011-03-30 14:52:05 UTC databaseB databaseB ERROR:  index "<elided>"
> contains unexpected zero page at block 0
> 2011-03-30 14:52:05 UTC databaseB databaseB HINT:  Please REINDEX it.
>
> What's more, I can execute a 'DELETE from tableB' (where tableB is the
> table that is the one with the troublesome index) without error, but
> when I try to *insert* that is when I get a problem. The index is a
> standard btree index. The DELETE statement has no where clause.

Can you provide a self-contained test script to reproduce this?

Is the corruption always the same, ie. "unexpected zero page at block 0" ?

> My interpretation of these values is that the drives themselves have
> their write caches disabled.

Ok. It doesn't look like a hardware issue, as there's no OS crash involved.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'

От

Jon Nelson

Дата:

31 марта 2011 г., 11:42:01

On Thu, Mar 31, 2011 at 2:58 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 30.03.2011 21:06, Jon Nelson wrote:
>>
>> The short version is that if a postgresql backend is killed (by the Linux
>> OOM handler, or kill -9, etc...) while operations are
>> taking place in a *different* backend, corruption is introduced in the
>> other
>> backend. =C2=A0I don't want to say it happens 100% of the time, but it h=
appens
>> every time I test.
>> ...
>>
>> Here is how I am reproducing the problem:
>>
>> 1. Open a psql connection to database A. It may remain idle.
>> 2. Wait for an automated process to connect to database B and start
>> operations. These operations
>> 3. kill -9 the backend for the psql connection to database A.
>>
>> Then I observe the backends all shutting down and postgresql entering
>> recovery mode, which succeeds.
>> Subsequent operations on other databases appear fine, but not for
>> database B: An index on one of the tables in database B is corrupted.
>> It is always the
>> same index.
>>
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0server process (PID 3871) was =
terminated
>> by
>> signal 9: Killed
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0terminating any other active s=
erver
>> processes
>> 2011-03-30 14:51:32 UTC =C2=A0 WARNING: =C2=A0terminating connection bec=
ause of
>> crash
>> of another server process
>> 2011-03-30 14:51:32 UTC =C2=A0 DETAIL: =C2=A0The postmaster has commande=
d this
>> server
>> process to roll back the current transaction and exit, because another
>> server process exited abnormally and possibly corrupted shared memory.
>> 2011-03-30 14:51:32 UTC =C2=A0 HINT: =C2=A0In a moment you should be abl=
e to
>> reconnect
>> to the database and repeat your command.
>> 2011-03-30 14:51:32 UTC databaseB databaseB WARNING: =C2=A0terminating
>> connection
>> because of crash of another server process
>> 2011-03-30 14:51:32 UTC databaseB databaseB DETAIL: =C2=A0The postmaster=
 has
>> commanded this server process to roll back the current transaction and
>> exit,
>> because another server process exited abnormally and possibly corrupted
>> shared memory.
>> 2011-03-30 14:51:32 UTC databaseB databaseB HINT: =C2=A0In a moment you =
should
>> be
>> able to reconnect to the database and repeat your command.
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0all server processes terminate=
d;
>> reinitializing
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0database system was interrupte=
d; last
>> known
>> up at 2011-03-30 14:46:50 UTC
>> 2011-03-30 14:51:32 UTC databaseB databaseB FATAL: =C2=A0the database sy=
stem is
>> in recovery mode
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0database system was not proper=
ly shut
>> down;
>> automatic recovery in progress
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0redo starts at 301/1D328E40
>> 2011-03-30 14:51:33 UTC databaseB databaseB FATAL: =C2=A0the database sy=
stem is
>> in recovery mode
>> 2011-03-30 14:51:34 UTC =C2=A0 LOG: =C2=A0record with zero length at 301=
/1EA08608
>> 2011-03-30 14:51:34 UTC =C2=A0 LOG: =C2=A0redo done at 301/1EA08558
>> 2011-03-30 14:51:34 UTC =C2=A0 LOG: =C2=A0last completed transaction was=
 at log time
>> 2011-03-30 14:51:31.257997+00
>> 2011-03-30 14:51:37 UTC =C2=A0 LOG: =C2=A0autovacuum launcher started
>> 2011-03-30 14:51:37 UTC =C2=A0 LOG: =C2=A0database system is ready to ac=
cept
>> connections
>> 2011-03-30 14:52:05 UTC databaseB databaseB ERROR: =C2=A0index "<elided>"
>> contains unexpected zero page at block 0
>> 2011-03-30 14:52:05 UTC databaseB databaseB HINT: =C2=A0Please REINDEX i=
t.
>>
>> What's more, I can execute a 'DELETE from tableB' (where tableB is the
>> table that is the one with the troublesome index) without error, but
>> when I try to *insert* that is when I get a problem. The index is a
>> standard btree index. The DELETE statement has no where clause.
>
> Can you provide a self-contained test script to reproduce this?

I will try.

> Is the corruption always the same, ie. "unexpected zero page at block 0" ?

As far as I can tell, yes!



--=20
Jon

Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'

От

Jon Nelson

Дата:

02 августа 2011 г., 14:35:28

On Thu, Mar 31, 2011 at 2:58 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 30.03.2011 21:06, Jon Nelson wrote:
>>
>> The short version is that if a postgresql backend is killed (by the Linux
>> OOM handler, or kill -9, etc...) while operations are
>> taking place in a *different* backend, corruption is introduced in the
>> other
>> backend. =C2=A0I don't want to say it happens 100% of the time, but it h=
appens
>> every time I test.
>> ...
>>
>> Here is how I am reproducing the problem:
>>
>> 1. Open a psql connection to database A. It may remain idle.
>> 2. Wait for an automated process to connect to database B and start
>> operations. These operations
>> 3. kill -9 the backend for the psql connection to database A.
>>
>> Then I observe the backends all shutting down and postgresql entering
>> recovery mode, which succeeds.
>> Subsequent operations on other databases appear fine, but not for
>> database B: An index on one of the tables in database B is corrupted.
>> It is always the
>> same index.
>>
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0server process (PID 3871) was =
terminated
>> by
>> signal 9: Killed
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0terminating any other active s=
erver
>> processes
>> 2011-03-30 14:51:32 UTC =C2=A0 WARNING: =C2=A0terminating connection bec=
ause of
>> crash
>> of another server process
>> 2011-03-30 14:51:32 UTC =C2=A0 DETAIL: =C2=A0The postmaster has commande=
d this
>> server
>> process to roll back the current transaction and exit, because another
>> server process exited abnormally and possibly corrupted shared memory.
>> 2011-03-30 14:51:32 UTC =C2=A0 HINT: =C2=A0In a moment you should be abl=
e to
>> reconnect
>> to the database and repeat your command.
>> 2011-03-30 14:51:32 UTC databaseB databaseB WARNING: =C2=A0terminating
>> connection
>> because of crash of another server process
>> 2011-03-30 14:51:32 UTC databaseB databaseB DETAIL: =C2=A0The postmaster=
 has
>> commanded this server process to roll back the current transaction and
>> exit,
>> because another server process exited abnormally and possibly corrupted
>> shared memory.
>> 2011-03-30 14:51:32 UTC databaseB databaseB HINT: =C2=A0In a moment you =
should
>> be
>> able to reconnect to the database and repeat your command.
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0all server processes terminate=
d;
>> reinitializing
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0database system was interrupte=
d; last
>> known
>> up at 2011-03-30 14:46:50 UTC
>> 2011-03-30 14:51:32 UTC databaseB databaseB FATAL: =C2=A0the database sy=
stem is
>> in recovery mode
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0database system was not proper=
ly shut
>> down;
>> automatic recovery in progress
>> 2011-03-30 14:51:32 UTC =C2=A0 LOG: =C2=A0redo starts at 301/1D328E40
>> 2011-03-30 14:51:33 UTC databaseB databaseB FATAL: =C2=A0the database sy=
stem is
>> in recovery mode
>> 2011-03-30 14:51:34 UTC =C2=A0 LOG: =C2=A0record with zero length at 301=
/1EA08608
>> 2011-03-30 14:51:34 UTC =C2=A0 LOG: =C2=A0redo done at 301/1EA08558
>> 2011-03-30 14:51:34 UTC =C2=A0 LOG: =C2=A0last completed transaction was=
 at log time
>> 2011-03-30 14:51:31.257997+00
>> 2011-03-30 14:51:37 UTC =C2=A0 LOG: =C2=A0autovacuum launcher started
>> 2011-03-30 14:51:37 UTC =C2=A0 LOG: =C2=A0database system is ready to ac=
cept
>> connections
>> 2011-03-30 14:52:05 UTC databaseB databaseB ERROR: =C2=A0index "<elided>"
>> contains unexpected zero page at block 0
>> 2011-03-30 14:52:05 UTC databaseB databaseB HINT: =C2=A0Please REINDEX i=
t.
>>
>> What's more, I can execute a 'DELETE from tableB' (where tableB is the
>> table that is the one with the troublesome index) without error, but
>> when I try to *insert* that is when I get a problem. The index is a
>> standard btree index. The DELETE statement has no where clause.
>
> Can you provide a self-contained test script to reproduce this?
>
> Is the corruption always the same, ie. "unexpected zero page at block 0" ?
>
>> My interpretation of these values is that the drives themselves have
>> their write caches disabled.
>
> Ok. It doesn't look like a hardware issue, as there's no OS crash involve=
d.



I wanted to write this quick follow-up, so that anyone searching in
the future might have a chance of learning the resolution to this
issue.

The short version is this:  the database that experienced corrupted
indices was using HASH-based indices. As we all know, as of this
writing, HASH-based indices are _not_ crash-safe. When revised to use
btree-indices, the issue does not re-occur.

--=20
Jon

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: backend for database 'A' crashes/is killed -> corrupt index in database 'B'

backend for database 'A' crashes/is killed -> corrupt index in database 'B'

Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'

Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'

Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'