Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B'
From | Heikki Linnakangas |
---|---|
Subject | Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B' |
Date | |
Msg-id | 4D94343F.6030302@enterprisedb.com |
In response to | backend for database 'A' crashes/is killed -> corrupt index in database 'B' (Jon Nelson <jnelson+pgsql@jamponi.net>) |
Responses |
Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B' (Jon Nelson <jnelson+pgsql@jamponi.net>)
Re: backend for database 'A' crashes/is killed -> corrupt index in database 'B' (Jon Nelson <jnelson+pgsql@jamponi.net>) |
List | pgsql-bugs |
On 30.03.2011 21:06, Jon Nelson wrote:
> The short version is that if a postgresql backend is killed (by the Linux
> OOM handler, or kill -9, etc...) while operations are
> taking place in a *different* backend, corruption is introduced in the other
> backend. I don't want to say it happens 100% of the time, but it happens
> every time I test.
>...
>
> Here is how I am reproducing the problem:
>
> 1. Open a psql connection to database A. It may remain idle.
> 2. Wait for an automated process to connect to database B and start
> operations. These operations
> 3. kill -9 the backend for the psql connection to database A.
>
> Then I observe the backends all shutting down and postgresql entering
> recovery mode, which succeeds.
> Subsequent operations on other databases appear fine, but not for
> database B: An index on one of the tables in database B is corrupted.
> It is always the
> same index.
>
> 2011-03-30 14:51:32 UTC LOG: server process (PID 3871) was terminated by
> signal 9: Killed
> 2011-03-30 14:51:32 UTC LOG: terminating any other active server
> processes
> 2011-03-30 14:51:32 UTC WARNING: terminating connection because of crash
> of another server process
> 2011-03-30 14:51:32 UTC DETAIL: The postmaster has commanded this server
> process to roll back the current transaction and exit, because another
> server process exited abnormally and possibly corrupted shared memory.
> 2011-03-30 14:51:32 UTC HINT: In a moment you should be able to reconnect
> to the database and repeat your command.
> 2011-03-30 14:51:32 UTC databaseB databaseB WARNING: terminating connection
> because of crash of another server process
> 2011-03-30 14:51:32 UTC databaseB databaseB DETAIL: The postmaster has
> commanded this server process to roll back the current transaction and exit,
> because another server process exited abnormally and possibly corrupted
> shared memory.
> 2011-03-30 14:51:32 UTC databaseB databaseB HINT: In a moment you should be
> able to reconnect to the database and repeat your command.
> 2011-03-30 14:51:32 UTC LOG: all server processes terminated;
> reinitializing
> 2011-03-30 14:51:32 UTC LOG: database system was interrupted; last known
> up at 2011-03-30 14:46:50 UTC
> 2011-03-30 14:51:32 UTC databaseB databaseB FATAL: the database system is
> in recovery mode
> 2011-03-30 14:51:32 UTC LOG: database system was not properly shut down;
> automatic recovery in progress
> 2011-03-30 14:51:32 UTC LOG: redo starts at 301/1D328E40
> 2011-03-30 14:51:33 UTC databaseB databaseB FATAL: the database system is
> in recovery mode
> 2011-03-30 14:51:34 UTC LOG: record with zero length at 301/1EA08608
> 2011-03-30 14:51:34 UTC LOG: redo done at 301/1EA08558
> 2011-03-30 14:51:34 UTC LOG: last completed transaction was at log time
> 2011-03-30 14:51:31.257997+00
> 2011-03-30 14:51:37 UTC LOG: autovacuum launcher started
> 2011-03-30 14:51:37 UTC LOG: database system is ready to accept
> connections
> 2011-03-30 14:52:05 UTC databaseB databaseB ERROR: index "<elided>"
> contains unexpected zero page at block 0
> 2011-03-30 14:52:05 UTC databaseB databaseB HINT: Please REINDEX it.
>
> What's more, I can execute a 'DELETE from tableB' (where tableB is the
> table that is the one with the troublesome index) without error, but
> when I try to *insert* that is when I get a problem. The index is a
> standard btree index. The DELETE statement has no where clause.

Can you provide a self-contained test script to reproduce this?

Is the corruption always the same, ie. "unexpected zero page at block 0" ?

> My interpretation of these values is that the drives themselves have
> their write caches disabled.

Ok. It doesn't look like a hardware issue, as there's no OS crash involved.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
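
The question of whether the corruption is always "unexpected zero page at block 0" could be checked mechanically against the server log across repeated test runs. The following is only a minimal Python sketch, assuming the log contains the error message in the form quoted above; the function name `tally_zero_pages` and the regular expression are illustrative and not part of PostgreSQL.

```python
import re
from collections import Counter

# Matches the error text shown in the quoted log, for example:
#   ERROR: index "<elided>" contains unexpected zero page at block 0
# The pattern is an assumption based on that single quoted message.
ZERO_PAGE_RE = re.compile(
    r'ERROR:\s+index "(?P<index>[^"]*)"\s+'
    r'contains unexpected zero page at block (?P<block>\d+)'
)


def tally_zero_pages(log_text):
    """Count 'unexpected zero page' errors per (index name, block number)."""
    # Log lines may be wrapped (as in the email), so collapse all
    # whitespace runs into single spaces before matching.
    flattened = " ".join(log_text.split())
    counts = Counter()
    for match in ZERO_PAGE_RE.finditer(flattened):
        counts[(match.group("index"), int(match.group("block")))] += 1
    return counts


if __name__ == "__main__":
    sample = '''2011-03-30 14:52:05 UTC databaseB databaseB ERROR: index "<elided>"
    contains unexpected zero page at block 0
    2011-03-30 14:52:05 UTC databaseB databaseB HINT: Please REINDEX it.'''
    for (index, block), n in tally_zero_pages(sample).items():
        print(f"index={index!r} block={block} occurrences={n}")
```

If every run yields the same single (index, block) key, the corruption is consistent; several distinct keys would suggest the damage varies between runs.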