Streaming replication slave crash

From Quentin Hartman
Subject Streaming replication slave crash
Date
Msg-id CAJ48qNZHsXPw8Gc1sM9dnxhwNnZ5YX=CZ6kGCPvTp5PehVGyWg@mail.gmail.com
List pgsql-general
Yesterday morning, one of my streaming replication slaves running 9.2.3 crashed with the following in the log file:

2013-03-28 12:49:30 GMT WARNING:  page 1441792 of relation base/63229/63370 does not exist
2013-03-28 12:49:30 GMT CONTEXT:  xlog redo delete: index 1663/63229/109956; iblk 303, heap 1663/63229/63370;
2013-03-28 12:49:30 GMT PANIC:  WAL contains references to invalid pages
2013-03-28 12:49:30 GMT CONTEXT:  xlog redo delete: index 1663/63229/109956; iblk 303, heap 1663/63229/63370;
2013-03-28 12:49:31 GMT LOG:  startup process (PID 22941) was terminated by signal 6: Aborted
2013-03-28 12:49:31 GMT LOG:  terminating any other active server processes
2013-03-28 12:49:31 GMT WARNING:  terminating connection because of crash of another server process
2013-03-28 12:49:31 GMT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-03-28 12:49:31 GMT HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-03-28 12:57:44 GMT LOG:  database system was interrupted while in recovery at log time 2013-03-28 12:37:42 GMT
2013-03-28 12:57:44 GMT HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2013-03-28 12:57:44 GMT LOG:  entering standby mode
2013-03-28 12:57:44 GMT LOG:  redo starts at 19/2367CE30
2013-03-28 12:57:44 GMT LOG:  incomplete startup packet
2013-03-28 12:57:44 GMT LOG:  consistent recovery state reached at 19/241835B0
2013-03-28 12:57:44 GMT LOG:  database system is ready to accept read only connections
2013-03-28 12:57:44 GMT LOG:  invalid record length at 19/2419EE38
2013-03-28 12:57:44 GMT LOG:  streaming replication successfully connected to primary

As you can see, I was able to restart it, and it picked up and resynchronized right away, but this crash still concerns me.

The DB has about 75GB of data in it, and it is almost entirely write traffic. It's essentially a log aggregator. I believe the slave was running a pg_dump backup at the time of the crash. It has hot_standby_feedback on to allow that backup to complete.

Any insights into this, or advice on figuring out the root cause, would be appreciated. So far everything I've found that looks like this is either a bug that should already be fixed in this version, or the internet equivalent of a shrug.
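
In case it helps anyone narrow this down, here is a rough sketch of how the page number in the WARNING line could be mapped to a file on disk. It assumes the default 8 kB block size and the default 1 GB segment size (I have not verified either on this build), and the function name locate_page is just something made up for illustration.

```python
import re

BLOCK_SIZE = 8192                 # assumption: default block size (8 kB)
SEGMENT_SIZE = 1024 ** 3          # assumption: default relation segment size (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE

# Matches the WARNING line format seen in the log above.
WARNING_RE = re.compile(
    r"page (\d+) of relation (base/(\d+)/(\d+)) does not exist"
)


def locate_page(log_line):
    """Hypothetical helper: split a missing-page WARNING into its parts."""
    match = WARNING_RE.search(log_line)
    if match is None:
        return None
    page = int(match.group(1))
    # Which segment file the page falls in, and its block number within it.
    segment, block_in_segment = divmod(page, BLOCKS_PER_SEGMENT)
    return {
        "relation_path": match.group(2),
        "database_oid": int(match.group(3)),
        "relfilenode": int(match.group(4)),
        "page": page,
        "segment": segment,
        "block_in_segment": block_in_segment,
        "byte_offset": page * BLOCK_SIZE,
    }


line = ("2013-03-28 12:49:30 GMT WARNING:  page 1441792 of relation "
        "base/63229/63370 does not exist")
info = locate_page(line)
print(info)
```

Under those assumptions, page 1441792 works out to block 0 of segment 11, which would be the file base/63229/63370.11. That file could be compared between the master and the slave. The relation name should be recoverable on the master with something like SELECT relname FROM pg_class WHERE relfilenode = 63370, run in the database with OID 63229.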

Thanks!

QH
