Thread: How to continue streaming replication after this error?

How to continue streaming replication after this error?

From
Torsten Förtsch
Date:
Hi,

one of our streaming replicas died with

2014-02-21 05:17:10 UTC PANIC:  heap2_redo: unknown op code 32
2014-02-21 05:17:10 UTC CONTEXT:  xlog redo UNKNOWN
2014-02-21 05:17:11 UTC LOG:  startup process (PID 1060) was terminated
by signal 6: Aborted
2014-02-21 05:17:11 UTC LOG:  terminating any other active server processes
2014-02-21 05:17:11 UTC WARNING:  terminating connection because of
crash of another server process
2014-02-21 05:17:11 UTC DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2014-02-21 05:17:11 UTC HINT:  In a moment you should be able to
reconnect to the database and repeat your command.


Now, if I try to restart it, I get this:


The PostgreSQL server failed to start. Please check the log output:
2014-02-21 07:42:53 UTC LOG:  database system was interrupted while in
recovery at log time 2014-02-21 05:02:45 UTC
2014-02-21 07:42:53 UTC HINT:  If this has occurred more than once some
data might be corrupted and you might need to choose an earlier recovery
target.
2014-02-21 07:42:53 UTC LOG:  incomplete startup packet
2014-02-21 07:42:53 UTC LOG:  entering standby mode
2014-02-21 07:42:53 UTC LOG:  redo starts at 11C/B2211778
2014-02-21 07:42:53 UTC FATAL:  the database system is starting up
2014-02-21 07:42:54 UTC LOG:  consistent recovery state reached at
11C/B4234108
2014-02-21 07:42:54 UTC LOG:  database system is ready to accept read
only connections
2014-02-21 07:42:54 UTC PANIC:  heap2_redo: unknown op code 32
2014-02-21 07:42:54 UTC CONTEXT:  xlog redo UNKNOWN
2014-02-21 07:42:54 UTC LOG:  startup process (PID 38187) was terminated
by signal 6: Aborted
2014-02-21 07:42:54 UTC LOG:  terminating any other active server processes


This is 9.3.2. What is the proper way to continue replication? Or do I
need to start from a fresh base backup?

Thanks,
Torsten
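
For anyone checking several replicas for the same failure, here is a minimal Python sketch that pulls the redo function and op code out of PANIC lines like the ones above. The regular expression is only derived from the log excerpt in this message, and the function name `unknown_op_codes` is made up for illustration.

```python
import re

# Matches the PANIC line format seen in the log excerpt above, e.g.
# "2014-02-21 05:17:10 UTC PANIC:  heap2_redo: unknown op code 32"
PANIC_RE = re.compile(r"PANIC:\s+(?P<redo>\w+_redo): unknown op code (?P<op>\d+)")


def unknown_op_codes(log_text):
    """Return distinct (redo_function, op_code) pairs in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = PANIC_RE.search(line)
        if match:
            pair = (match.group("redo"), int(match.group("op")))
            if pair not in seen:
                seen.append(pair)
    return seen


sample = """\
2014-02-21 05:17:10 UTC PANIC:  heap2_redo: unknown op code 32
2014-02-21 05:17:10 UTC CONTEXT:  xlog redo UNKNOWN
2014-02-21 07:42:54 UTC PANIC:  heap2_redo: unknown op code 32
"""
print(unknown_op_codes(sample))  # [('heap2_redo', 32)]
```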


Re: How to continue streaming replication after this error?

From
Torsten Förtsch
Date:
On 21/02/14 09:17, Torsten Förtsch wrote:
> one of our streaming replicas died with
>
> 2014-02-21 05:17:10 UTC PANIC:  heap2_redo: unknown op code 32
> 2014-02-21 05:17:10 UTC CONTEXT:  xlog redo UNKNOWN
> 2014-02-21 05:17:11 UTC LOG:  startup process (PID 1060) was terminated
> by signal 6: Aborted
> 2014-02-21 05:17:11 UTC LOG:  terminating any other active server processes
> 2014-02-21 05:17:11 UTC WARNING:  terminating connection because of
> crash of another server process
> 2014-02-21 05:17:11 UTC DETAIL:  The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2014-02-21 05:17:11 UTC HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.

Any idea what that means?

I have got a second replica dying with the same symptoms.

Thanks,
Torsten


Re: How to continue streaming replication after this error?

From
Haribabu Kommi
Date:
On Sat, Feb 22, 2014 at 1:21 PM, Torsten Förtsch <torsten.foertsch@gmx.net> wrote:
> On 21/02/14 09:17, Torsten Förtsch wrote:
>> one of our streaming replicas died with
>>
>> 2014-02-21 05:17:10 UTC PANIC:  heap2_redo: unknown op code 32
>> 2014-02-21 05:17:10 UTC CONTEXT:  xlog redo UNKNOWN
>> 2014-02-21 05:17:11 UTC LOG:  startup process (PID 1060) was terminated
>> by signal 6: Aborted
>> 2014-02-21 05:17:11 UTC LOG:  terminating any other active server processes
>> 2014-02-21 05:17:11 UTC WARNING:  terminating connection because of
>> crash of another server process
>> 2014-02-21 05:17:11 UTC DETAIL:  The postmaster has commanded this
>> server process to roll back the current transaction and exit, because
>> another server process exited abnormally and possibly corrupted shared
>> memory.
>> 2014-02-21 05:17:11 UTC HINT:  In a moment you should be able to
>> reconnect to the database and repeat your command.
>
> Any idea what that means?
>
> I have got a second replica dying with the same symptoms.

The xlog record seems to be corrupted. Op code 32 represents XLOG_HEAP2_FREEZE_PAGE, and the code exists to handle it,
so I don't know why the system is not able to recognize the op code. Can you try running pg_xlogdump on the corrupted WAL file?

Keep the data folder for problem investigation. As this seems to be some kind of corruption, you need to take a fresh base backup to continue.
 
Regards,
Hari Babu
Fujitsu Australia
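
To follow the pg_xlogdump suggestion, it helps to know which WAL segment holds the redo start position from the log (11C/B2211778). The Python sketch below derives the segment file name and builds a pg_xlogdump command line. It assumes the default 16 MB segment size and timeline 1, neither of which is stated in the thread; the WAL directory passed in the example is a placeholder, and both helper names are hypothetical.

```python
def wal_segment_name(lsn, timeline=1, seg_size=16 * 1024 * 1024):
    """Map an LSN such as '11C/B2211778' to its WAL segment file name.

    timeline=1 and the 16 MB segment size are assumptions, not facts
    taken from the thread.
    """
    hi, lo = lsn.split("/")
    position = (int(hi, 16) << 32) | int(lo, 16)
    segno = position // seg_size
    segs_per_id = 0x100000000 // seg_size  # 256 segments per "xlogid"
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def xlogdump_command(wal_dir, start_lsn, limit=50, timeline=1):
    """Build (but do not run) a pg_xlogdump invocation as an argv list."""
    return [
        "pg_xlogdump",
        "--path", wal_dir,        # directory containing the WAL segments
        "--start", start_lsn,     # begin reading at this log position
        "--limit", str(limit),    # stop after this many records
        wal_segment_name(start_lsn, timeline),
    ]


print(wal_segment_name("11C/B2211778"))  # 000000010000011C000000B2
print(" ".join(xlogdump_command("/path/to/pg_xlog", "11C/B2211778")))
```

The command is only assembled here, not executed, so it can be reviewed before running it against a copy of the preserved data folder.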

Re: How to continue streaming replication after this error?

From
Torsten Förtsch
Date:
On 22/02/14 03:21, Torsten Förtsch wrote:
>> 2014-02-21 05:17:10 UTC PANIC:  heap2_redo: unknown op code 32
>> 2014-02-21 05:17:10 UTC CONTEXT:  xlog redo UNKNOWN
>> 2014-02-21 05:17:11 UTC LOG:  startup process (PID 1060) was terminated
>> by signal 6: Aborted
>> 2014-02-21 05:17:11 UTC LOG:  terminating any other active server processes
>> 2014-02-21 05:17:11 UTC WARNING:  terminating connection because of
>> crash of another server process
>> 2014-02-21 05:17:11 UTC DETAIL:  The postmaster has commanded this
>> server process to roll back the current transaction and exit, because
>> another server process exited abnormally and possibly corrupted shared
>> memory.
>> 2014-02-21 05:17:11 UTC HINT:  In a moment you should be able to
>> reconnect to the database and repeat your command.

> Any idea what that means?

Updating the replica to 9.3.3 cured it. The master was already on 9.3.3.

Torsten
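
One way to picture why the upgrade helped: the replay code on the standby looks up each record's op code and panics on anything it does not know. The Python toy model below is not PostgreSQL's actual implementation. The table holds only the one op code named in this thread (32, XLOG_HEAP2_FREEZE_PAGE), and it assumes that 9.3.2 lacks it while 9.3.3 has it, which is what the behaviour reported here suggests.

```python
# Toy model only. Real releases know many more heap2 op codes; the single
# entry here is the one named in this thread, and its absence from 9.3.2
# is an assumption based on the behaviour reported above.
KNOWN_HEAP2_OPS = {
    (9, 3, 2): {},
    (9, 3, 3): {32: "XLOG_HEAP2_FREEZE_PAGE"},
}


class ReplayPanic(Exception):
    """Stands in for the PANIC that aborts the startup process."""


def heap2_redo(standby_version, op_code):
    """Return the record name if this standby version can replay op_code."""
    ops = KNOWN_HEAP2_OPS[standby_version]
    if op_code not in ops:
        raise ReplayPanic("heap2_redo: unknown op code %d" % op_code)
    return ops[op_code]


print(heap2_redo((9, 3, 3), 32))  # XLOG_HEAP2_FREEZE_PAGE
try:
    heap2_redo((9, 3, 2), 32)
except ReplayPanic as exc:
    print("PANIC:", exc)  # PANIC: heap2_redo: unknown op code 32
```

In this picture, restarting the old standby cannot help, because it hits the same record again on every replay attempt, which matches the second log excerpt in the first message.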


Re: How to continue streaming replication after this error?

From
Michael Paquier
Date:
On Mon, Feb 24, 2014 at 12:23 PM, Torsten Förtsch <torsten.foertsch@gmx.net> wrote:
> On 22/02/14 03:21, Torsten Förtsch wrote:
>> Any idea what that means?
>
> Updating the replica to 9.3.3 cured it. The master was already on 9.3.3.

9.3.3 changed the WAL record format used for tuple freezing, which is why a 9.3.2 standby does not recognize op code 32. So you need to update a slave before the master, or replication breaks.
--
Michael
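
The ordering rule above (standbys first, then the master) can be turned into a small pre-upgrade check. This Python sketch is illustrative only: the function names are invented, and it assumes plain dotted version strings such as "9.3.2", collected by whatever means you already use.

```python
def parse_version(text):
    """Turn '9.3.3' into (9, 3, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def safe_to_upgrade_master(target_version, standby_versions):
    """True only if every standby already runs target_version or newer."""
    target = parse_version(target_version)
    return all(parse_version(v) >= target for v in standby_versions)


# The situation from this thread: master going to 9.3.3, one replica still on 9.3.2.
print(safe_to_upgrade_master("9.3.3", ["9.3.2", "9.3.3"]))  # False
print(safe_to_upgrade_master("9.3.3", ["9.3.3", "9.3.3"]))  # True
```

Comparing tuples rather than strings matters here, since a string comparison would rank "9.3.10" below "9.3.3".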