Re: BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000
Дата
Msg-id 5211C82B.2080605@vmware.com
обсуждение исходный текст
Ответ на BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000  (digoal@126.com)
Список pgsql-bugs
(Quoted pg_controldata output edited to highlight the important parts)

On 11.07.2013 06:26, digoal@126.com wrote:
> PostgreSQL 9.3 beta2 stream replication primary and standby cann't
> switchover.
> ...

Primary :
> psql
> checkpont;
> pg_controldata
> ...
> Database cluster state:               in production
> Latest checkpoint location:           0/C000060
> Prior checkpoint location:            0/B000060
> Latest checkpoint's REDO location:    0/C000028
> Latest checkpoint's TimeLineID:       4
> Latest checkpoint's PrevTimeLineID:   4
> Minimum recovery ending location:     0/0
> Min recovery ending loc's timeline:   0
>
> Standby :
> Database cluster state:               in archive recovery
> Latest checkpoint location:           0/B000060
> Prior checkpoint location:            0/B000060
> Latest checkpoint's REDO location:    0/B000028
> Latest checkpoint's TimeLineID:       4
> Latest checkpoint's PrevTimeLineID:   4
> Minimum recovery ending location:     0/B0000F0
> Min recovery ending loc's timeline:   4

So at this point you have a primary, and a standby server following the
primary through streaming replication.

> Primary :
> pg_ctl stop
> waiting for server to shut down.... done
> server stopped
> pg_controldata
> Database cluster state:               shut down
> Latest checkpoint location:           0/D000028
> Prior checkpoint location:            0/C000060
> Latest checkpoint's REDO location:    0/D000028
> Latest checkpoint's TimeLineID:       4
> Latest checkpoint's PrevTimeLineID:   4
> Minimum recovery ending location:     0/0
> Min recovery ending loc's timeline:   0
>
> standby :
> pg_controldata
> Database cluster state:               in archive recovery
> Latest checkpoint location:           0/B000060
> Prior checkpoint location:            0/B000060
> Latest checkpoint's REDO location:    0/B000028
> Latest checkpoint's TimeLineID:       4
> Latest checkpoint's PrevTimeLineID:   4
> Minimum recovery ending location:     0/B0000F0
> Min recovery ending loc's timeline:   4
>
> pg_ctl promote
> server promoting
>
> pg_controldata
> Database cluster state:               in production
> Latest checkpoint location:           0/D000090
> Prior checkpoint location:            0/B000060
> Latest checkpoint's REDO location:    0/D000058
> Latest checkpoint's TimeLineID:       5
> Latest checkpoint's PrevTimeLineID:   5

The primary is shut down, and the standby is promoted to become new master.

> primary  :
>   cd $PGDATA
>   mv recovery.done recovery.conf
>   pg_ctl start
>   log :
> 2013-07-11 11:10:11.386 CST,,,14911,,51de2213.3a3f,1,,2013-07-11 11:10:11
> CST,,0,LOG,00000,"database system was shut down in recovery at 2013-07-11
> 11:09:51 CST",,,,,,,,"StartupXLOG, xlog.c:4895",""
> 2013-07-11 11:10:11.387 CST,,,14911,,51de2213.3a3f,2,,2013-07-11 11:10:11
> CST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG,
> xlog.c:4968",""
> 2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,3,,2013-07-11 11:10:11
> CST,1/0,0,LOG,00000,"consistent recovery state reached at
> 0/D000090",,,,,,,,"CheckRecoveryConsistency, xlog.c:6187",""
> 2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,4,,2013-07-11 11:10:11
> CST,1/0,0,LOG,00000,"record with zero length at
> 0/D000090",,,,,,,,"ReadRecord, xlog.c:3285",""
> 2013-07-11 11:10:11.392 CST,,,14909,,51de2213.3a3d,1,,2013-07-11 11:10:11
> CST,,0,LOG,00000,"database system is ready to accept read only
> connections",,,,,,,,"sigusr1_handler, postmaster.c:4658",""
> 2013-07-11 11:10:11.407 CST,,,14915,,51de2213.3a43,1,,2013-07-11 11:10:11
> CST,,0,LOG,00000,"fetching timeline history file for timeline 5 from primary
> server",,,,,,,,"WalRcvFetchTimeLineHistoryFiles, walreceiver.c:666",""
> 2013-07-11 11:10:11.411 CST,,,14915,,51de2213.3a43,2,,2013-07-11 11:10:11
> CST,,0,LOG,00000,"primary server contains no more WAL on requested timeline
> 4",,,,,,,,"WalReceiverMain, walreceiver.c:529",""
> 2013-07-11 11:10:11.411 CST,,,14911,,51de2213.3a3f,5,,2013-07-11 11:10:11
> CST,1/0,0,LOG,00000,"new timeline 5 forked off current database system
> timeline 4 before current recovery point
> 0/D000090",,,,,,,,"rescanLatestTimeLine, xlog.c:3441",""

The old primary is restarted in standby-mode. It tries to connect to the
new primary, but it refuses to follow it because there is some WAL
applied on the old primary that was not replicated to the new primary
before the switchover.

I believe this is the same issue that Fujii reported in June:
http://www.postgresql.org/message-id/CAHGQGwHLjEROTMtSWJd=xg_VFwRe3oJWnTYsyBDUbRYa6rr0DQ@mail.gmail.com.
This was fixed in commit 0b958f3efcfcc3d9b0e39d550b705a28763bc9e2 on
June 25th:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0b958f3efcfcc3d9b0e39d550b705a28763bc9e2

Unfortunately that commit didn't make it into 9.3beta2, which was
wrapped just one day before that commit. Could you try again with a
fresh checkout from REL9_3_STABLE branch? Or if you can't easily build
from sources, you can wait for the 9.3rc1 release, which should be
available later this week
(http://www.postgresql.org/message-id/24973.1376419162@sss.pgh.pa.us).

Thanks for the report! Please let us know if the next version fixes this.

- Heikki

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: BUG #8385: greek symbols as function name
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [JDBC] Incorrect response code after XA recovery