Re: Switching timeline over streaming replication

From Heikki Linnakangas
Subject Re: Switching timeline over streaming replication
Date
Msg-id 50745884.6040008@vmware.com
In reply to Re: Switching timeline over streaming replication  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Switching timeline over streaming replication
Re: Switching timeline over streaming replication
List pgsql-hackers
On 06.10.2012 15:58, Amit Kapila wrote:
> One more test seems to have failed. Apart from this, the other tests passed.
>
> 2. a. Master M-1
>     b. Standby S-1 follows M-1
>     c. insert 10 records on M-1. verify all records are visible on M-1,S-1
>     d. Stop S-1
>     e. insert 2 records on M-1.
>     f. Stop M-1
>     g. Start S-1
>     h. Promote S-1
>     i. Make M-1 recovery.conf such that it should connect to S-1
>     j. Start M-1. Below error comes on M-1 which is expected as M-1 has more
> data.
>        LOG:  database system was shut down at 2012-10-05 16:45:39 IST
>        LOG:  entering standby mode
>        LOG:  consistent recovery state reached at 0/176A070
>        LOG:  record with zero length at 0/176A070
>        LOG:  database system is ready to accept read only connections
>        LOG:  streaming replication successfully connected to primary
>        LOG:  fetching timeline history file for timeline 2 from primary
> server
>        LOG:  replication terminated by primary server
>        DETAIL:  End of WAL reached on timeline 1
>        LOG:  walreceiver ended streaming and awaits new instructions
>        LOG:  new timeline 2 forked off current database system timeline 1
> before current recovery point 0/176A070
>        LOG:  re-handshaking at position 0/1000000 on tli 1
>        LOG:  replication terminated by primary server
>        DETAIL:  End of WAL reached on timeline 1
>        LOG:  walreceiver ended streaming and awaits new instructions
>        LOG:  new timeline 2 forked off current database system timeline 1
> before current recovery point 0/176A070
>     k. Stop M-1. Start M-1. It is able to successfully connect to S-1 which
> is a problem.
>     l. check in S-1. Records inserted in step-e are not present.
>     m. Now insert records in S-1. M-1 doesn't receive any records. On M-1
> server the following log is getting printed.
>        LOG:  out-of-sequence timeline ID 1 (after 2) in log segment
> 000000020000000000000001, offset 0
>        LOG:  out-of-sequence timeline ID 1 (after 2) in log segment
> 000000020000000000000001, offset 0
>        LOG:  out-of-sequence timeline ID 1 (after 2) in log segment
> 000000020000000000000001, offset 0
>        LOG:  out-of-sequence timeline ID 1 (after 2) in log segment
> 000000020000000000000001, offset 0
>        LOG:  out-of-sequence timeline ID 1 (after 2) in log segment
> 000000020000000000000001, offset 0
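
For readers decoding the segment name in that message: WAL segment file names are 24 hex digits, eight each for the timeline ID, the log ID and the segment number. The sketch below is only an illustration (the function name parse_wal_segment_name is made up here, it is not a PostgreSQL API). It shows that the file named in the log belongs to timeline 2, while the message complains about a page carrying timeline ID 1.

```python
def parse_wal_segment_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, segment).

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)   # first 8 digits: timeline ID
    log_id = int(name[8:16], 16)    # next 8 digits: log ID
    segment = int(name[16:24], 16)  # last 8 digits: segment number
    return timeline, log_id, segment


# Segment name copied from the log lines quoted above.
timeline, log_id, segment = parse_wal_segment_name("000000020000000000000001")
```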

Hmm, seems we need to keep track of which timeline we've used to recover
before. Before restart, the master correctly notices that timeline 2
forked off earlier in its history, so it cannot recover to that
timeline. But after restart the master begins recovery from the previous
checkpoint, and because timeline 2 forked off timeline 1 after the
checkpoint, it concludes that it can follow that timeline. It doesn't
realize that it had already recovered/flushed some WAL in timeline
1 after the fork-point.
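
To make the failure mode concrete, here is a minimal sketch of the comparison, not the actual xlog.c logic. The helper names are hypothetical, and the fork point and checkpoint positions are made-up values; only 0/176A070 comes from the log above. The same test gives a different answer depending on whether it is fed the real recovery point or the checkpoint that recovery restarts from.

```python
def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex (as in the server log) to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def can_follow(fork_point, position):
    """A server can switch to the new timeline only if it has not already
    gone past the fork point on the old timeline."""
    return position <= fork_point


# Illustrative values: the fork point and checkpoint are assumptions.
fork_point = parse_lsn("0/1769000")       # where timeline 2 branched off timeline 1
recovered_to = parse_lsn("0/176A070")     # WAL already recovered/flushed on timeline 1
last_checkpoint = parse_lsn("0/1768000")  # where recovery begins after a restart

# Before the restart the server knows how far it got, so it refuses.
before_restart = can_follow(fork_point, recovered_to)

# After the restart only the checkpoint is considered, so it wrongly accepts.
after_restart = can_follow(fork_point, last_checkpoint)
```

The point is that the decision has to be based on how far the server actually got on timeline 1, and that has to survive a restart.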

Attached is a new version of the patch. I committed the refactoring of
XLogPageRead() already, as that was a readability improvement even
without this patch. All the reported issues should be fixed now,
although I will continue testing this tomorrow. I added various checks
that the correct timeline is followed during recovery.
minRecoveryPoint is now accompanied by a timeline ID, so that when we
restart recovery, we check that we recover back to minRecoveryPoint
along the same timeline as last time. Also, it now checks at the beginning
of recovery that the checkpoint record comes from the correct timeline.
That fixes the problem that you reported above. I also adjusted the
error messages on timeline history problems to be clearer.
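
Roughly, the idea of pairing minRecoveryPoint with a timeline ID can be sketched like this. This is a simplified model, not the code in the patch: the history representation, the function name and the LSN values are all assumptions made for the example. A remembered (position, timeline) pair is acceptable only if that timeline is part of the target timeline's history and the position does not lie beyond the point where that timeline was left.

```python
from typing import List, Optional, Tuple

# History of the target timeline, oldest first: (timeline_id, end_lsn).
# end_lsn is the point where that timeline was left; None means still open.
History = List[Tuple[int, Optional[int]]]


def can_recover_along(history: History, point: int, point_tli: int) -> bool:
    """Check that a remembered (point, timeline) pair lies on the history
    of the timeline we are about to recover along."""
    for tli, end in history:
        if tli == point_tli:
            return end is None or point <= end
    return False  # that timeline is not an ancestor of the target at all


# Made-up history for timeline 2: it forked off timeline 1 at 0/1769000.
history_tli2: History = [(1, 0x1769000), (2, None)]

# minRecoveryPoint on timeline 1 past the fork: must be rejected.
past_fork = can_recover_along(history_tli2, 0x176A070, 1)

# minRecoveryPoint on timeline 1 before the fork: fine to follow timeline 2.
before_fork = can_recover_along(history_tli2, 0x1768000, 1)
```

The same kind of test could be applied to the checkpoint record's position and timeline at the beginning of recovery.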

- Heikki
