Re: Switching timeline over streaming replication

From: Thom Brown
Subject: Re: Switching timeline over streaming replication
Msg-id: CAA-aLv7UZhOtrymHpxWM1KF_XHJjfAJDsxygwn26cAkDXLFxHA@mail.gmail.com
In reply to: Re: Switching timeline over streaming replication  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: Switching timeline over streaming replication
List: pgsql-hackers
On 20 December 2012 12:45, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 17.12.2012 15:05, Thom Brown wrote:
I just set up 120 chained standbys, and for some reason I'm seeing these
errors:

LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1
LOG:  record with zero length at 0/301EC10
LOG:  fetching timeline history file for timeline 2 from primary server
LOG:  restarted WAL streaming at 0/3000000 on timeline 1
LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1
LOG:  new target timeline is 2
LOG:  restarted WAL streaming at 0/3000000 on timeline 2
LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 2
FATAL:  error reading result of streaming command: ERROR:  requested WAL
segment 000000020000000000000003 has already been removed

ERROR:  requested WAL segment 000000020000000000000003 has already been
removed
LOG:  started streaming WAL from primary at 0/3000000 on timeline 2
ERROR:  requested WAL segment 000000020000000000000003 has already been
removed
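
For reference, the segment name in these errors follows from the timeline and the WAL position shown in the "started streaming WAL" line. A minimal sketch, assuming the default 16 MB WAL segment size; `parse_lsn` and `wal_segment_name` are hypothetical helper names used only for illustration, not PostgreSQL functions:

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/3000000') to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_segment_name(tli: int, lsn: int, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the 24-hex-digit segment file name: timeline, log id, segment."""
    segno = lsn // segment_size
    segs_per_log_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (tli, segno // segs_per_log_id, segno % segs_per_log_id)


# Streaming from 0/3000000 on timeline 2 needs this file:
print(wal_segment_name(2, parse_lsn("0/3000000")))
```

The same position on timeline 1 maps to a different file, which is why a standby that switches to timeline 2 can request a segment that does not exist under that name.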

I just committed a patch that should make the "requested WAL segment 000000020000000000000003 has already been removed" errors go away. The trick was for walsenders to not switch to the new timeline until at least one record has been replayed on it. That closes the window where the walsender already considers the new timeline to be the latest, but the WAL file has not been created yet.
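
The gating rule described above can be illustrated with a toy model. This is only a sketch of the idea, not the committed walsender code; `RecoveryState`, `walsender_timeline`, and the per-timeline replay counter are names invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RecoveryState:
    """Toy stand-in for a standby's recovery state (hypothetical)."""
    target_tli: int  # timeline recovery has decided to follow
    replayed: Dict[int, int] = field(default_factory=dict)  # records replayed per timeline


def walsender_timeline(state: RecoveryState, current_tli: int) -> int:
    """Return the timeline a walsender should treat as the latest.

    The switch to the new target timeline is deferred until at least one
    record has been replayed on it, so its WAL file is known to exist.
    """
    if state.target_tli == current_tli:
        return current_tli
    if state.replayed.get(state.target_tli, 0) == 0:
        return current_tli  # new timeline known, but nothing replayed on it yet
    return state.target_tli


state = RecoveryState(target_tli=2, replayed={1: 10})
before = walsender_timeline(state, current_tli=1)  # stays on timeline 1
state.replayed[2] = 1                              # first record replayed on timeline 2
after = walsender_timeline(state, current_tli=1)   # now switches to timeline 2
```

In this model the walsender keeps serving timeline 1 during the window in which timeline 2 has been chosen but no WAL file for it has been written.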

Now I'm getting this on all standbys after promoting the first standby in a chain.

LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1
LOG:  record with zero length at 0/301EC10
LOG:  fetching timeline history file for timeline 2 from primary server
LOG:  restarted WAL streaming at 0/3000000 on timeline 1
FATAL:  could not receive data from WAL stream:
LOG:  new target timeline is 2
FATAL:  could not connect to the primary server: FATAL:  the database system is in recovery mode

LOG:  started streaming WAL from primary at 0/3000000 on timeline 2
TRAP: FailedAssertion("!(((sentPtr) <= (SendRqstPtr)))", File: "walsender.c", Line: 1425)
LOG:  server process (PID 19917) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted while in recovery at log time 2012-12-20 23:41:23 GMT
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG:  entering standby mode
FATAL:  the database system is in recovery mode
LOG:  redo starts at 0/2000028
LOG:  consistent recovery state reached at 0/20000E8
LOG:  database system is ready to accept read only connections
LOG:  record with zero length at 0/301EC70
LOG:  started streaming WAL from primary at 0/3000000 on timeline 2
LOG:  unexpected EOF on standby connection

And if I restart the new primary, the first new standby connected to it shows:

LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 2
FATAL:  error reading result of streaming command: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.

LOG:  record with zero length at 0/301F1E0

However, none of the other standbys show any additional log output.

-- 
Thom
