RE: Slave stuck in recovery mode

From: Nicolas Ross
Subject: RE: Slave stuck in recovery mode
Msg-id: 011501d7bd44$f8b70a10$ea251e30$@cybercat.net
In reply to: Slave stuck in recovery mode ("Nicolas Ross" <rossnick-lists@cybercat.net>)
List: pgsql-admin

I ended up googling some more and found this:

https://www.enterprisedb.com/blog/be-sure-stop-your-backups

That is exactly what was happening. Even though I had no backup running, I ran the stop_backup command, etc.

I then planned a restart of the master server and re-cloned the slave, and after that everything was OK!

Strange.

-----Original Message-----
From: Nicolas Ross <rossnick-lists@cybercat.net>
Sent: October 8, 2021 19:16
To: pgsql-admin@lists.postgresql.org
Subject: Slave stuck in recovery mode

Hi!

We’ve been using postgres for some time now (since the
9.3 days).

I’ve got a 2-node 9.6 cluster, a primary and a slave. We
use repmgr to manage the cluster. When it was installed,
it was something like repmgr 4.x or even 3.

This week, for some reason, I had to rebuild the slave
instance. So I cloned the slave using a command like:

/usr/pgsql-9.6/bin/repmgr -h pgserver2.qualite -U repmgr -f /etc/repmgr/9.6/repmgr.conf standby clone

After some time (it’s about 250 GB, so it takes an hour
or two), the command ends.

If I start the postgres server on the slave with the OS
systemctl script, it doesn’t return to the CLI (presumably
it waits for something).

In the log I see:

< 2021-10-08 16:16:47.861 EDT > LOG:  database system was shut down in recovery at 2021-10-08 16:04:10 EDT
< 2021-10-08 16:16:47.877 EDT > LOG:  entering standby mode
< 2021-10-08 16:16:48.599 EDT > LOG:  redo starts at 13BF/CF000028
< 2021-10-08 16:16:52.899 EDT > LOG:  consistent recovery state reached at 13BF/D53BA0F0
(Some time passes)
< 2021-10-08 16:46:10.363 EDT > LOG:  started streaming WAL from primary at 13C9/8C000000 on timeline 1
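
As a side note on the log above: an LSN such as 13BF/D53BA0F0 is two hexadecimal halves of a 64-bit byte position in the WAL, so the distance between the "consistent recovery state" LSN and the "started streaming" LSN can be worked out by hand. Below is a minimal Python sketch of that arithmetic; the helper names lsn_to_int and wal_gap_bytes are made up for illustration and are not part of PostgreSQL or repmgr. On a running 9.6 server the built-in pg_xlog_location_diff() function should give the same number.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN like '13BF/D53BA0F0' to an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits of the position,
    # the part after the slash is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def wal_gap_bytes(start_lsn: str, end_lsn: str) -> int:
    """Number of WAL bytes between two LSNs (end minus start)."""
    return lsn_to_int(end_lsn) - lsn_to_int(start_lsn)


if __name__ == "__main__":
    consistent = "13BF/D53BA0F0"  # "consistent recovery state reached at"
    streaming = "13C9/8C000000"   # "started streaming WAL from primary at"
    gap = wal_gap_bytes(consistent, streaming)
    print(f"{gap} bytes ({gap / 2**30:.2f} GiB)")
```

For the two values in this log the gap comes out at roughly 39 GiB of WAL between the consistency point and the position where streaming started.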

After that, if I try to connect to the slave, I get:

FATAL:  the database system is starting up

No matter how long I wait (tried more than a day later).

During that time, the master keeps streaming WAL to the
slave.


Notes:

That last log example was taken after trying to clone from
our barman server (I tried cloning both with and without
barman).

use_replication_slots is set to yes.

hot_standby is on on the primary, hence it is also on on
the slave once cloned.

Before one of my clone commands, I tried cleaning out all
repmgr residue, i.e. removing the extension, re-registering
the master, etc. Still the same issue.

If I comment out hot_standby on the slave, it starts
normally, but still doesn’t allow connections.

recovery.conf is:

standby_mode = 'on'
primary_conninfo = 'host=MASTERIP user=repmgr application_name=SLAVENAME'
recovery_target_timeline = 'latest'
primary_slot_name = 'repmgr_slot_1'


Any help troubleshooting this would be appreciated!





