Re: Streaming replication question

Поиск

Список

Период

Сортировка

От	Keith
Тема	Re: Streaming replication question
Дата	3 августа 2019 г. 18:22:31
Msg-id	CAHw75vu0MXEzdu04grea1=ow2hgMBTmsyN=BLC0Y0-U5jQWWMQ@mail.gmail.com обсуждение исходный текст
Ответ на	Streaming replication question (Mark Steben <mark.steben@drivedominion.com>)
Ответы	Re: Streaming replication question (Mark Steben <mark.steben@drivedominion.com>)
Список	pgsql-admin

Дерево обсуждения

On Sat, Aug 3, 2019 at 9:36 AM Mark Steben <mark.steben@drivedominion.com> wrote:

Good morning,
We run postgres 9.4. Early Saturday morning we had a production postgres outage because our pg_xlog directory ran out of space. Tracing the cause points to a scheduled reboot of the 3 database servers by our IT team to install some monitoring software. Because we did a simple stop/restart of the database our replication slots did not get reset and pg_logs queued up until the out-of-space condition

Can someone please verify that the following WOULD HAVE BEEN the appropriate action:
(Or offer corrections)
1. Stop the postgres database (on both slaves)
2. Run pg_drop_replication_slot (rep1, rep2 slotnames) (on Master)
3. Stop the postgres database (on master)
4. Reboot Linux Server to bring in monitoring software
5. Start the postgres database (on master)
6. Run pg_create_replication_slot (rep1, rep2 slotnames) (On Master)
7. Start the postgres database (on both slaves)

Thank you

--
Mark Steben
Database Administrator
@utoRevenue | Autobase
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com

Restarting your databases should not have affected the replication slots like this. If you stopped the replicas first, then the primary would have just started keeping all the WAL files until the replicas came back, resuming from where they left off. If you stopped the primary first then the replicas would have just lost their connection until the primary came back and made its slots available again.

The only time the slots don't stick around is if you do a failover from the primary to one of the replicas. In that case, yes, you do have to recreate the slots. When things were rebooted, was there any failovers kicked off?

I'd check your postgres logs to see if there's any hint as to why things turned out the way they did. If all systems were rebooted at the exact same time, there may be some edge-case bug that's not being accounted for. But without a deeper dive into what happened, that would be hard to say and seems unlikely. I would say to make sure you are on the most recent version of 9.4 to ensure any known bug fixes are in place. Also highly recommend on planning to upgrade to a newer major version. 9.4 goes out of support this fall upon the release of version 12.

Keith

В списке pgsql-admin по дате отправления:

Предыдущее

От: Mark Steben
Дата: 03 августа 2019 г., 16:35:55
Сообщение: Streaming replication question

Следующее

От: Jānis Pūris
Дата: 04 августа 2019 г., 03:27:39
Сообщение: HAProxy + Patroni + pgBouncer High Availability setup

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Streaming replication question

Предыдущее

Следующее