Re: [ADMIN] avoiding split brain with repmgr

From	Martin Goodson
Subject	Re: [ADMIN] avoiding split brain with repmgr
Date	15 August 2017 10:28:24
Msg-id	8e97642e-b975-6131-d767-8cc755bcbb98@googlemail.com
In reply to	Re: [ADMIN] avoiding split brain with repmgr (Aleksander Kamenik <aleksander.kamenik@gmail.com>)
List	pgsql-admin

On 15/08/2017 06:57, Aleksander Kamenik wrote:
> I finally found this document NOT referenced from the main README file
> in the repmgr repo.
>
> https://github.com/2ndQuadrant/repmgr/blob/master/docs/repmgrd-node-fencing.md
>
> I guess the default solution is pgbouncer
>
> Any simpler solutions for this tricky problem?
>
> Regards,
>
> Aleksander

This is interesting to me, because I'm faced with a similar problem and
I'm not 100% sold on that fencing technique. If I've misunderstood
things please do yell at me (I welcome it :) ) but ...

The issue I have with that suggested mechanism, and I'd love to hear
suggestions on how to get around it because maybe I missed something
**horribly obvious**, is that repmgr doesn't seem to have a proper
STONITH ("shoot the other node in the head") mechanism per se. It's all well and good repmgr being able to
send a message/command to something like pgbouncer, pgpool or whatever
saying 'Hey, server B is the master now so pay no attention to server A'
but that depends on those messages being received. What if, for reasons,
they're not?
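A minimal Python sketch of the weakness described above, assuming a promotion hook that simply loops over the pgbouncer hosts and pushes the new primary's address to each one. The `Bouncer` class, `notify_bouncers()` and all host names are hypothetical; this is not repmgr's or pgbouncer's actual interface. The point is that a best-effort broadcast with no acknowledgement or retry leaves any unreachable bouncer pointing at the old master.

```python
# Hypothetical model of a promotion hook that tells every pgbouncer to
# repoint at the new primary. Names are illustrative only; this is not
# repmgr's or pgbouncer's API.
from dataclasses import dataclass


@dataclass
class Bouncer:
    name: str
    reachable: bool
    target: str  # host this bouncer currently forwards sessions to


def notify_bouncers(bouncers, new_primary):
    """Best-effort broadcast: returns names of bouncers that missed the update."""
    missed = []
    for b in bouncers:
        if b.reachable:
            b.target = new_primary  # message delivered, config updated
        else:
            missed.append(b.name)  # no retry, no acknowledgement tracking
    return missed


bouncers = [
    Bouncer("pgbouncer-dc1", reachable=False, target="pg-dc1-a"),  # DC1 is dark
    Bouncer("pgbouncer-dc2", reachable=True, target="pg-dc1-a"),
]
missed = notify_bouncers(bouncers, "pg-dc2-a")
print(missed)  # ['pgbouncer-dc1']
print({b.name: b.target for b in bouncers})
# {'pgbouncer-dc1': 'pg-dc1-a', 'pgbouncer-dc2': 'pg-dc2-a'}
```

Once DC1 powers back up, nothing in this flow ever revisits `pgbouncer-dc1`, so it keeps forwarding to the old master.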

Consider the following scenario:

You've two data centres, DC1 and DC2. On each, you've got two or three
PostgreSQL nodes, a pgbouncer node, and a few application servers. The
Master is on one of the nodes in DC1.

At 3AM there's a power failure in DC1. It only lasts a few minutes, but
it's enough for repmgrd to decide to trigger failover to DC2.

One of the nodes on DC2 becomes master and the other standby(s) on DC2
start to follow it. As per the fencing method above, the repmgrd
promotion triggers a custom script to send instructions to the
pgbouncers in DC1 and DC2 to update their configurations to connect to
the new master in DC2. The pgbouncer in DC2 complies; the pgbouncer in
DC1 doesn't (because it's still down).

A few moments later, after the failover, power is restored/UPS kicks
in/the Ops team puts a coin in the meter. DC1 comes back up.

The master node in DC1 still believes it is the master. The pgbouncer
never got the message to update itself to follow the new master in DC2,
so it is still passing connections through to DC1. The other standby
nodes in DC1 never got the repmgrd command to follow a new master
either, as they were down. So they're still following the DC1 master.

You now have a master in DC1, with standby nodes following it, and a
pgbouncer passing along sessions from the application servers to the DC1
master. You also have a master in DC2, with standby nodes following it,
and a pgbouncer passing along sessions from the application servers to
the DC2 master.

Because DC1 was down at the time that repmgrd was sending along the 'Pay
no attention to DC1 master, update yourself to talk to DC2 instead'
message to the pgbouncers, surely you've now got a split brain scenario?
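The timeline can be modelled as a toy Python example, assuming four nodes (two per data centre) and that nodes in a powered-off data centre simply miss the promote and follow commands. Node names and the `failover()` and `primaries()` helpers are illustrative only. Counting the nodes whose role is primary once DC1 is back gives two.

```python
# Toy model of the 3 AM timeline. All node names are hypothetical.
nodes = {
    "pg-dc1-a": {"dc": "DC1", "role": "primary", "upstream": None},
    "pg-dc1-b": {"dc": "DC1", "role": "standby", "upstream": "pg-dc1-a"},
    "pg-dc2-a": {"dc": "DC2", "role": "standby", "upstream": "pg-dc1-a"},
    "pg-dc2-b": {"dc": "DC2", "role": "standby", "upstream": "pg-dc1-a"},
}


def failover(nodes, down_dc, new_primary):
    """Promote new_primary; only nodes outside down_dc hear the follow command."""
    for name, n in nodes.items():
        if n["dc"] == down_dc:
            continue  # powered off: never sees promote/follow
        if name == new_primary:
            n["role"], n["upstream"] = "primary", None
        else:
            n["upstream"] = new_primary


def primaries(nodes):
    return sorted(name for name, n in nodes.items() if n["role"] == "primary")


failover(nodes, down_dc="DC1", new_primary="pg-dc2-a")
# DC1 comes back with its state unchanged, so both primaries accept writes.
print(primaries(nodes))  # ['pg-dc1-a', 'pg-dc2-a']
```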

In the scenario I had, we couldn't use a VIP (virtual IP) (for 'reasons',
according to our unix team :) ) so suggestions included JDBC connect
strings with multiple servers, load balancers, etc. But the clients would
still see two masters at that point.
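Multi-host connection strings (pgJDBC's `targetServerType`, or libpq's `target_session_attrs=read-write` from PostgreSQL 10) broadly work by trying the listed hosts in order and keeping the first one that accepts writes. The simplified Python model below assumes that behaviour and uses a hypothetical `is_writable` map in place of real connection checks; it shows that two application servers listing the hosts in different orders end up writing to different masters.

```python
# Simplified model of client-side multi-host failover: try hosts in order and
# use the first one that accepts read-write sessions. Host names and the
# is_writable map are hypothetical stand-ins for real connection checks.


def pick_writable(hosts, is_writable):
    for host in hosts:
        if is_writable.get(host, False):  # unknown/unreachable hosts are skipped
            return host
    return None


# After DC1 recovers, old and new primaries both report themselves writable.
is_writable = {"pg-dc1-a": True, "pg-dc2-a": True}

app_in_dc1 = pick_writable(["pg-dc1-a", "pg-dc2-a"], is_writable)
app_in_dc2 = pick_writable(["pg-dc2-a", "pg-dc1-a"], is_writable)
print(app_in_dc1, app_in_dc2)  # pg-dc1-a pg-dc2-a
```

The client only asks "is this host writable?", so it has no way to tell a legitimate master from a stale one.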

Without a proper mechanism for the 'old' master to be shut down to avoid
a split brain when it comes back up, everything seems to rely upon
repmgrd being able to successfully pass along the 'Pay no attention to
the old node' commands/messages. But what if it can't do that because
some of the servers were unable to receive the message/command?
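One shape the missing mechanism could take, sketched in Python as an assumption rather than anything repmgr provides: quorum-based self-fencing, where a node that was primary before it went down refuses to resume that role unless a strict majority of the cluster still agrees it is primary. `may_resume_as_primary()`, the `peer_views` map and the witness voter are all hypothetical.

```python
# Hypothetical startup gate for a node that was primary before going down.
# Not a repmgr feature: a sketch of quorum-based self-fencing. peer_views maps
# each other voter to the node it currently believes is primary; None means
# that voter could not be reached.


def may_resume_as_primary(me, peer_views, cluster_size):
    """Resume only if a strict majority of the cluster (me included) agrees."""
    # Votes: me, plus every peer whose view is me; None counts as "no".
    votes = 1 + sum(1 for view in peer_views.values() if view == me)
    return votes > cluster_size // 2


# pg-dc1-a returns after the outage: its DC1 peer still points at it,
# but DC2 and a witness at a third site have moved on to pg-dc2-a.
views = {
    "pg-dc1-b": "pg-dc1-a",
    "pg-dc2-a": "pg-dc2-a",
    "pg-dc2-b": "pg-dc2-a",
    "witness": "pg-dc2-a",
}
print(may_resume_as_primary("pg-dc1-a", views, cluster_size=5))  # False
```

This only helps if the voters are placed so that one data centre cannot form a majority on its own, for example with a witness in a third location. With voters split evenly between DC1 and DC2, neither side can reach a majority after losing contact with the other, so the check would block both.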

Sure, maybe a DBA gets a fast page and is able to remote in and shut the
old master down mere minutes after it comes back up (in the ideal world)
but that's still a potential several minutes with a split brain, and
nothing (internal to the cluster, at least) preventing it.

Or am I missing something *really* obvious? If this is a possibility,
and I've not horribly misunderstood things, how can this scenario be
worked around? It seems to be a potential problem with the fencing
method suggested.

Regards,

M.
--
Martin Goodson

"Have you thought up some clever plan, Doctor?"
"Yes, Jamie, I believe I have."
"What're you going to do?"
"Bung a rock at it."