Replication stats from slaves

Поиск
Список
Период
Сортировка
От Jehan-Guillaume de Rorthais
Тема Replication stats from slaves
Дата
Msg-id 20151204174555.4e1760ff@erg
обсуждение исходный текст
Список pgsql-hackers
Hi hackers,

We are developing and maintaining a Pacemaker resource agent for PostgreSQL
here. This agent is "stateful", ie. it knows where is the master and where are
the slaves. See [1]. Now that this resource agent version is close to what we
wanted to achieve, we will make some official announce soon, with details &
stuff. We welcome feedback, help issues, etc, but on github please, not in this
thread.

For the next version, we are now considering to improve the switchover
mechanism with appropriate checks for every steps. For reminder, the switchover
in PostgreSQL is possible since the following commit:
 commit 985bd7d49726c9f178558491d31a570d47340459 Author: Fujii Masao <fujii@postgresql.org> Date:   Wed Jun 26 02:14:37
2013+0900
 

It requires:
 (1) shutdown the master first (2) make sure the slave received the shutdown checkpoint from the old master (3) promote
theslave as master (4) start the old master as slave
 

The problem here is step (2). After discussing IRL with Magnus and
Heikki, they confirmed me checking this using pg_xlogdump is fine, eg.
(reformated):
 $ pg_xlogdump 000000010000000B00000014 -s 'B/14D845F8' -n1  rmgr: XLOG len (rec/tot): 72/104,  tx: 0,  lsn:
B/14D845F8,prev B/14D84590, bkp: 0000,  desc: checkpoint: redo B/14D845F8;  tli 1; prev tli 1; fpw true;  xid
0/6422462;   oid 1183646; multi 1; offset 0;  oldest xid 712 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
shutdown

This is possible from the resource agent point of view, but not really in a
clean way. It requires:
 * to keep in memory the last LSN of the master after shutdown * check on the slave this LSN has been received * check
therecord is a rmgr XLOG with a shutdown information as payload * check this is the very last WAL record received
(nothingafter).
 

First, looking at the last LSN and creating a cluster attribute (in
Pacemaker context) from the old master to share it with slaves is possible, but
not really elegant for a resource agent. Then, the -n1 in sample command here
avoid pg_xlogdump to exit with an error and a rc=1. But it is not compatible
with the last check (very last WAL record) and I need to check the command
succeed.


A best solution here would be to be able to check from a view on the slave, say
pg_stat_standby, when it was connected to the master for the last time, the last
wal restored by log shipping, last LSN received by streaming rep, flushed,
how/why the SR has been disconnected. As instance, reasons for SR
disconnection might be: master shutdown, too much lag, connection reset.

I can try to give a try to such patch after some acceptance and discussing
what exactly we should push in such view.

Comments? Guidance? Thoughts? Other solutions?

Thanks!

Regards,

[1] https://github.com/dalibo/pgsql-resource-agent/tree/master/multistate
-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: pg_hba_lookup function to get all matching pg_hba.conf entries
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Random crud left behind by aborted TAP tests