Re: Strange decreasing value of pg_last_wal_receive_lsn()

From: godjan
Subject: Re: Strange decreasing value of pg_last_wal_receive_lsn()
Date:
Msg-id: 6E9C1C03-F7A5-49D2-ABE9-98D12C4804AF@gmail.com
In reply to: Re: Strange decreasing value of pg_last_wal_receive_lsn()  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses: Re: Strange decreasing value of pg_last_wal_receive_lsn()  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
List: pgsql-hackers
Hi, sorry for the 2-week latency in answering :)

>> It fixed our trouble, but there is another one. Now we should wait until all
>> alive HA hosts finish replaying WAL before failover. It might take a while (for
>> example, when the WAL contains a record about splitting a B-tree).

> Indeed, this is the concern I wrote about yesterday in a second mail on this
> thread.

Actually, I found out that we use the wrong heuristic to detect that a standby is still replaying WAL.
We compare the values of pg_last_wal_replay_lsn() before and after sleeping.
If the standby is replaying a huge WAL record (e.g. a B-tree split), this gives us the wrong result.
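To make the flaw concrete, here is a minimal sketch of that heuristic (all names are hypothetical; in reality `get_replay_lsn` would run `SELECT pg_last_wal_replay_lsn()` over a libpq connection to the standby):

```python
import time

def replay_seems_finished(get_replay_lsn, sleep_s=1.0):
    # The heuristic described above: poll the replay LSN twice and
    # assume replay is finished when the value did not move.
    before = get_replay_lsn()
    time.sleep(sleep_s)
    after = get_replay_lsn()
    # Flawed assumption: "no progress" == "nothing left to replay".
    # While the startup process is inside one huge WAL record (e.g. a
    # B-tree split), the reported LSN does not advance either.
    return after == before

# Simulate a standby stuck replaying a single large record: the LSN
# stays at the same value even though replay is not done.
stuck_lsn = 0x1D019F38
print(replay_seems_finished(lambda: stuck_lsn, sleep_s=0.01))  # True, wrongly
```

The sketch returns True for a standby that is still mid-record, which is exactly the false positive described above.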


> Note that when you promote a node, it first replays available WALs before
> acting as a primary.

Do you know how Postgres understands that a standby is still replaying available WAL?
I didn't get it from the promotion code.


> However, how useful is this test in a context of auto failover for
> **service** high availability?
Such a situation has a really low probability in our service, and we thought it would be okay to resolve it with on-call participation.

> Nope, no clean and elegant idea. Once your instances are killed, maybe you can
> force flush the system cache (secure in-memory-only data)?
Does "force flush the system cache" mean invoking this command https://linux.die.net/man/8/sync on the standby?

> and read the latest received WAL using pg_waldump?

I did an experiment with pg_waldump, without sync:
- write data on the primary
- kill the primary
- read the latest received WAL using pg_waldump:
  0/1D019F38
- pg_last_wal_replay_lsn():
  0/1D019F68

So it's wrong to use pg_waldump to find the latest received LSN, at least without "forcing flush of the system cache".

> If all your nodes are killed in the same
> disaster, how/why an automatic cluster manager should take care of starting all
> nodes again and pick the right node to promote?

1. How?
The automatic cluster manager will restart standbys in such a situation.
If the primary's lock in ZooKeeper is released, the automatic cluster manager starts the election of a new primary.
To understand which node should be promoted, the automatic cluster manager should get the LSN of the last WAL record written to disk by each potential new primary.
We used pg_last_wal_receive_lsn() for that, but it was a mistake, because after "kill -9" on a standby, pg_last_wal_receive_lsn() reports the first LSN of the last segment.
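The election step described above might look like the following sketch (all names are hypothetical; a real manager would query each standby and coordinate through the ZooKeeper lock):

```python
def pick_new_primary(candidates: dict) -> str:
    # candidates maps host name -> LSN of the last WAL record durably
    # written to disk on that standby, as a 64-bit integer.
    # Promote the standby that kept the most WAL; promoting any other
    # node would silently lose the records past its position.
    return max(candidates, key=candidates.get)

standbys = {
    "standby-a": 0x1D019F68,
    "standby-b": 0x1D019F38,  # lagging: missing the last 48 bytes
}
print(pick_new_primary(standbys))  # standby-a
```

The whole scheme stands or falls on getting a truthful "last LSN written to disk" from each candidate, which is exactly what pg_last_wal_receive_lsn() fails to provide after a crash.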

2. Why?
- A sleepy on-call engineer at night can do something bad in such a situation :)
- pg_waldump didn't give the last LSN written to disk (at least without forcing flush of the system cache), so I don't know how an on-call engineer could figure out which standby should be promoted.
- The automatic cluster manager has successfully resolved such situations in clusters with one designated synchronous standby for years, and we hope it's possible to do the same in clusters with quorum replication.


