Re: Strange decreasing value of pg_last_wal_receive_lsn()
| From | Jehan-Guillaume de Rorthais |
|---|---|
| Subject | Re: Strange decreasing value of pg_last_wal_receive_lsn() |
| Date | |
| Msg-id | 20200514184457.48d58ef5@firost |
| In response to | Re: Strange decreasing value of pg_last_wal_receive_lsn() (godjan • <g0dj4n@gmail.com>) |
| Responses | Re: Strange decreasing value of pg_last_wal_receive_lsn() |
| List | pgsql-hackers |
(please, the list policy is bottom-posting to keep history clean, thanks).
On Thu, 14 May 2020 07:18:33 +0500
godjan • <g0dj4n@gmail.com> wrote:
> -> Why do you kill -9 your standby?
> Hi, it’s Jepsen test for our HA solution. It checks that we don’t lose data
> in such situation.
OK. This test is of course highly useful to stress data availability and
durability. However, how useful is it in the context of automatic failover for
**service** high availability? If all your nodes are killed in the same
disaster, how/why should an automatic cluster manager take care of starting all
nodes again and picking the right node to promote?
> So, now we update logic as Michael said. All ha alive standbys now waiting
> for replaying all WAL that they have and after we use pg_last_replay_lsn() to
> choose which standby will be promoted in failover.
>
> It fixed out trouble, but there is one another. Now we should wait when all
> ha alive hosts finish replaying WAL to failover. It might take a while(for
> example WAL contains wal_record about splitting b-tree).
Indeed, this is the concern I wrote about yesterday in a second mail on this
thread.
> We are looking for options that will allow us to find a standby that contains
> all data and replay all WAL only for this standby before failover.
Note that when you promote a node, it first replays available WALs before
acting as a primary. So you can safely signal the promotion to the node and
wait for it to finish the replay and promote.
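
As a rough illustration of the selection step only: once each standby has
reported a position, picking the most advanced one is plain arithmetic on the
textual LSN form ("hi/lo", two 32-bit hexadecimal halves). The sketch below is
not taken from any existing tool; parse_lsn() and pick_candidate() are
hypothetical names, and how the positions are collected from the standbys is
left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse an LSN in its textual "hi/lo" hexadecimal form (e.g. "0/3000148")
 * into a 64-bit WAL position. Returns 0 on success, -1 on malformed input.
 * Hypothetical helper, for illustration only.
 */
int parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;
    int consumed = 0;

    assert(text != NULL && lsn != NULL);

    /* %n records how much was consumed so trailing junk is rejected. */
    if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2 ||
        text[consumed] != '\0')
        return -1;

    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/*
 * Return the index of the standby reporting the highest LSN,
 * or -1 if no entry could be parsed. On a tie the first one wins.
 */
int pick_candidate(const char *const lsns[], int n)
{
    int      best = -1;
    uint64_t best_lsn = 0;

    for (int i = 0; i < n; i++)
    {
        uint64_t lsn;

        if (parse_lsn(lsns[i], &lsn) != 0)
            continue;           /* skip standbys with unusable reports */
        if (best < 0 || lsn > best_lsn)
        {
            best = i;
            best_lsn = lsn;
        }
    }
    return best;
}
```

The only point here is that the comparison must be done on the full 64-bit
value, not on the strings: "0/FFFFFFFF" sorts after "1/10" as text but is
behind it as a WAL position.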
> Maybe you have ideas on how to keep the last actual value of
> pg_last_wal_receive_lsn()?
Nope, no clean and elegant idea. Once your instances are killed, maybe you can
force a flush of the system cache (to secure in-memory-only data) and read the
latest received WAL using pg_waldump?
But, what if some more data are available from archives, but not received from
streaming rep because of a high lag?
> As I understand WAL receiver doesn’t write to disk walrcv->flushedUpto.
I'm not sure I understand what you mean here.
pg_last_wal_receive_lsn() reports the actual value of walrcv->flushedUpto.
walrcv->flushedUpto reports the latest LSN force-flushed to disk.
> > On 13 May 2020, at 19:52, Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
> > wrote:
> >
> >
> > (too bad the history has been removed to keep context)
> >
> > On Fri, 8 May 2020 15:02:26 +0500
> > godjan • <g0dj4n@gmail.com> wrote:
> >
> >> I got it, thank you.
> >> Can you recommend what to use to determine which quorum standby should be
> >> promoted in such case? We planned to use pg_last_wal_receive_lsn() to
> >> determine which has fresh data but if it returns the beginning of the
> >> segment on both replicas we can’t determine which standby confirmed that
> >> write transaction to disk.
> >
> > Wait, pg_last_wal_receive_lsn() only decreases because you killed your
> > standby.
> >
> > pg_last_wal_receive_lsn() returns the value of walrcv->flushedUpto. The
> > latter is set to the beginning of the segment requested only during the first
> > walreceiver startup or a timeline fork:
> >
> >     /*
> >      * If this is the first startup of walreceiver (on this timeline),
> >      * initialize flushedUpto and latestChunkStart to the starting point.
> >      */
> >     if (walrcv->receiveStart == 0 || walrcv->receivedTLI != tli)
> >     {
> >         walrcv->flushedUpto = recptr;
> >         walrcv->receivedTLI = tli;
> >         walrcv->latestChunkStart = recptr;
> >     }
> >     walrcv->receiveStart = recptr;
> >     walrcv->receiveStartTLI = tli;
> >
> > After a primary loss, as long as the standbys are up and running, it is
> > fine to use pg_last_wal_receive_lsn().
> >
> > Why do you kill -9 your standby? What am I missing? Could you explain the
> > use case you are working on to justify this?
> >
> > Regards,
>
>
>
--
Jehan-Guillaume de Rorthais
Dalibo